cf-proxy README

1) Overview

The cf-proxy can be used for proxying content links. The step 'Proxy URL'
translates a link like
  http://site.com/d?page=17
into
  http://999999.cfproxy.indexdata.com/site.com/d?page=17
and saves the state of the browser in a session file (here 999999). When the
results are displayed on the user's screen, the link points to this proxied
URL. When the user clicks on the link, the request comes to the proxy, which
does its best to re-establish the session, set the right cookies, etc., and
then fetches the original page and returns it. The result is that the user
sees the linked page without having to log in to the content provider's site.
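The URL translation in the example above can be sketched in Python as follows. This is only an illustration, not the real 'Proxy URL' step: the function name proxy_url is hypothetical, and the sketch assumes plain http:// links and that the session id is simply prepended as a subdomain of the proxy host name, as in the example.

```python
from urllib.parse import urlsplit


def proxy_url(original: str, session_id: str,
              proxyhostname: str = "cfproxy.indexdata.com") -> str:
    """Rewrite a content link so that it points at the proxy.

    Hypothetical helper: assumes plain http:// links, as in the README
    example. The scheme is dropped and host, path, query and fragment are
    appended below the per-session host name.
    """
    parts = urlsplit(original)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// links")
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    if parts.fragment:
        rest += "#" + parts.fragment
    return f"http://{session_id}.{proxyhostname}/{rest}"


print(proxy_url("http://site.com/d?page=17", "999999"))
# prints http://999999.cfproxy.indexdata.com/site.com/d?page=17
```

Putting the session id in the host name, rather than in the path, keeps the original path intact, so relative links on the fetched page still resolve under the same session host.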


The proxy consists of four parts:
  1. The 'Proxy URL' step, which rewrites the URL and saves the session.
  2. Apache configuration for passing incoming requests to the Perl script,
     and for rewriting the URLs in links etc. on the returned pages.
  3. A Perl script, which loads the session file and acts as an HTTP client,
     fetching the page from the original URL.
  4. A DNS setup that directs all proxied host names to the proxy machine.
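On the receiving side, the proxy has to undo the mapping: the session id comes from the host name and the original URL from the path. In the real system this is split between Apache and the Perl script; the Python sketch below only shows the reverse mapping itself. The function name parse_proxied_request is hypothetical, and it assumes the original link was plain http://, as in the Overview example.

```python
from typing import Tuple


def parse_proxied_request(host: str, path_and_query: str,
                          proxyhostname: str = "cfproxy.indexdata.com"
                          ) -> Tuple[str, str]:
    """Recover (session_id, original_url) from an incoming proxied request.

    host            -- Host header, e.g. "999999.cfproxy.indexdata.com"
    path_and_query  -- request target, e.g. "/site.com/d?page=17"
    Assumes the original link was plain http:// (hypothetical helper).
    """
    host = host.split(":", 1)[0].lower()        # drop any :port suffix
    suffix = "." + proxyhostname.lower()
    if not host.endswith(suffix):
        raise ValueError(f"{host!r} is not below {proxyhostname!r}")
    session_id = host[: -len(suffix)]
    if not session_id or "." in session_id:
        raise ValueError("expected exactly one session label before the proxy name")
    # The first path segment is the original host, the rest its path and query.
    original = "http://" + path_and_query.lstrip("/")
    return session_id, original


print(parse_proxied_request("999999.cfproxy.indexdata.com", "/site.com/d?page=17"))
# prints ('999999', 'http://site.com/d?page=17')
```

With the session id in hand, the script can locate the matching session file before fetching the original URL.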

2) Installation

The proxy should be installed on the same machine as the cf-engine, since they
need to share session data. It might be possible to use separate machines with
shared storage for the session data, but we have not tried this.

2.1) Requirements

Apache needs to be installed and running on the machine.
The following Apache modules must be enabled:
  proxy_http
  proxy_html
  headers

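We need the following Perl modules:
  CGI
  POSIX
  File::stat
  LWP::UserAgent
  HTTP::Cookies
These are available on most Linux distributions. On Debian, LWP::UserAgent and
HTTP::Cookies are installed with the libwww-perl package (see 2.5).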
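A quick way to confirm that the Perl modules listed above are present is to ask perl to load each one with `perl -M<Module> -e 1`. The Python helper below is a hypothetical convenience script, not part of cf-proxy; it assumes a `perl` binary on the PATH.

```python
import shutil
import subprocess

PERL_MODULES = ("CGI", "POSIX", "File::stat", "LWP::UserAgent", "HTTP::Cookies")


def perl_module_available(module: str, perl: str = "perl") -> bool:
    """Return True if `perl -M<module> -e 1` exits successfully."""
    try:
        result = subprocess.run([perl, f"-M{module}", "-e", "1"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:          # no such perl interpreter
        return False
    return result.returncode == 0


def missing_perl_modules(modules=PERL_MODULES, perl: str = "perl") -> list:
    """List the required modules that perl cannot load."""
    return [m for m in modules if not perl_module_available(m, perl)]


if __name__ == "__main__":
    if shutil.which("perl") is None:
        print("perl not found on PATH")
    else:
        missing = missing_perl_modules()
        print("missing:", ", ".join(missing) if missing else "none")
```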

On Debian machines you may need to edit the HTML proxy config file
  /etc/apache2/mods-available/proxy_html.conf
and uncomment the two lines about frames. Then restart Apache.

On CentOS/Red Hat boxes there seems to be no default file. One should be
distributed with the software; copy or link it to /etc/httpd/conf.d and
restart Apache.


2.2) Apache configuration

An example configuration is provided in the file apache-config.

2.3) DNS configuration

We need a wildcard entry for *.cfproxy.indexdata.com, and another name,
cfproxy2.indexdata.com. These names can differ between installations, but the
names in the Apache configuration and in cproxy.cfg must match them!
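Because the session id becomes a subdomain, every freshly generated name must resolve, not just one test name. The Python sketch below checks this by looking up a random label under the wildcard domain, and also looks up the second name. The function names are hypothetical, and the default host names are the Index Data ones from above; substitute your own.

```python
import secrets
import socket


def probe_name(domain: str) -> str:
    """Build a random, almost certainly unregistered name below `domain`."""
    return f"probe-{secrets.token_hex(4)}.{domain}"


def resolves(name: str) -> bool:
    """True if the local resolver returns an IPv4 address for `name`."""
    try:
        socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def check_dns(wildcard_domain: str = "cfproxy.indexdata.com",
              second_name: str = "cfproxy2.indexdata.com") -> dict:
    """Report whether the wildcard and the second proxy name resolve."""
    return {
        "*." + wildcard_domain: resolves(probe_name(wildcard_domain)),
        second_name: resolves(second_name),
    }


if __name__ == "__main__":
    for name, ok in check_dns().items():
        print(f"{name}: {'ok' if ok else 'NOT resolving'}")
```

Using a random label avoids a false positive when only one specific subdomain happens to have its own record.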

2.4) Configuration file

The proxy expects to find a config file at /etc/cf-proxy/cproxy.cfg.
It should contain three lines like these:
  proxyhostname: cfproxy.indexdata.com
  sessiondir: /tmp
  cfengine: localhost:9000


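Here, proxyhostname is the domain under which the per-session host names
(such as 999999.cfproxy.indexdata.com) are created, sessiondir is the
directory where the session files are kept, and cfengine is the host:port of
the locally running Metaproxy, which will be contacted via SRU.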
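The `key: value` format above is easy to read and validate. The Python sketch below parses it and builds an SRU 1.1 `explain` URL for the cfengine address, which can serve as a simple connectivity check against Metaproxy. Both function names are hypothetical. Skipping blank lines and `#` comments is an assumption of the sketch, and the `database` path is a placeholder: this README does not say which database or which SRU operations the proxy actually uses.

```python
from urllib.parse import urlencode

REQUIRED_KEYS = ("proxyhostname", "sessiondir", "cfengine")


def parse_cproxy_cfg(text: str) -> dict:
    """Parse 'key: value' lines in the style of /etc/cf-proxy/cproxy.cfg.

    Skipping blank lines and '#' comments is an assumption of this sketch.
    Only the first ':' separates key from value, so 'localhost:9000' survives.
    """
    cfg = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key: value'")
        cfg[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return cfg


def sru_explain_url(cfg: dict, database: str = "") -> str:
    """SRU 1.1 'explain' URL for the cfengine host:port (hypothetical helper).

    `database` is a placeholder path, not taken from the cf-proxy sources.
    """
    params = urlencode({"version": "1.1", "operation": "explain"})
    return f"http://{cfg['cfengine']}/{database}?{params}"


SAMPLE = """\
proxyhostname: cfproxy.indexdata.com
sessiondir: /tmp
cfengine: localhost:9000
"""

print(sru_explain_url(parse_cproxy_cfg(SAMPLE)))
# prints http://localhost:9000/?version=1.1&operation=explain
```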

2.5) Installation on Debian
  apt-get install apache2 libapache2-mod-proxy-html
  apt-get install libwww-perl

  a2enmod proxy_http
  a2enmod proxy_html
  a2enmod headers

  (adjust the host names in .../cf/cfproxy/apache-config, see 2.3)
  cd /etc/apache2/sites-available
  ln -s /home/heikki/cf/cfproxy/apache-config ./cfproxy
  a2ensite cfproxy
  service apache2 reload
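After these steps, a small Python check can confirm that the modules and the site are actually enabled. It relies on the standard Debian layout, where a2enmod creates `<module>.load` links in /etc/apache2/mods-enabled and a2ensite links sites into /etc/apache2/sites-enabled. The function names are hypothetical, and checking for both `cfproxy` and `cfproxy.conf` is an assumption to cover Apache versions whose a2ensite expects a `.conf` suffix.

```python
import os

# Module names as used by a2enmod; each enabled module has a <name>.load link.
REQUIRED_APACHE_MODULES = ("proxy_http", "proxy_html", "headers")


def missing_apache_modules(mods_enabled: str = "/etc/apache2/mods-enabled",
                           modules=REQUIRED_APACHE_MODULES) -> list:
    """Return required modules with no (working) <name>.load in mods_enabled."""
    return [m for m in modules
            if not os.path.exists(os.path.join(mods_enabled, m + ".load"))]


def site_enabled(name: str = "cfproxy",
                 sites_enabled: str = "/etc/apache2/sites-enabled") -> bool:
    """True if a2ensite has linked the site into sites_enabled.

    os.path.exists follows symlinks, so a dangling link counts as not enabled.
    """
    return any(os.path.exists(os.path.join(sites_enabled, name + ext))
               for ext in ("", ".conf"))


if __name__ == "__main__":
    print("missing modules:", missing_apache_modules() or "none")
    print("cfproxy site enabled:", site_enabled())
```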

TODO - Need to package the proxy for Debian, and update these instructions


2.6) Installation on RedHat

TODO - Something like Debian above ;-)


