Solr Replication, Load Balancing, HAProxy and Drupal
I use Apache Solr for search on several projects, including a few with Drupal. Solr has built-in support for replication and load balancing. Unfortunately, the load balancing is done on the client side and works best with a persistent connection, which doesn’t make a lot of sense for PHP-based web apps. In the case of Drupal, there has been a long discussion in the issue queue about a patch to enable Solr’s native load balancing, but things seem to have stalled.
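Client-side load balancing means the application itself keeps a list of Solr hosts, rotates through them and remembers which ones have failed. The Python sketch below is a hypothetical illustration of that idea (round-robin with failover), not Solr’s own client code; `RoundRobinClient` and the host names are made up. Because the rotation position and the dead-host set live in process memory, they only pay off in a long-running process. A PHP request that builds a fresh client on every page load would always start at the first host and rediscover failures each time.

```python
import itertools
from typing import Callable, Iterable


class RoundRobinClient:
    """Hypothetical client-side balancer: round-robin over hosts, skipping dead ones.

    All state (rotation position, dead hosts) lives in this object, so it only
    helps when the object outlives a single request.
    """

    def __init__(self, hosts: Iterable[str]) -> None:
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)
        self.dead = set()  # hosts that raised ConnectionError on an earlier call

    def request(self, send: Callable[[str], str]) -> str:
        # Try each host at most once per call, starting from the next in rotation.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if host in self.dead:
                continue
            try:
                return send(host)
            except ConnectionError:
                self.dead.add(host)  # remember the failure for later calls
        raise ConnectionError("no live Solr hosts")
```

A production balancer would also retry dead hosts after some interval; this sketch never revives them.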
In one instance I have Solr replicating from the primary to a secondary, with the plan to add more secondary backends if the load justifies it. To get Drupal to write to the primary and read from either node, I needed a proxy or load balancer. In my case the best lightweight HTTP load balancer that would easily run on the web heads was HAProxy. I could have run Varnish in front of Solr and had it do the load balancing, but that seemed like overkill at this stage.
Now when an update request (a POST, PUT or DELETE) hits HAProxy, it is directed to the primary, while reads are balanced between the 2 nodes. To get this setup running on Ubuntu 9.10 with HAProxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    nbproc 4
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    maxconn 2000
    balance roundrobin
    stats enable
    stats uri /haproxy?stats

frontend solr_lb
    bind localhost:8080
    acl primary_methods method POST DELETE PUT
    use_backend primary_backend if primary_methods
    default_backend read_backends

backend primary_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

backend secondary_backend
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check
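The routing rules in the frontend section boil down to a small decision function. The following Python sketch mirrors them: methods matching the primary_methods ACL go to primary_backend, and everything else rotates through read_backends. It ignores weights, maxconn and health checks, and the `SolrRouter` class is purely illustrative, so treat it as a reading aid for the config rather than a model of HAProxy’s internals.

```python
import itertools

# Methods listed in the primary_methods ACL of the frontend.
PRIMARY_METHODS = {"POST", "DELETE", "PUT"}
PRIMARY = "192.168.201.161:8080"          # server solr-a in primary_backend
READ_BACKENDS = ["192.168.201.161:8080",  # server solr-a in read_backends
                 "192.168.201.162:8080"]  # server solr-b in read_backends


class SolrRouter:
    """Hypothetical model of the frontend: writes to primary, reads round-robin."""

    def __init__(self) -> None:
        self._reads = itertools.cycle(READ_BACKENDS)

    def pick(self, method: str) -> str:
        if method in PRIMARY_METHODS:
            return PRIMARY           # use_backend primary_backend if primary_methods
        return next(self._reads)     # default_backend read_backends, balance roundrobin
```

One consequence of matching on the method alone is that any search sent as a POST will also land on the primary.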
To make sure the configuration is working properly, run the following on each of the web heads:

wget http://localhost:8080/solr -O -

If you get a connection refused message, HAProxy may not be running. If you get a 503 error, make sure solr/jetty/tomcat is running on the Solr nodes. If you get some HTML output that mentions Solr, it should be working properly.
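If you would rather script the check than eyeball wget output, here is a minimal Python sketch using only the standard library. `classify` encodes the three outcomes described above and `check` fetches http://localhost:8080/solr through HAProxy; both function names are hypothetical, and treating any 200 response containing “solr” as success is the same rough heuristic as the manual check.

```python
import urllib.error
import urllib.request


def classify(status=None, body="", refused=False):
    """Map the outcome of fetching /solr through HAProxy to a diagnosis."""
    if refused:
        return "HAProxy may not be running"
    if status == 503:
        return "check that solr/jetty/tomcat is running on the Solr nodes"
    if status == 200 and "solr" in body.lower():
        return "working"
    return "unexpected response"


def check(url="http://localhost:8080/solr", timeout=5.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses, e.g. HAProxy's 503 when no backend is up.
        # Caught before URLError because HTTPError is a subclass of it.
        return classify(err.code)
    except urllib.error.URLError as err:
        # A refused connection surfaces as URLError wrapping ConnectionRefusedError.
        return classify(refused=isinstance(err.reason, ConnectionRefusedError))
```

Calling `check()` on each web head should return "working" once everything is up.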
For Drupal’s apachesolr module to use this configuration, set the hostname to localhost and the port to 8080 on the module’s configuration page. Rebuild your search index and you should be right to go.
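With apachesolr pointed at localhost:8080, every request the module makes goes through HAProxy, and the HTTP method decides where it lands. The Python sketch below builds the two kinds of request by hand to show this. It assumes Solr is served under the /solr path with the standard select and update handlers, and the helper names are hypothetical. It only constructs the requests; pass one to `urllib.request.urlopen` to actually send it.

```python
import urllib.parse
import urllib.request

# Hostname and port as set in the apachesolr configuration; /solr path assumed.
PROXY = "http://localhost:8080/solr"


def search_request(query: str) -> urllib.request.Request:
    # A GET, so HAProxy sends it to read_backends.
    params = urllib.parse.urlencode({"q": query, "wt": "json"})
    return urllib.request.Request(f"{PROXY}/select?{params}")


def commit_request() -> urllib.request.Request:
    # A POST, so it matches primary_methods and goes to the primary.
    return urllib.request.Request(
        f"{PROXY}/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )
```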
If you have a lot of index updates, you could consider making the primary write-only and having two read-only secondary backends; just change the IP addresses in read_backends to point to the right hosts.
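For the write-only primary layout, the only change is which servers appear in read_backends; the frontend section stays the same. As a hedged sketch, this Python helper renders the two backend sections for a given primary and list of read-only secondaries. The function, the solr-c naming and the third address 192.168.201.163 in the usage example are hypothetical; the server options are copied from the config above.

```python
from typing import Sequence


def render_backends(primary: str, readers: Sequence[str], port: int = 8080) -> str:
    """Hypothetical helper: backend sections for a write-only primary
    and read-only secondaries, named solr-b, solr-c, ... in order."""
    lines = [
        "backend primary_backend",
        f"    server solr-a {primary}:{port} weight 1 maxconn 512 check",
        "",
        "backend read_backends",
    ]
    for i, host in enumerate(readers, start=1):
        name = "solr-" + chr(ord("a") + i)  # solr-b, solr-c, ...
        lines.append(f"    server {name} {host}:{port} weight 1 maxconn 512 check")
    return "\n".join(lines) + "\n"


print(render_backends("192.168.201.161", ["192.168.201.162", "192.168.201.163"]))
```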
For more information on Solr replication, refer to the Solr wiki; for more information on configuring HAProxy, refer to the manual. Thanks to Joe William and his blog post on load balancing CouchDB using HAProxy, which helped me get the configuration I needed once I had decided what I wanted.