Solr Replication, Load Balancing, HAProxy and Drupal

I use Apache Solr for search on several projects, including a few with Drupal. Solr has built-in support for replication and load balancing. Unfortunately, the load balancing is done on the client side and works best with a persistent connection, which doesn’t make much sense for PHP-based web apps. In the case of Drupal, there has been a long discussion on a patch in the issue queue to enable Solr’s native load balancing, but things seem to have stalled.

In one instance I have Solr replicating from the primary to a secondary, with the plan to add more secondary backends if the load justifies it. To get Drupal to write to the primary and read from either node, I needed a proxy or load balancer. In my case the best lightweight HTTP load balancer that would easily run on the web heads was HAProxy. I could have run Varnish in front of Solr and had it do the load balancing, but that seemed like overkill at this stage.

Now when an update request (a POST, PUT or DELETE) hits HAProxy, it is directed to the primary, while reads are balanced between the two nodes. To get this setup running on Ubuntu 9.10 with HAProxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    maxconn 4096
    nbproc 4
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    maxconn 2000
    balance roundrobin
    stats enable
    stats uri /haproxy?stats

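frontend solr_lb
    bind localhost:8080
    # Send write-style HTTP methods to the primary only
    acl primary_methods method POST DELETE PUT
    use_backend primary_backend if primary_methods
    # Everything else (reads) is balanced across both nodes
    default_backend read_backends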

backend primary_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

# Defined but not referenced by the frontend above
backend secondary_backend
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

To ensure the configuration is working properly, run wget http://localhost:8080/solr -O - on each of the web heads. If you get a connection refused message, HAProxy may not be running. If you get a 503 error, make sure Solr (and the Jetty or Tomcat container hosting it) is running on the Solr nodes. If you get some HTML output that mentions Solr, then it should be working properly.

For Drupal’s apachesolr module to use this configuration, set the hostname to localhost and the port to 8080 on the module configuration page. Rebuild your search index and you should be right to go.

If you have a lot of index updates, you could consider making the primary write-only and having two read-only secondary backends. Just change the IP addresses in the backend sections to point to the right hosts.
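
As a rough sketch of that layout, the backend sections might look something like the following. The third node, solr-c at 192.168.201.163, is an assumption made up for illustration and is not part of the setup described above. The global, defaults and frontend sections would stay as they are.

```haproxy
# Sketch only: solr-a takes all writes, solr-b and solr-c serve reads.
# solr-c (192.168.201.163) is a hypothetical third node.
backend primary_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check
    server solr-c 192.168.201.163:8080 weight 1 maxconn 512 check
```

With this arrangement the primary no longer appears in read_backends, so it only handles the POST, PUT and DELETE requests matched by the frontend ACL.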

For more information on Solr replication, refer to the Solr wiki; for more information on configuring HAProxy, refer to the manual. Thanks to Joe William and his blog post on load balancing CouchDB using HAProxy, which helped me get the configuration I needed once I had decided what I wanted.