Solr Replication, Load Balancing, haproxy and Drupal
I use Apache Solr for search on several projects, including a few using Drupal. Solr has built-in support for replication and load balancing; unfortunately the load balancing is done on the client side and works best with a persistent connection, which doesn't make much sense for PHP-based web apps. In the case of Drupal, there has been a long discussion on a patch in the issue queue to enable Solr's native load balancing, but things seem to have stalled.
In one instance I have Solr replicating from the master to a slave, with the plan to add additional slaves if the load justifies it. In order to get Drupal to write to the master and read from either node, I needed a proxy or load balancer. In my case the best lightweight HTTP load balancer that would easily run on the web heads was haproxy. I could have run Varnish in front of Solr and had it do the load balancing, but that seemed like overkill at this stage.
Now when an update request hits haproxy it directs it to the master, but for reads it balances the requests between the two nodes. To get this setup running on Ubuntu 9.10 with haproxy 1.3.18, I used the following /etc/haproxy/haproxy.cfg on each of the web heads:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    nbproc 4
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    maxconn 2000
    balance roundrobin
    stats enable
    stats uri /haproxy?stats

frontend solr_lb
    bind localhost:8080
    acl master_methods method POST DELETE PUT
    use_backend master_backend if master_methods
    default_backend read_backends

backend master_backend
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check

backend slave_backend
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check

backend read_backends
    server solr-a 192.168.201.161:8080 weight 1 maxconn 512 check
    server solr-b 192.168.201.162:8080 weight 1 maxconn 512 check
To ensure the configuration is working properly, run

wget http://localhost:8080/solr -O -

on each of the web heads. If you get a connection refused message, haproxy may not be running. If you get a 503 error, make sure Solr (jetty/tomcat) is running on the Solr nodes. If you get some HTML output which mentions Solr, then it should be working properly.
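That troubleshooting decision tree can be written down as a small Python helper (just a sketch of the checks described above; the "outcome" values are a simplification of what wget actually reports):

```python
def diagnose(outcome: str, body: str = "") -> str:
    """Map the result of fetching http://localhost:8080/solr to the
    likely cause. `outcome` is "refused", "503", or "200"."""
    if outcome == "refused":
        return "haproxy is probably not running on this web head"
    if outcome == "503":
        return "haproxy is up, but Solr (jetty/tomcat) is down on the backends"
    if outcome == "200" and "Solr" in body:
        return "load balancer and Solr are both working"
    return "unexpected response; check the haproxy and Solr logs"
```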
For Drupal's apachesolr module to use this configuration, simply set the hostname to localhost and the port to 8080 on the module configuration page. Rebuild your search index and you should be good to go.
If you have a lot of index updates, you could consider making the master write only and having two read-only slaves; just change the IP addresses to point to the right hosts.
For more information on Solr replication, refer to the Solr wiki; for more information on configuring haproxy, refer to the manual. Thanks to Joe William and his blog post on load balancing CouchDB using haproxy, which helped me get the configuration I needed once I had decided what I wanted.
Solr Replication, Load Balancing, haproxy and Drupal
Phine wrote: Recently, I came across an interesting article which shared deep insights on the built-in concept of replication in Solr. You can refer to http://www.lucidimagination.com/blog/2009/05/31/solr-index-replication/ for detailed information.
Need to Clarify haproxy with multiple solr servers
Ashok wrote: Hi,
Please let us know how to configure haproxy with multiple Solr servers, and also how to verify that data is going to both servers.
Thanks
security problem ahead
Glenn Plas wrote: Be aware that anyone who can navigate to your Solr core dashboard can wreak havoc and drop a core.
RE: security problem ahead
Dave wrote: @Glenn the security issue you've identified is a problem for all Solr instances, regardless of the use of haproxy. Various options are available for restricting access to the dashboard, including jetty/tomcat/haproxy configuration or iptables.