03 April 2011
For those who don't know Sphinx yet: Sphinx (http://sphinxsearch.com/) is a full-text search engine.

1. geo distance

Geo search has become quite popular for location-based applications recently. Say you have a user table with lat/lon data, and you want to find the nearest people or order them by distance.

It's pretty hard to do this in plain SQL but very easy with Sphinx. A sample:

* the config file:
sql_query = SELECT user_id, radians(longitude) as longitude, radians(latitude) as latitude FROM user_location
sql_attr_float = longitude
sql_attr_float = latitude
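* Perl sample code (e.g. with the Sphinx::Search client from CPAN):
use Sphinx::Search;
my $sphinx = Sphinx::Search->new;
my $pi = atan2(1, 1) * 4;
sub deg2rad {
    my ($deg) = @_;
    return ($deg * $pi / 180);
}
# nearest first
$sphinx->SetSortMode(SPH_SORT_EXTENDED, '@geodist ASC');
# $lat/$lon is the point you want to search around (in degrees)
$sphinx->SetGeoAnchor('latitude', 'longitude', deg2rad($lat), deg2rad($lon));
# $radius is how far you want to search, in miles; @geodist is in meters
my $circle = $radius * 1609.344; # miles -> meters
$sphinx->SetFilterFloatRange('@geodist', 0, $circle);
# empty query string: full scan, results are only filtered and sorted by distance
my $ret = $sphinx->Query('');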
It's simple: Sphinx does all the magic for you. The main thing to know is that Sphinx can do it. :)
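If you want to sanity-check the distances Sphinx returns, or compute them for rows that are not in the index, the same great-circle distance can be computed in plain Perl. This is a minimal sketch using the standard haversine formula with a mean Earth radius of 6,371 km; that radius is my assumption, and searchd's own geodist code may use a slightly different constant, so expect small discrepancies. The helper names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: mean Earth radius in meters; searchd's value may differ slightly.
my $EARTH_RADIUS_M = 6_371_000;
my $PI = atan2(1, 1) * 4;

sub deg2rad { return $_[0] * $PI / 180 }

# Miles to meters, the unit @geodist uses (1 mile = 1609.344 m).
sub miles_to_m { return $_[0] * 1609.344 }

# Great-circle distance in meters between two lat/lon points given in
# degrees, using the haversine formula.
sub haversine_m {
    my ($lat1, $lon1, $lat2, $lon2) = map { deg2rad($_) } @_;
    my $dlat = $lat2 - $lat1;
    my $dlon = $lon2 - $lon1;
    my $h = sin($dlat / 2) ** 2
          + cos($lat1) * cos($lat2) * sin($dlon / 2) ** 2;
    return 2 * $EARTH_RADIUS_M * atan2(sqrt($h), sqrt(1 - $h));
}

# One degree of latitude along a meridian:
printf "%.0f m\n", haversine_m(0, 0, 1, 0);    # prints "111195 m"
```

Comparing haversine_m($lat, $lon, $row_lat, $row_lon) with the @geodist value of a match should give numbers that agree closely, and miles_to_m($radius) gives the same filter bound as the sample above.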

2. haproxy as load balancer

When you have many Sphinx servers, one option is a distributed index, as described in the docs. The other is to put a load balancer in front of them.
Well, I'm not saying the built-in distributed index is bad or anything like that; actually, I haven't tried it yet. Below are just my two cents from using haproxy with Sphinx servers.

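* When you put 'log   local0' in the haproxy conf, don't forget to set up syslog as well (the docs plus a Google search will get you there): add 'local0.* /var/log/haproxy.log' to the syslog config and add "-r" to the syslogd options, e.g. SYSLOGD_OPTIONS="-m 0 -r". haproxy typically logs over UDP, and -r is what lets syslogd accept messages from the network.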

Actually, I'm not 100% satisfied with haproxy:
* haproxy doesn't support stats for TCP mode the way it does for HTTP, at least as I tested with haproxy 1.4.13.
* haproxy can't check searchd's status. That's not haproxy's fault: searchd has no simple way to report whether it's working smoothly or has hit a fatal error inside. It can be fixed with a Perl script, but I'd prefer searchd had this built in.

But it really works pretty well. I may write a Perl script to act as the 'check' in haproxy so that it can fail over automatically; then I think I'll be happier.
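As a starting point for such a check script, here is a minimal Perl sketch. It assumes searchd's binary API handshake, in which searchd sends a 4-byte big-endian protocol version (1) right after a client connects and the client answers with its own version, as the official API clients do. Passing it only shows that the daemon accepts connections and speaks the protocol, not that the index is healthy, so it is a step up from a bare TCP connect rather than the full status check I'd like. The function name and the 2-second default timeout are made up. To my knowledge haproxy 1.4 cannot run an external script as a check, so some glue (for example a tiny HTTP wrapper polled with option httpchk) would still be needed.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical health check: returns 1 if searchd at $host:$port completes
# the start of the binary API handshake, 0 otherwise.
sub searchd_alive {
    my ($host, $port, $timeout) = @_;
    $timeout ||= 2;
    local $SIG{PIPE} = 'IGNORE';    # don't die if searchd hangs up on us

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    ) or return 0;

    # searchd greets every client with a 4-byte big-endian protocol version
    my $sel = IO::Select->new($sock);
    my $buf = '';
    while (length($buf) < 4) {
        return 0 unless $sel->can_read($timeout);
        my $n = sysread($sock, $buf, 4 - length($buf), length($buf));
        return 0 unless $n;    # EOF or read error
    }
    my ($version) = unpack('N', $buf);
    return 0 if $version < 1;

    # answer with our own protocol version, as the API clients do, then hang up
    syswrite($sock, pack('N', 1));
    close($sock);
    return 1;
}

# Usage as a script: perl check_searchd.pl HOST PORT; exit status 0 = healthy.
if (@ARGV) {
    my ($host, $port) = @ARGV;
    exit(searchd_alive($host, $port) ? 0 : 1);
}
```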

3. new rotating way

Running a full index on every Sphinx server is really dumb: every server pulls the whole dataset, which puts a heavy load on the underlying MySQL server. Big thanks to the Sphinx forum, I found the approach below, and it works pretty well.

* Create a new index section in the conf that inherits from XXX and only overrides the path:
index XXX {
    source = XXX
    path    = /var/data/sphinx/XXX
}
index XXX_new : XXX {
    path = /var/data/sphinx/new/XXX.new
}
* Never run a full indexer pass on XXX itself; instead, always build XXX_new (without --all, which would index every index in the conf, XXX included):
indexer --config /path/to/above.conf XXX_new
* Use a bash or Perl script to drive the full index: after running the command above, copy the XXX.new.sp* files into the directory where XXX.sp* lives on each server (with cp or scp), then send kill -1 (SIGHUP) to searchd; you can get its pid with `cat /the/pid/file/in/conf/XXX.pid`.
* NOTE: when you make the ->Query API call, pass XXX as the second argument (the index name). Otherwise it searches every index, i.e. XXX;XXX_new, which is wasteful. (Alternatively, you can try starting searchd with --index XXX.)
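The steps above can be scripted roughly like this. It is a minimal Perl sketch, assuming passwordless ssh/scp from the indexing box to each search server; the config path, data directories and pid file are the placeholders from the steps above, and the function names and host names are hypothetical. It builds XXX_new once (the only run that touches MySQL), then for each host copies the XXX.new.sp* files next to the live XXX.sp* files and sends SIGHUP so searchd rotates them in.

```perl
use strict;
use warnings;

# Placeholders taken from the steps above; adjust to your setup.
my $CONFIG   = '/path/to/above.conf';
my $NEW_DIR  = '/var/data/sphinx/new';             # where XXX_new is built
my $LIVE_DIR = '/var/data/sphinx';                 # where XXX.sp* lives
my $PIDFILE  = '/the/pid/file/in/conf/XXX.pid';    # searchd pid file

# Commands that push freshly built files to each host and make its searchd
# rotate them in. Each command is an argv array ref, so system() runs it
# without a local shell.
sub distribute_commands {
    my ($files, @hosts) = @_;
    my @cmds;
    for my $host (@hosts) {
        push @cmds, ['scp', @$files, "$host:$LIVE_DIR/"];
        # the remote shell expands $(cat ...) to searchd's pid; HUP is kill -1
        push @cmds, ['ssh', $host, "kill -HUP \$(cat $PIDFILE)"];
    }
    return @cmds;
}

sub run_cmd {
    my @argv = @_;
    system(@argv) == 0 or die "command failed: @argv\n";
}

# Usage: perl rotate.pl search1 search2 ...   (hypothetical host names)
if (@ARGV) {
    # the single full index run, instead of one per search server
    run_cmd('indexer', '--config', $CONFIG, 'XXX_new');
    my @files = glob("$NEW_DIR/XXX.new.sp*");
    die "no XXX.new.sp* files in $NEW_DIR\n" unless @files;
    run_cmd(@$_) for distribute_commands(\@files, @ARGV);
}
```

Keeping the command list in a separate function makes it easy to print it as a dry run before letting the script touch any server.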

The real magic here is this:
if you run indexer with --rotate on XXX (indexer --rotate XXX), it creates XXX.new.sp* files, and once searchd gets the SIGHUP (which --rotate sends) or is restarted, the XXX.new.sp* files become XXX.sp*.
So you can build XXX_new instead: it generates XXX.new.sp* files that are the same as the ones indexing XXX with --rotate would produce. Copy them into the directory where XXX.sp* lives, and a SIGHUP turns them into XXX.sp* without a hitch.
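The renaming rule described above fits in one regex. This hypothetical helper maps each copied file to the name it should have after searchd rotates it, which can be handy for logging in a copy script; it only mirrors the naming convention and does not touch any files.

```perl
use strict;
use warnings;

# Hypothetical helper: the name a rotated file ends up with, e.g.
# /var/data/sphinx/XXX.new.spa -> /var/data/sphinx/XXX.spa.
# Returns undef for names that are not XXX.new.sp* files.
sub rotated_name {
    my ($file) = @_;
    return $file =~ /^(.*)\.new(\.sp[a-z]+)$/ ? "$1$2" : undef;
}

for my $f ('/var/data/sphinx/XXX.new.spa', '/var/data/sphinx/XXX.new.spi') {
    print "$f -> ", rotated_name($f), "\n";
}
```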

Well, nothing fancy, but maybe useful if you're in the same situation.

