Car News, Articles and Stories


Optimizing Solr (Or How to 7x Your Search Speed)

Apache Solr powers enterprise search on sites from eBay to Zappos. It also powers Carsabi, but when we reached 1.8M listings per month (passing Autotrader & Cars.com), our basic installation began to run about as fast as an octogenarian in congealing cement. I’d like to share the basics of Solr optimization, as well as some data on real-world gains.

Very briefly, our stack has gone through a few iterations, any of which may be sufficient for your corpus volume – no sense in over-engineering. Postgres tables had to be denormalized at 100k vehicles, and we switched to WebSolr’s extremely convenient hosted Solr at 300k – their Heroku plugin will create an installation in minutes for just $20/month. This worked very well until about 1M listings, at which point even their beefiest plan was returning results with >800ms latency.

Hardware: Bigger Is Better. A Lot Better

Our previous Solr-as-a-Service installation had been hosted on an Amazon EC2 Large instance and returned results in 800ms. Fortunately, we had spare capacity on an EC2 Cluster Compute Eight Extra Large, which we use for our webcrawler, and just moving to this machine dropped our query time to 282ms – a speed increase of 2.84x. Notice this corresponds to the processor speed increase of 2.75x between ...
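The 2.84x figure is simply the ratio of old latency to new latency. A trivial helper makes the arithmetic explicit; the name `speedup` is ours, and the only inputs are the two latencies quoted above.

```python
def speedup(old_ms: float, new_ms: float) -> float:
    """Return how many times faster the new latency is than the old one."""
    return old_ms / new_ms


# 800 ms on the hosted EC2 Large setup vs. 282 ms on the
# Cluster Compute Eight Extra Large, as quoted above.
print(f"{speedup(800, 282):.2f}x")  # prints "2.84x"
```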


Step by Step Solr Sharding

The easiest way to shard your index is to create multiple “Solr cores” – a core just means a separate index in the same Solr server. To do this, simply copy the example/multicore/ configuration directory that's distributed along with Solr. You should see core0 and core1 as subfolders.
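Once the index is split across cores, one request can be fanned out to all of them with Solr's distributed-search `shards` parameter. The sketch below only builds such a URL; the host and port, the core names, the `distributed_query_url` helper and the `make:honda` query field are all illustrative assumptions, not taken from the post.

```python
from urllib.parse import urlencode


def distributed_query_url(host: str, cores: list[str], query: str, rows: int = 10) -> str:
    """Build a select URL on the first core that fans the query out to every core."""
    # Each `shards` entry is host:port/path-to-core, written without an http:// prefix.
    shard_list = ",".join(f"{host}/solr/{core}" for core in cores)
    params = {"q": query, "rows": rows, "shards": shard_list}
    # urlencode percent-escapes ':', '/' and ','; Solr decodes them on receipt.
    return f"http://{host}/solr/{cores[0]}/select?{urlencode(params)}"


url = distributed_query_url("localhost:8983", ["core0", "core1"], "make:honda")
print(url)
```

Any core can receive the request; Solr queries each listed shard and merges the results before responding.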

The configuration for each core is stored in multicore/core<N>/. Copy your existing config (or the defaults in example/solr/conf/ if you're just getting started) into the core<N>/conf directory. Once you've got one core set up the way you want it, copy that directory until you ...
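The copying step can be scripted. What follows is a sketch: the Solr home path, the `core0` template name, the core count and the `replicate_conf` helper are our assumptions, not the post's. It replaces each target core's conf/ with a copy of the template's.

```python
import shutil
import tempfile
from pathlib import Path


def replicate_conf(solr_home: Path, template: str = "core0", count: int = 4) -> list[Path]:
    """Copy template/conf into core0..core{count-1}/conf, replacing any conf already there.

    Only conf/ is copied, so no index data from the template core is duplicated.
    """
    src_conf = solr_home / template / "conf"
    written = []
    for n in range(count):
        core_dir = solr_home / f"core{n}"
        if core_dir.name == template:
            continue  # the template already holds the config we want
        dest_conf = core_dir / "conf"
        if dest_conf.exists():
            shutil.rmtree(dest_conf)  # drop the stock/default config
        core_dir.mkdir(parents=True, exist_ok=True)
        shutil.copytree(src_conf, dest_conf)
        written.append(core_dir)
    return written


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for the multicore/ home.
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        (home / "core0" / "conf").mkdir(parents=True)
        (home / "core0" / "conf" / "schema.xml").write_text("<schema/>")
        for core in replicate_conf(home, count=3):
            print(core.name, sorted(p.name for p in (core / "conf").iterdir()))
```

Copying only conf/ keeps each new core's index empty. Depending on your Solr version, each new core may also need to be registered. In the legacy multicore layout, that means adding a `<core name="..." instanceDir="..." />` entry in solr.xml.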
