Updated: September 28th, 2009
In the past few weeks I've been implementing advanced search at Plaxo, working quite closely with the Solr enterprise search server. Today I came across a relatively detailed comparison between Solr and its main competitor, Sphinx (full credit goes to StackOverflow user mausch, who has been using Solr for the past 2 years). For those still confused, Solr and Sphinx are full-text search servers, similar in purpose to MySQL FULLTEXT search; or, for those even more confused, think Google (yeah, this is a bit of a stretch, I know).
Similarities
- Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
- Both have a long list of high-traffic sites using them (Solr, Sphinx)
- Both offer commercial support. (Solr, Sphinx)
- Both offer client API bindings for several platforms/languages (Sphinx, Solr)
- Both can be distributed to increase speed and capacity (Sphinx, Solr)
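To make the last point a bit more concrete on the Solr side: distributed search is requested through the `shards` query parameter, which lists the servers to fan the query out to. Below is a minimal sketch that only builds such a request URL and does not contact any server. The host names, ports, and the `title` field are assumptions made up for illustration, not anything from the comparison itself.

```python
from urllib.parse import urlencode


def build_sharded_query(base_url, query, shards, rows=10):
    """Build a Solr select URL that asks one node to query several shards.

    base_url and the shard addresses are hypothetical; adjust to your setup.
    """
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",  # ask for a JSON response
        # Solr expects a comma-separated list of host:port/path entries
        # (no "http://" prefix) in the shards parameter.
        "shards": ",".join(shards),
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical two-node setup running on one machine.
url = build_sharded_query(
    "http://localhost:8983/solr",
    "title:sphinx",
    ["localhost:8983/solr", "localhost:7574/solr"],
)
print(url)
```

The node that receives the request merges the per-shard results, so the client code stays the same as for a single server apart from the extra parameter.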
Differences
- Solr, being an Apache project, is Apache2-licensed. Sphinx is GPLv2. This means that if you ever need to embed or extend (not just "use") Sphinx in a commercial application, you'll have to buy a commercial license.
- Solr is easily embeddable in Java applications.
- Solr is built on top of Lucene, which is a proven technology over 7 years old with a huge user base (this is only a small part). Whenever Lucene gets a new feature or speedup, Solr gets it too. Many of the devs committing to Solr are also Lucene committers.
- Sphinx integrates more tightly with RDBMSs, especially MySQL.
- Solr can be integrated with Hadoop to build distributed applications
- Solr can be integrated with Nutch to quickly build a fully-fledged web search engine with crawler.
- Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
- Solr comes with a spell-checker out of the box.
- Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
- Sphinx doesn't allow partial index updates for field data.
- In Sphinx, all document ids must be unique unsigned non-zero integer numbers. Solr doesn't even require a unique key for many operations, and unique keys can be either integers or strings.
- Solr supports field collapsing to avoid duplicating similar results. Sphinx doesn't seem to provide any feature like this.
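As an illustration of what "facet support out of the box" means in practice, faceting in Solr is switched on with a few request parameters (`facet`, `facet.field`, `facet.mincount`) rather than extra code. The sketch below only assembles the request URL and makes no network call. The base URL and the `category` and `author` field names are hypothetical and would have to exist in your own schema.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields, rows=10):
    """Build a Solr select URL that returns facet counts alongside results.

    base_url and the facet field names are hypothetical examples.
    """
    # A list of pairs is used instead of a dict because facet.field
    # is repeated once per field.
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "json"),           # ask for a JSON response
        ("facet", "true"),        # enable faceting for this request
        ("facet.mincount", 1),    # skip facet values with zero matches
    ]
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


facet_url = build_facet_query(
    "http://localhost:8983/solr",
    "search server",
    ["category", "author"],
)
print(facet_url)
```

The response would then carry a count per distinct value of each listed field next to the normal result list, which is the part that takes more manual work in Sphinx according to the comparison above.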
Related questions
- http://stackoverflow.com/questions/1284083/choosing-a-stand-alone-full-text-search-server-sphinx-or-solr
- http://stackoverflow.com/questions/1132284/full-text-searching-with-rails
- http://stackoverflow.com/questions/737275/pros-cons-of-full-text-search-engine-lucene-sphinx-postgresql-full-text-searc
Conclusion
In my experience, Solr is very fast on the query side. It is also very powerful. Indexing, however, is very CPU- and memory-intensive, which is an unfortunate side effect of having such a feature-rich, fast application. Nevertheless, I highly recommend Solr.
As a disclaimer, I have not had much experience with Sphinx and, again, all credit for this comparison goes to mausch.
By the way, here's a really good resource on Solr 1.4 that just came out: Solr 1.4 Enterprise Search Server. I have this book, and it's quite helpful in explaining topics such as multicore setup, search methods, replication, etc.
Artem Russakovskii is a San Francisco programmer, blogger, and future millionaire (that last part is in the works). Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.
Artem,
"if you ever need to embed or extend (not just "use") Sphinx in a commercial application, you'll have to buy a commercial license."
I think you need to be more careful about the wording here. In fact, I think it is best to stop at "it is GPLv2," because it is very difficult to say anything accurate about the GPL's restrictions in a few words like that. You need to define a commercial application, for example, and you need to define embed, extend, and use.
Baron, Sphinx's license is *very* clear about its GPL situation and the meaning of "embed", "extend" and "use". See http://www.sphinxsearch.com/licensing.html
Field collapsing is not complete in Solr (click on the link you linked). That's why I'm switching to Xapian.
Thanks for the article! Cheers
Unfortunately, Solr does not have a background indexer like Sphinx does, which can automatically index a MySQL database in the background, so you must write more code to do so.
I will correct myself: I just noticed the DataImportHandler (http://wiki.apache.org/solr/DataImportHandler).