When to use Apache Solr with Drupal
Drupal’s core search technology is a good fit for small to medium sites, or where the search requirements aren’t particularly sophisticated. The benefits of core search are zero setup and no additional server requirements; the node content is indexed in the database.
However, for busy sites, sites with a lot of content, or sites that need features such as faceting, Drupal can be combined with Apache Solr, a specialised search platform.
Apache Solr provides scalability and performance benefits over core database search, as well as providing some features that are difficult to deliver (or difficult to deliver with acceptable performance) using core search.
What is Apache Solr?
Apache Solr is a search platform focused on delivering enterprise-class, high-performance search functionality. The software was originally created as an internal CNET project and then donated to the Apache Software Foundation in 2006. The Apache Solr Drupal integration module makes it relatively easy to replace Drupal core search with this external search platform.
Apache Solr runs as a separate service from the web server and the database, so it requires some extra resources. Normally this means a dedicated server rather than cheaper shared hosting. Because it is separate, it can scale independently of the other two services, from running on its own dedicated server through to running on its own cluster.
Use Apache Solr to help sell content with Drupal and UberCart
Drupal’s e-commerce module, UberCart, has two great features that make selling content online easy. Firstly, you can sell file downloads; for example, you can sell PDFs with training content. We recently launched Soccer Coaching Club for Green Star Media, providing users with the ability to find and purchase the right piece of content from thousands of potential choices.
UberCart also allows you to change a user’s role for a certain period. For example, users can buy a subscription to allow access to premium videos or articles for a month via UberCart and be automatically reverted to a ‘free’ role after this period.
UberCart’s content purchasing functionality, combined with Apache Solr’s powerful content filtering, represents an exciting and entirely open source solution.
Search facets with Apache Solr and Drupal
Facets are attributes of content that allow filtering alongside the user’s search query. A well known example is Amazon: a search for ‘John Grisham’ brings up results from the Book, Film & TV and MP3 Downloads categories. By selecting ‘Book’, you eliminate the results related to DVDs and audio books that are in the other categories. You can also filter for certain delivery options or customer ratings.
The Apache Solr module provides Drupal facet data to Apache Solr: for example, the content type (eg News, Blog post or Research paper), the publication date and, most usefully, all of the taxonomy terms (tags) associated with that piece of content. This means users can combine a keyword search with the site’s taxonomy to further narrow their results, providing a very usable and powerful tool.
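As a rough sketch of what a faceted query looks like on the wire, the snippet below builds a Solr select URL that combines a keyword search with a facet filter. The parameters q, fq, wt, facet, facet.field and facet.mincount are standard Solr query parameters; the endpoint URL, the helper name and the field names (type, tid) are assumptions for illustration only, and the real field names depend on the schema your Drupal integration produces.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(keywords, facet_fields, filters=None):
    """Return a Solr select URL for a keyword search with facets.

    facet_fields: fields to return facet counts for.
    filters: mapping of field -> value the user has already selected.
    """
    params = [
        ("q", keywords),            # the user's keyword search
        ("wt", "json"),             # ask for a JSON response
        ("facet", "true"),          # switch faceting on
        ("facet.mincount", "1"),    # hide facet values with no matches
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet becomes a filter query that narrows the results.
    # Values are wrapped in quotes; values containing quotes would need escaping.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Field names here are assumed, not taken from a real schema.
url = build_facet_query("offside drills", ["type", "tid"], {"type": "news"})
print(url)
```

The keyword search stays in q while each selected facet goes into its own fq parameter, which is how the user’s query and the facet filters remain independent of each other.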
Performance improvements with Apache Solr
High traffic sites running search queries against the database can start to degrade the site’s overall performance if the database becomes the bottleneck. This is also true for lower traffic sites with lots of content. Complex search queries can be slow to run.
Requirements for features such as faceted search (see above) are becoming increasingly common. Faceting can be delivered in conjunction with Drupal core search using the Faceted Search module, but the inherent scalability and performance implications are well documented by the module’s maintainers.
Other useful Apache Solr features
Apache Solr also supports indexing and searching multiple sites (imagine an internal intranet site and an external corporate site), indexing attachments (eg PDFs, Excel documents) and recommended content blocks driven by a node’s taxonomy. The module page and the Acquia Search overview both give a good summary of the Apache Solr features that Drupal supports.
How does Apache Solr fit with Acquia Search?
Acquia Search is a cloud-based ‘Platform as a Service’ (PaaS) delivery of Apache Solr. The difference is in where Apache Solr is hosted; it’s essentially the same software, backed by an SLA. The main benefits are ease of setup and scalability. No local installation or management of Apache Solr is required; you just enter a license key into your Drupal site. Also, because it is hosted by Acquia (on their Amazon EC2 infrastructure), you don’t have to worry about scaling or managing the load of your Apache Solr usage.
In many cases, the fact that Acquia Search simplifies the hosting stack, potentially reduces hosting costs and Just Works™ means that it’s the default choice over a local Apache Solr install. Projects that require bespoke Solr configuration, or teams unwilling to rely on a 3rd party solution, should consider their own local Apache Solr install. For example, a project we worked on recently required a custom Solr synonyms configuration file to ‘educate’ the search on a niche subject’s search terms. This isn’t possible with Acquia Search currently.
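To give a flavour of what a synonyms file contains, the sketch below writes out rules in the plain-text format that Solr’s synonym filter reads: comma-separated terms are treated as equivalent, a => b maps one term onto another, and lines starting with # are comments. The terms and the helper name are made up for illustration; they are not the configuration from the project mentioned above.

```python
def render_synonyms(equivalents, mappings):
    """Render synonym rules in Solr's synonyms.txt text format.

    equivalents: list of term groups treated as interchangeable.
    mappings: dict of source term -> list of replacement terms.
    """
    lines = ["# Hypothetical niche-subject synonyms"]
    # Equivalent terms: one comma-separated group per line.
    for group in equivalents:
        lines.append(", ".join(group))
    # Explicit mappings: terms on the left are replaced by those on the right.
    for source, targets in mappings.items():
        lines.append(f"{source} => {', '.join(targets)}")
    return "\n".join(lines) + "\n"


# Example terms are invented for this sketch.
text = render_synonyms(
    equivalents=[["goalkeeper", "goalie", "keeper"]],
    mappings={"gk": ["goalkeeper"]},
)
print(text)
```

With rules like these in place, a search for any one of the equivalent terms can match content that uses the others, which is what lets the search cope with a niche subject’s vocabulary.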
The difference between Apache Solr and Apache Lucene
For most people, there isn’t one. Lucene is the internal indexing and search library that Apache Solr uses to deliver its search functionality; Solr can be considered the ‘service’ wrapper around the Lucene engine. They were originally separate projects but have merged. Outside of the technical community, people generally use the terms Solr and Lucene interchangeably. If you are using Solr, you are implicitly using Lucene, although Lucene can also be used on its own as a library.