4th December 2013

Coder Lounge - November 2013

John Ennew
Technical Director

Apache Solr 4

This month the devs at Deeson mostly worked on Apache Solr. In an attempt to uncover their inner secrets we dissected the apachesolr, apachesolr_views, search_api and sarnia modules. This involved installing Apache Solr 4 on a local machine and getting it up and running. For our initial hacking exploits we decided to try to insert data into Solr manually and search on it programmatically using drush. This code needs the apachesolr module enabled, as it makes use of its API functions.

  /**
   * Implements hook_drush_command().
   */
  function mymodule_drush_command() {
    $items = array();

    $items['solr-insert'] = array(
      'description' => 'Insert data into Solr programmatically',
      'aliases' => array('solri'),
    );

    $items['solr-search'] = array(
      'description' => 'Search Solr programmatically',
      'aliases' => array('solrs'),
    );

    return $items;
  }

  /**
   * Insert data into Solr programmatically.
   */
  function drush_mymodule_solr_insert() {
    $env = apachesolr_get_solr('solr');

    // Empty the Solr index to start with.
    $env->deleteByQuery('*:*');
    $env->commit();
    $env->optimize();

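    // Make the first document. The ts_ and sm_ prefixes map onto dynamic
    // fields in the apachesolr schema: ts_* is single-valued text and sm_*
    // is a multi-valued string.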
    $documents = array();

    $document = new ApacheSolrDocument();
    $document->id = 'first_record';
    $document->label = "First record content label";
    $document->entity_type = 'solr';
    $document->bundle = 'not_a_real_bundle';
    $document->ts_a_single_text_field = 'some text';
    $document->sm_other_record_ids = array('second_record');

    $documents[] = $document;

    // Make the second document, which points back at the first.
    $document = new ApacheSolrDocument();
    $document->id = 'second_record';
    $document->label = "Second record content label";
    $document->entity_type = 'solr';
    $document->bundle = 'not_a_real_bundle';
    $document->ts_a_single_text_field = 'some text';
    $document->sm_other_record_ids = array('first_record');

    $documents[] = $document;

    $env->addDocuments($documents);
    $env->commit();
  }

  /**
   * Search Solr programmatically.
   */
  function drush_mymodule_solr_search() {
    $env = apachesolr_get_solr('solr');
    // No main query; narrow the results with two filter queries.
    $q = $env->search(NULL, array('fq' => array('label:"First"', 'entity_type:solr')));
    drush_print_r($q);
  }
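Under the hood the apachesolr module talks to Solr over HTTP. As a rough illustration of what a filtered search looks like on the wire, here is a minimal sketch that builds a raw Solr /select URL for the same two filters. The helper name build_solr_select_url(), the base URL and the substitution of *:* for an empty query are all assumptions for illustration, not the module's own code. The point it shows is that Solr expects each filter as a repeated fq parameter, whereas PHP's http_build_query() would emit fq[0]=...&fq[1]=..., which Solr does not treat as fq.

```php
<?php

/**
 * Hypothetical helper: build a Solr /select URL in which every filter
 * query is sent as its own repeated "fq" parameter.
 */
function build_solr_select_url($base_url, $q, array $filters) {
  $pairs = array();
  // This sketch substitutes "*:*" (match all documents) for an empty query.
  $pairs[] = 'q=' . rawurlencode($q === NULL ? '*:*' : $q);
  foreach ($filters as $fq) {
    // Repeat "fq" for each filter rather than using fq[0]=..., fq[1]=...
    $pairs[] = 'fq=' . rawurlencode($fq);
  }
  // Ask for a JSON response.
  $pairs[] = 'wt=json';
  return rtrim($base_url, '/') . '/select?' . implode('&', $pairs);
}

// Hypothetical local Solr 4 core URL.
$url = build_solr_select_url(
  'http://localhost:8983/solr/collection1',
  NULL,
  array('label:"First"', 'entity_type:solr')
);
print $url . "\n";
```

The same helper could carry a join query in the q parameter, for example build_solr_select_url($base, '{!join from=sm_other_record_ids to=id}label:First', array()).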

Solr 4 also supports joins, which let you run a second search over the documents related to the results of a first search. This is a little like joining a table to itself in SQL, but the results are not quite what a seasoned SQL expert would expect. In the data we inserted above, the records reference each other via the Solr field sm_other_record_ids. We can write an initial query, for example one that matches records against their label field. That result set can then be joined back to the Solr index to retrieve the documents that the first results point to through a field. Unlike in SQL, only the 'right' side of the join is returned; the 'left' side (the documents matched by the first query) is not. An example query on the above data looks like this:

q={!join from=sm_other_record_ids to=id}label:First

Here, the inner query matches all records with 'First' in their label field. The join then collects the values of the sm_other_record_ids field from those results and returns the records whose id field matches one of those values. The only result is therefore the second record. To make the query programmatically with drush, we modify our search function:

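  /**
   * Search Solr programmatically.
   */
  function drush_mymodule_solr_search() {
    $env = apachesolr_get_solr('solr');
    // Find records matching label:First, then return the records whose id
    // appears in the sm_other_record_ids field of those matches.
    $q = $env->search('{!join from=sm_other_record_ids to=id}label:First');
    drush_print_r($q);
  }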
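To make the 'right side only' behaviour concrete, here is a minimal in-memory sketch of what the join query does with the two documents inserted earlier. It is not how Solr implements joins internally: the function name simulate_solr_join() is hypothetical, and a simple lowercase word match stands in for Solr's analysed text query label:First.

```php
<?php

// The two documents inserted by drush_mymodule_solr_insert(), as plain arrays.
$documents = array(
  array(
    'id' => 'first_record',
    'label' => 'First record content label',
    'sm_other_record_ids' => array('second_record'),
  ),
  array(
    'id' => 'second_record',
    'label' => 'Second record content label',
    'sm_other_record_ids' => array('first_record'),
  ),
);

/**
 * Hypothetical helper mimicking {!join from=$from to=$to}<inner query>.
 *
 * $inner_query is a callable standing in for the inner Solr query.
 */
function simulate_solr_join(array $documents, $from, $to, callable $inner_query) {
  // 1. Run the inner query: these are the 'left' side documents.
  $left = array_filter($documents, $inner_query);

  // 2. Collect every value of the "from" field across those matches.
  $keys = array();
  foreach ($left as $doc) {
    foreach ((array) $doc[$from] as $value) {
      $keys[$value] = TRUE;
    }
  }

  // 3. Return only documents whose "to" field holds one of the collected
  //    values. The left-side documents are not returned unless they
  //    happen to match as well.
  $right = array();
  foreach ($documents as $doc) {
    foreach ((array) $doc[$to] as $value) {
      if (isset($keys[$value])) {
        $right[] = $doc;
        break;
      }
    }
  }
  return $right;
}

// Stand-in for label:First: does the label contain the word "first"?
$label_matches_first = function (array $doc) {
  return in_array('first', preg_split('/\s+/', strtolower($doc['label'])), TRUE);
};

$results = simulate_solr_join($documents, 'sm_other_record_ids', 'id', $label_matches_first);
foreach ($results as $doc) {
  print $doc['id'] . "\n";
}
```

Run as-is, this prints second_record only: first_record satisfies the inner query but, just as with the Solr join, it is not part of the result.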

For a fuller explanation, read the Solr wiki page on joins: http://wiki.apache.org/solr/Join