out of time (mat brown on programming)

May 26 2009

Sunspot 0.8 is out

On Friday, I released the next milestone in Sunspot, version 0.8. This version doesn’t add to or change any of the basic functionality, but it does add some advanced features that the app I work on at my day job happens to need. Here’s a rundown:

Direct access to the Query API

Users of Sunspot will doubtless be familiar with Sunspot’s search DSL, which gives an English-like interface for constructing search parameters. In some cases, however, such a DSL is actually counterproductive, particularly when a search is built up by an intermediate object and thus not necessarily all in one place. So the new methods Sunspot.new_search() and Search#query() are now exposed, and the Sunspot::Query class itself is part of the public API. What I have in mind in particular is combining the Gang of Four Builder pattern with ActiveRecord’s hash-initializer pattern to elegantly translate web query parameters into a Sunspot search. Here’s a stripped-down example of what I think the code for that will look like:

class EventSearchBuilder
  attr_reader :search

  # Hash initializer: each option whose name matches a public setter
  # (e.g. 'when' => when=) is applied; any other key is ignored.
  def initialize(options = {})
    @search = Sunspot.new_search(Event)
    options.each_pair do |attr, value|
      if respond_to?("#{attr}=")
        send("#{attr}=", value)
      end
    end
  end

  # Restrict by start time: 'future', 'past', or a specific day.
  def when=(day_string)
    case day_string
    when 'future'
      @search.query.add_restriction(:start_time, :greater_than, Time.now)
    when 'past'
      @search.query.add_restriction(:start_time, :less_than, Time.now)
    else
      date_time = Date.parse(day_string).to_time
      @search.query.add_restriction(:start_time, :between, date_time..(date_time + 1.day))
    end
  end

  def page=(page)
    @search.query.paginate(page)
  end

  def sort=(field)
    @search.query.order_by(field)
  end
end

Then in controller code, it’s as simple as:

def search
  @search = EventSearchBuilder.new(params).search
  @search.execute!
end
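To see the hash-initializer half of the pattern in isolation, here is a minimal pure-Ruby sketch that runs without Solr. RecordingQuery and DemoSearchBuilder are hypothetical stand-ins, not Sunspot classes: the query object just records the calls it receives, so you can see which parameters become which query operations, and that unrecognized keys (such as Rails’s controller and action) are silently skipped. To avoid an ActiveSupport dependency, the sketch uses a Date range instead of Time plus 1.day.

```ruby
require 'date'

# Hypothetical stand-in for a query object: it only records calls.
class RecordingQuery
  attr_reader :calls

  def initialize
    @calls = []
  end

  def add_restriction(field, type, value)
    @calls << [:add_restriction, field, type, value]
  end

  def paginate(page)
    @calls << [:paginate, page]
  end

  def order_by(field)
    @calls << [:order_by, field]
  end
end

# Hypothetical builder using the same hash-initializer dispatch as
# EventSearchBuilder, but writing to a RecordingQuery.
class DemoSearchBuilder
  attr_reader :query

  def initialize(options = {})
    @query = RecordingQuery.new
    options.each_pair do |attr, value|
      # Only keys with a matching public setter are applied.
      send("#{attr}=", value) if respond_to?("#{attr}=")
    end
  end

  # Restrict to a single day (Date + 1 is the following day).
  def when=(day_string)
    day = Date.parse(day_string)
    @query.add_restriction(:start_time, :between, day..(day + 1))
  end

  def page=(page)
    @query.paginate(page.to_i) # web params arrive as strings
  end

  def sort=(field)
    @query.order_by(field.to_sym)
  end
end

params = { 'when' => '2009-05-22', 'page' => '2', 'sort' => 'title',
           'controller' => 'events', 'action' => 'search' }
p DemoSearchBuilder.new(params).query.calls
```

Because dispatch goes through respond_to?, adding support for a new parameter is just a matter of defining another setter on the builder.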

Dynamic Fields

I wouldn’t be surprised if I’m the only person who ever uses this feature of Sunspot, but just in case, let’s look at a real-world example. Say part of my data model consists of key-value pairs with a constrained (but user-definable) set of keys and free-form values. I’ll call the model KeyValuePair.

The trick I want to pull here is to treat each key as a separate field in search, so that I can restrict, order, facet, etc. on the values for one key without them being affected by other keys. Since the keys are user-defined, I can’t just set up normal fields at build time; they need to be defined at index time. Enter Sunspot’s dynamic fields (we’ll use Sunspot::Rails’s wrapper API here):

class Business < ActiveRecord::Base
  has_many :key_value_pairs

  searchable do
    dynamic_string :key_value_pairs do
      key_value_pairs.inject({}) do |hash, pair|
        hash.merge(pair.key.to_sym => pair.value)
      end
    end
  end
end

This sets up a dynamic field that is populated by the given block. What’s important here is that the block returns a hash - the keys of the hash become individual dynamic fields, and the values populate those fields in the index. The “base name” of the field, key_value_pairs, is used to namespace the dynamic names that come out of the hash.
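As a rough illustration of that namespacing, the sketch below builds the hash the same way the searchable block does and then expands it into one field per key. The KeyValuePair struct, the dynamic_field_names helper, and the base:key naming scheme are all hypothetical; the field names Sunspot actually sends to Solr are an internal detail and will differ.

```ruby
# Hypothetical stand-in for the KeyValuePair model.
KeyValuePair = Struct.new(:key, :value)

pairs = [
  KeyValuePair.new('cuisine', 'Sushi'),
  KeyValuePair.new('atmosphere', 'Casual')
]

# Same inject/merge idiom as the searchable block.
dynamic_values = pairs.inject({}) do |hash, pair|
  hash.merge(pair.key.to_sym => pair.value)
end

# Hypothetical helper: expands the hash into one namespaced field per key,
# so values for different keys never share a field.
def dynamic_field_names(base_name, values)
  values.each_with_object({}) do |(key, value), fields|
    fields["#{base_name}:#{key}"] = value
  end
end

p dynamic_field_names(:key_value_pairs, dynamic_values)
```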

Working with dynamic fields is a lot like working with regular ones, except that in the query, calls are wrapped in a dynamic block:

Business.search do
  dynamic :key_value_pairs do
    with(:cuisine, 'Sushi')
    facet(:atmosphere)
  end
end

Naturally, those field names (:cuisine, :atmosphere) wouldn’t be hard-coded in a real application, since they would not be known at build time.

Dirty Sessions

Sessions now track whether any operations have been performed since the last commit was issued. The Session#dirty? method answers that question, and the Session#commit_if_dirty method does exactly what it sounds like. These methods are useful if you want to keep your commits to a minimum (you do) but have various parts of your code issuing Sunspot operations without any central coordination in your application.
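The bookkeeping is simple enough to sketch in a few lines. TrackingSession below is a hypothetical, Solr-free stand-in, not Sunspot’s Session; it only shows the idea: every write marks the session dirty, a commit clears the flag, and commit_if_dirty skips the expensive commit when nothing has changed.

```ruby
# Hypothetical stand-in for a session that tracks uncommitted writes.
class TrackingSession
  attr_reader :commit_count

  def initialize
    @dirty = false
    @commit_count = 0
  end

  # Any write operation marks the session dirty.
  def index(*objects)
    @dirty = true
  end

  def remove(*objects)
    @dirty = true
  end

  def dirty?
    @dirty
  end

  def commit
    @commit_count += 1 # stands in for the real (expensive) Solr commit
    @dirty = false
  end

  # Commits only when something has changed since the last commit.
  def commit_if_dirty
    commit if dirty?
  end
end

session = TrackingSession.new
session.index(:post_1)
session.index(:post_2)
session.commit_if_dirty # commits once
session.commit_if_dirty # no-op: nothing changed
puts session.commit_count # prints 1
```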

That’s all for now

Sunspot 0.9 is up next. The main goal for that version is to replace solr-ruby with RSolr as the low-level Solr interface. That will open the door to more features in future versions (query-based faceting, LocalSolr support, etc.), but probably won’t have much effect on the API in 0.9 itself, other than supporting the faster Curb library for HTTP communication with Solr.

Apr 29 2009

Sunspot 0.7 is out

Momentous news: yesterday I released the 0.7 version of Sunspot, my library for awesome Solr interaction in pure Ruby. This is the first release that I consider basically feature-complete, meaning there aren’t any gaping holes in the feature set - not that there aren’t more features in the pipeline! Read on for all the new goodies.

Documentation!

Sunspot is now fully documented. In order to distinguish the public and private APIs, I made liberal use of :nodoc: on classes and methods that are not part of the public API. So, everything in the RDoc is fair game; if you find yourself needing to call methods that aren’t in the RDoc, let me know so I can expose what you need in the public API.

API-private methods are still documented in the code.

Less magic in the search DSL

In earlier versions, the search DSL looked like this:

Sunspot.search(Post) do
  with.blog_id 1
  with.average_rating.less_than 4
end

Now it looks like this:

Sunspot.search(Post) do
  with :blog_id, 1
  with(:average_rating).less_than 4
end

I find the new syntax to be more intuitive, and my colleagues whom I polled unanimously agreed. Sometimes Ruby developers get perhaps a bit too excited about how cleverly one can construct English-like DSLs in the language, and I would plead guilty to that charge where the earlier version is concerned.

Negative scoping

The search DSL now provides a without method, the counterpart to with, which negates the restriction. So, you can do the following:

Sunspot.search(Post) do
  without :blog_id, 2
end

That would exclude all posts whose blog_id is 2.

Exclusion by identity

A special use of the without method is excluding specific objects from the search results: just pass the objects you want to exclude to without. For example, if you have an instance current_post that you don’t want in the search results:

Sunspot.search(Post) do
  without current_post
end

Restrict by empty values

You can now pass nil to an equality restriction, which will restrict the results to documents for which the given field has no value. For example:

Sunspot.search(Post) do
  with :category_id, nil
end

The above would return only documents that do not have a category_id. without works as expected, returning only documents that do have a value for the field in question. Passing nil to other restriction types is not allowed.
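For readers curious about what these restrictions amount to on the Solr side, here is a rough sketch that turns with/without calls into Solr filter-query strings using standard Solr query syntax: a leading - negates a clause, and [* TO *] matches any document that has some value for the field. The filter_query helper is hypothetical; it uses bare field names and does no escaping, whereas the queries Sunspot really generates use its own field naming.

```ruby
# Hypothetical helper: translates one equality restriction into a
# Solr filter query (fq) string. `negated` corresponds to `without`.
def filter_query(field, value, negated = false)
  clause =
    if value.nil?
      # "No value" means "not (has any value)".
      "-#{field}:[* TO *]"
    else
      "#{field}:#{value}"
    end
  return clause unless negated
  # Negating a negative clause yields the positive one, and vice versa.
  clause.start_with?('-') ? clause[1..-1] : "-#{clause}"
end

puts filter_query(:blog_id, 1)             # with :blog_id, 1
puts filter_query(:blog_id, 2, true)       # without :blog_id, 2
puts filter_query(:category_id, nil)       # with :category_id, nil
puts filter_query(:category_id, nil, true) # without :category_id, nil
```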

Faceting

One of Solr’s most powerful features is faceting, which returns all of the values stored for a given field, and the number of documents that have each value. It’s perfect for building drill-down search interfaces. Sunspot now supports it:

search = Sunspot.search(Post) do
  with :blog_id, 2
  facet :category_ids
end

category_ids_facet = search.facet(:category_ids)
category_ids_facet.rows.map { |row| [row.value, row.count] }
  #=> [[25, 3], [13, 1]]

The above results indicate that there are 3 documents with blog_id 2 and category_id 25, and 1 document with blog_id 2 and category_id 13. Note that facet results are for the total number of documents that match the search conditions, not just the current result set (which is paginated). Note also that Sunspot casts facets into the appropriate Ruby object for their type - thus a time field’s facet values will be Time objects, etc.
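The point that facet counts cover every matching document, not just the current page, is easy to see in a small in-memory model. The sketch below is purely illustrative: plain hashes with made-up data stand in for indexed documents, and the counting happens in Ruby rather than in Solr. Note that a multi-valued field such as category_ids contributes one count per value.

```ruby
# Plain hashes standing in for indexed Post documents (made-up data).
posts = [
  { id: 1, blog_id: 2, category_ids: [25] },
  { id: 2, blog_id: 2, category_ids: [25, 13] },
  { id: 3, blog_id: 2, category_ids: [25] },
  { id: 4, blog_id: 1, category_ids: [13] }
]

# Apply the search conditions (with :blog_id, 2).
matches = posts.select { |post| post[:blog_id] == 2 }

# A single result page of size 2; the facet counts ignore pagination.
first_page = matches.first(2)

# Count how many matching documents carry each category id, most common first.
counts = Hash.new(0)
matches.each do |post|
  post[:category_ids].each { |id| counts[id] += 1 }
end
facet_rows = counts.sort_by { |_value, count| -count }

p first_page.map { |post| post[:id] } # => [1, 2]
p facet_rows                          # => [[25, 3], [13, 1]]
```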

Solr also provides even more powerful query-based faceting (to facet by ranges of values, for instance), which I plan to tackle in a future version of Sunspot.

Explicit commits

Changing data in Solr is a two-step process: first, data is added, updated, or removed; then the changes are committed. When a commit is issued, all pending changes are written to disk, and Solr instantiates a new searcher with the updated index. Changes must be committed before they appear in search results, but a commit is a fairly expensive operation; so if you are making multiple updates as part of one operation, it’s highly advisable to commit once, after making all of the changes. Earlier versions of Sunspot automatically committed after each change; now a commit method is exposed, along with bang versions of the update methods (such as index!), which commit immediately. So:

Sunspot.index(my_document)
Sunspot.commit
# does the same thing as
Sunspot.index!(my_document)
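To make the add-then-commit model concrete, here is a toy in-memory index. ToyIndex is hypothetical and does no real I/O; it only mirrors the visibility rule described above (pending changes are invisible to searches until a commit) and shows that a bang method is simply the plain operation followed by a commit.

```ruby
# Hypothetical in-memory index mimicking Solr's add-then-commit visibility.
class ToyIndex
  def initialize
    @committed = []
    @pending = []
  end

  # Queue a document; it is not searchable yet.
  def index(doc)
    @pending << doc
  end

  # Bang version: index, then commit immediately.
  def index!(doc)
    index(doc)
    commit
  end

  # Make all pending changes visible at once (expensive in real Solr).
  def commit
    @committed.concat(@pending)
    @pending.clear
  end

  def search
    @committed.dup
  end
end

idx = ToyIndex.new
idx.index('post 1')
idx.index('post 2')
p idx.search   # => []
idx.commit     # one commit covers both updates
p idx.search   # => ["post 1", "post 2"]
idx.index!('post 3')
p idx.search   # => ["post 1", "post 2", "post 3"]
```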

Boolean field type

boolean is now an available field type. It works pretty much as you’d expect. Note that in order for a false value to be indexed, it has to be explicitly false; nil will not be indexed at all. Anything else is indexed as true.
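That rule can be written as a small conversion function. boolean_index_value is a hypothetical helper, not Sunspot’s code; it returns nil to mean "leave the field out of the document" and otherwise the string value that would be stored.

```ruby
# Hypothetical helper mirroring the boolean indexing rule.
# Returns nil when the field should be omitted from the document entirely.
def boolean_index_value(value)
  case value
  when nil   then nil     # nil: not indexed at all
  when false then 'false' # only an explicit false is indexed as false
  else            'true'  # anything else (including 0 and "") is true
  end
end

[true, false, nil, 0, ''].each do |v|
  puts "#{v.inspect} -> #{boolean_index_value(v).inspect}"
end
```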

Attribute field flexibility

You can now tell an attribute field to pull data from an attribute other than the one named by the field. For example:

Sunspot.setup(Post) do
  float :average_rating, :using => :ratings_average
end

This would index a field called average_rating, pulling data from the ratings_average method.
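Conceptually, :using just decouples the field’s name from the method that supplies its value. Here is a hedged sketch of that lookup; the AttributeField class and the Post struct are hypothetical stand-ins for Sunspot’s field definition and a real model.

```ruby
# Hypothetical model with a differently named attribute.
Post = Struct.new(:ratings_average)

# Hypothetical field definition: the indexed name and the source method
# can differ; without :using, the field name doubles as the method name.
class AttributeField
  attr_reader :name

  def initialize(name, options = {})
    @name = name
    @source = options.fetch(:using, name)
  end

  def value_for(instance)
    instance.public_send(@source)
  end
end

field = AttributeField.new(:average_rating, :using => :ratings_average)
post = Post.new(3.5)
puts "#{field.name} = #{field.value_for(post)}" # prints "average_rating = 3.5"
```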

Virtual field evaluation flexibility

If the block specified for a virtual field takes an argument, the block will be passed the instance, rather than being evaluated in its context:

Sunspot.setup(Post) do
  string :sort_title do
    title.downcase
  end
  # is the same as
  string :sort_title do |post|
    post.title.downcase
  end
end

Use whichever one feels more natural for the given object - for instance, I would use the first form for a model I had written, but the second form for File objects.
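One plausible way to implement this either-or behavior is to inspect the block’s arity: a block that declares a parameter is called with the instance, and one that doesn’t is evaluated with instance_eval. The evaluate_virtual_field helper below is a hypothetical sketch of that idea, not necessarily how Sunspot does it.

```ruby
# Hypothetical helper: choose the calling convention by the block's arity.
def evaluate_virtual_field(instance, &block)
  if block.arity > 0
    block.call(instance)           # |post| form: instance passed explicitly
  else
    instance.instance_eval(&block) # bare form: self is the instance
  end
end

Post = Struct.new(:title)
post = Post.new('Hello World')

a = evaluate_virtual_field(post) { title.downcase }
b = evaluate_virtual_field(post) { |item| item.title.downcase }
puts a == b # prints true
```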

Order by multiple fields

You can now call order inside the search DSL more than once. Earlier calls get higher precedence.
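Precedence here works like a multi-key sort: the first order call is the primary key, and later calls only break ties. In Solr terms this corresponds to a comma-separated sort parameter such as "average_rating desc,title asc". The sketch below models that in plain Ruby with made-up documents and a hypothetical multi_sort helper.

```ruby
# Made-up documents.
docs = [
  { title: 'b', average_rating: 4 },
  { title: 'a', average_rating: 4 },
  { title: 'c', average_rating: 5 }
]

# Each entry is [field, direction], in the order the calls were made.
# Earlier entries take precedence; later ones only break ties.
def multi_sort(docs, orderings)
  docs.sort do |x, y|
    result = 0
    orderings.each do |field, direction|
      result = x[field] <=> y[field]
      result = -result if direction == :desc
      break unless result.zero?
    end
    result
  end
end

orderings = [[:average_rating, :desc], [:title, :asc]]
p multi_sort(docs, orderings).map { |d| d[:title] } # => ["c", "a", "b"]
puts orderings.map { |f, dir| "#{f} #{dir}" }.join(',') # Solr-style sort string
```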

New adapter API

Sunspot is intended to be flexible in what it can index and search; to that end, it provides a pluggable adapter architecture. Sunspot 0.7 makes several changes to the adapter API; this should be the final API that goes into the 1.0 release.

Instead of building a single adapter module that contains two classes with preset names, the new API allows (and indeed requires) the two classes to be registered separately. An adapter should consist of two classes: a subclass of Sunspot::Adapters::InstanceAdapter and a subclass of Sunspot::Adapters::DataAccessor. Check out the RDoc for information on which methods each should and can implement. Here’s an example of how one might build an adapter for File objects:

class FileInstanceAdapter < Sunspot::Adapters::InstanceAdapter
  def id
    File.expand_path(@instance.path)
  end
end

class FileDataAccessor < Sunspot::Adapters::DataAccessor
  def load(id)
    @clazz.open(id)
  end
end

Sunspot::Adapters::InstanceAdapter.register(FileInstanceAdapter, File)
Sunspot::Adapters::DataAccessor.register(FileDataAccessor, File)
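Registering the two adapter classes separately implies some kind of registry keyed by the indexed class. Here is a hedged, stand-alone sketch of such a registry; AdapterRegistry and LogFile are hypothetical, and this is not Sunspot’s internal implementation. Walking the class’s ancestors, so that an adapter registered for File also serves a File subclass, is an assumption of this sketch.

```ruby
# Hypothetical registry mapping indexed classes to adapter classes.
class AdapterRegistry
  def initialize
    @adapters = {}
  end

  def register(adapter_class, *indexed_classes)
    indexed_classes.each { |klass| @adapters[klass] = adapter_class }
  end

  # Walks the ancestor chain, so an adapter registered for a superclass
  # also applies to its subclasses.
  def adapter_for(klass)
    klass.ancestors.each do |ancestor|
      return @adapters[ancestor] if @adapters.key?(ancestor)
    end
    raise ArgumentError, "No adapter registered for #{klass}"
  end
end

# Hypothetical adapter and subclass, for illustration only.
class FileInstanceAdapter; end
class LogFile < File; end

instance_adapters = AdapterRegistry.new
instance_adapters.register(FileInstanceAdapter, File)
p instance_adapters.adapter_for(LogFile) # => FileInstanceAdapter
```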

Goodbye Builder API

The last release of Sunspot attempted to provide a framework for using the Builder pattern to convert external parameters into searches. On further reflection, I found the API and implementation rather awkward, and it wasn’t really a core part of what Sunspot was trying to do. The search method can still accept a hash of parameters, but you really shouldn’t use that, because the DSL is way better.

This also means that, for the moment, Search objects don’t provide access to the parameters passed in. That’s probably not a good thing, so I’ll try to find a clean, intuitive way to provide that access in a future version.

Goodbye extlib

Extlib is cool, but Sunspot was only using three methods from it, and it was causing some sort of weird error when I tried to run the tests on the installed gem. So I implemented the three methods myself and got rid of the extlib dependency.

That’s all, folks

That should just about cover all of the external-facing changes in the new version. There’s also lots of internal refactoring and simplification that should make the code leaner, faster, and more maintainable. This is the first version that I would feel comfortable putting into a production environment - if you’re using it, I’d love to know.

Feb 14 2009

Sunspot 0.0.2 released

Finally, another release of Sunspot, my library for awesome interaction with Solr in Ruby. I haven’t had much time to work on it in the past couple of months, so version 0.0.2 isn’t exactly earth-shattering. The good news is that, starting in a couple of weeks, my employers have me slated to spend much of my workday developing Sunspot, with the goal of putting it into production early this spring. So expect more, and bigger, releases soon.

In the meantime, here’s what’s new:

The sunspot-solr executable

As promised in the README for version 0.0.1, I have made running a Solr instance for development less of a hassle. Now all you’ve got to do is:

sunspot-solr start

Hey, nice. The executable takes two options: -p (for port) and -d (for data directory). By default, Solr runs on port 8983 and saves the index in your /tmp directory, so if you’re doing anything other than playing around or running tests, definitely make sure to specify the -d option. To play nice with daemons, put a double dash before the Solr options:

sunspot-solr start -- -p 8982 -d data/solr/development

The executable takes all the usual daemon commands, such as sunspot-solr stop to stop it and sunspot-solr run to run it in the foreground (an especially good idea if Solr isn’t starting properly and you’d like to figure out why).

The executable is intended for use in development/testing scenarios; if you’re running Sunspot in production (which, frankly, I wouldn’t do yet), you should set up your own standalone Solr instance.

Build searches with Builder objects

Version 0.0.1 of Sunspot allowed you to put together a search like this:

search = Sunspot.search(Post) do
  keywords 'great pizza'
  with.blog_id 1
end

Or like this:

search = Sunspot.search(Post, :keywords => 'great pizza', :conditions => { :blog_id => 1 })

So does version 0.0.2; however, in 0.0.2, the second syntax delegates the interpretation of that args hash to a Builder object (in this case, a Sunspot::StandardBuilder). As well as translating the hash into an actual search (internally accessing the same API that the block DSL does), the builder also holds onto the hash itself, allowing access to the search parameters later on in the game:

search.builder.keywords #=> "great pizza"
search.builder.params[:keywords] #=> "great pizza"
search.builder.conditions.blog_id #=> 1
search.builder.params[:conditions][:blog_id] #=> 1

The idea here is that search parameters passed from other layers of the application - particularly from user input - can be dropped directly into the search builder, and then accessed where needed (in Rails, for instance, the object interface of the builder lends itself to FormBuilder integration; the hash interface makes generating URLs easy).
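The dual hash/object access described above is easy to approximate. ParamsView below is a hypothetical sketch, not the Sunspot::StandardBuilder from this release; it wraps a parameter hash, answers method calls for keys that are present, and wraps nested hashes so that chained calls like conditions.blog_id work.

```ruby
# Hypothetical read-only view over a parameter hash that offers both
# hash-style and method-style access.
class ParamsView
  attr_reader :params

  def initialize(params)
    @params = params
  end

  # Answers `view.some_key` for any key present in the hash; nested
  # hashes are wrapped so calls can be chained.
  def method_missing(name, *args)
    return super unless args.empty? && @params.key?(name)
    value = @params[name]
    value.is_a?(Hash) ? ParamsView.new(value) : value
  end

  def respond_to_missing?(name, include_private = false)
    @params.key?(name) || super
  end
end

builder = ParamsView.new(:keywords => 'great pizza', :conditions => { :blog_id => 1 })
p builder.keywords                      # => "great pizza"
p builder.params[:keywords]             # => "great pizza"
p builder.conditions.blog_id            # => 1
p builder.params[:conditions][:blog_id] # => 1
```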

One thing I’m looking forward to doing in a future release is opening up the builder API for developers to create and use their own builders, allowing easy and clean encapsulation of rules for translating user input into search parameters (we actually use this approach at Patch now with an older home-grown Solr API that doesn’t explicitly support it, and it works well). Anyway, expect to hear more about that.

For the next Sunspot release I’ll be tackling faceting, which is currently the most glaring omission in Sunspot’s support for Solr’s feature set. Further down the line will be ORM (etc.) adapters (expect ActiveRecord, DataMapper, and File objects), plug-ins for Merb and Rails, and more of the builder stuff. I’m also looking into possibly switching Sunspot to run on top of a newer, more actively maintained fork of solr-ruby.

Anyway, check it out if you want:

sudo gem install outoftime-sunspot --source http://gems.github.com