Sunspot 0.7 is out
Momentous news: Yesterday I released the 0.7 version of Sunspot, my library for awesome Solr interaction in pure Ruby. This is the first release that I consider basically feature complete, meaning there aren’t any gaping holes in the feature set - not that there aren’t more features in the pipeline! Read on for all the new goodies.
Documentation!
Sunspot is now fully documented. In order to distinguish the public and private APIs, I made liberal use of :nodoc: on classes and methods that are not part of the public API. So, everything in the RDoc is fair game; if you find yourself needing to call methods that aren’t in the RDoc, let me know so I can expose what you need in the public API.
API-private methods are still documented in the code.
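For anyone who hasn’t used it, the :nodoc: convention looks roughly like this. The class and method names below are hypothetical, not taken from Sunspot. The point is only that a trailing `# :nodoc:` comment hides a class or method from the generated RDoc, while the code stays documented in the source and remains callable.

```ruby
# Hypothetical example of RDoc's :nodoc: directive (not Sunspot code).

# Parses search parameters. This class appears in the generated RDoc.
class QueryParser
  # Returns the normalized field names mentioned in +params+. Public API.
  def fields(params)
    params.keys.map { |key| normalize(key) }
  end

  def normalize(key) # :nodoc:
    # Still commented here in the source, but omitted from the RDoc output.
    key.to_s.downcase.to_sym
  end
end

class InternalCache # :nodoc:
  # The whole class is hidden from RDoc, though it is still usable.
  def initialize
    @store = {}
  end

  # Returns the cached value for +key+, computing it with the block once.
  def fetch(key)
    @store[key] ||= yield
  end
end

QueryParser.new.fields("Blog_ID" => 1) # => [:blog_id]
```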
Less magic in the search DSL
In earlier versions, the search DSL looked like this:
Sunspot.search(Post) do
  with.blog_id 1
  with.average_rating.less_than 4
end
Now it looks like this:
Sunspot.search(Post) do
  with :blog_id, 1
  with(:average_rating).less_than 4
end
I find the new syntax to be more intuitive, and my colleagues whom I polled unanimously agreed. Sometimes Ruby developers get perhaps a bit too excited about how cleverly one can construct English-like DSLs in the language, and I would plead guilty to that charge where the earlier version is concerned.
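To give a feel for why the new syntax is easier to reason about, here is a minimal sketch of how a DSL of this shape could be built. The block is evaluated with instance_eval, and with records an equality restriction when it is given a value, or returns a small builder object when it isn’t. The SearchDSL and RestrictionBuilder names are made up for illustration; this is not Sunspot’s actual implementation.

```ruby
# Hypothetical, simplified search DSL (not Sunspot's implementation).
class SearchDSL
  attr_reader :restrictions

  # Builder returned by `with(field)` when no value is given.
  RestrictionBuilder = Struct.new(:dsl, :field) do
    def less_than(value)
      dsl.restrictions << [field, :less_than, value]
    end

    def greater_than(value)
      dsl.restrictions << [field, :greater_than, value]
    end
  end

  # Sentinel so that `with :field, nil` is distinguishable from `with(:field)`.
  NO_VALUE = Object.new.freeze

  def initialize
    @restrictions = []
  end

  # with :field, value  => records an equality restriction
  # with(:field)        => returns a builder for other restriction types
  def with(field, value = NO_VALUE)
    if value.equal?(NO_VALUE)
      RestrictionBuilder.new(self, field)
    else
      @restrictions << [field, :equal_to, value]
    end
  end

  # Evaluates the block in the context of a fresh DSL object.
  def self.build(&block)
    dsl = new
    dsl.instance_eval(&block)
    dsl.restrictions
  end
end

SearchDSL.build do
  with :blog_id, 1
  with(:average_rating).less_than 4
end
# => [[:blog_id, :equal_to, 1], [:average_rating, :less_than, 4]]
```

Because with is an ordinary method taking a field name, there is no method_missing trickery to trip over, which is a big part of why the new form feels less magical.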
Negative scoping
The search DSL now provides without, the counterpart to with, which negates the restriction. So, you can do the following:
Sunspot.search(Post) do
  without :blog_id, 2
end
That would exclude all posts whose blog_id is 2.
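Under the hood, a negated restriction maps naturally onto Solr’s query syntax, where a leading minus sign means "must not match". Here is a rough sketch, assuming a hypothetical equality_filter helper and plain field names (Sunspot’s real indexed field names may differ):

```ruby
# Hypothetical helper: renders an equality restriction as a Solr filter query.
# A leading "-" is Lucene/Solr syntax for "documents must NOT match this".
def equality_filter(field, value, negated: false)
  clause = "#{field}:#{value}"
  negated ? "-#{clause}" : clause
end

equality_filter(:blog_id, 2)                # => "blog_id:2"
equality_filter(:blog_id, 2, negated: true) # => "-blog_id:2"
```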
Exclusion by identity
A special use of the without method is excluding specific objects from the search results. This is done by just passing the objects you want to exclude to without. For example, if you have an instance current_post that you don’t want in the search results:
Sunspot.search(Post) do
  without current_post
end
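One plausible way to implement exclusion by identity is to turn each object into its unique document id and negate a filter on that id. The sketch below assumes the id is built from the class name and primary key and stored in a field called id; both details are hypothetical here.

```ruby
# Hypothetical model standing in for an indexed class.
Post = Struct.new(:id, :title)

# Assumption: the unique document id is "<ClassName> <primary key>".
def document_id(object)
  "#{object.class.name} #{object.id}"
end

# Excluding objects by identity becomes one negated filter per object
# on the (assumed) id field; the id is quoted because it contains a space.
def exclusion_filters(*objects)
  objects.map { |object| %(-id:"#{document_id(object)}") }
end

current_post = Post.new(42, "Sunspot 0.7 is out")
exclusion_filters(current_post) # => ["-id:\"Post 42\""]
```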
Restrict by empty values
You can now pass nil to an equality restriction, which will restrict the results to documents for which the given field has no value. For example:
Sunspot.search(Post) do
  with :category_id, nil
end
The above would return only documents that do not have a category_id. Passing nil to without does the opposite, returning only documents that do have a value for the field in question. Passing nil to other restriction types is not allowed.
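In Solr terms, "has a value" is usually expressed with an open-ended range query, field:[* TO *], and "has no value" is its negation. This sketch, using a hypothetical nil_filter helper and plain field names, shows the filters that with and without could translate to when given nil:

```ruby
# Hypothetical helper for nil restrictions, using standard Solr range syntax.
# field:[* TO *] matches every document that has some value for the field.
def nil_filter(field, negated: false)
  has_value = "#{field}:[* TO *]"
  # with :field, nil    => documents with no value   => -field:[* TO *]
  # without :field, nil => documents with some value =>  field:[* TO *]
  negated ? has_value : "-#{has_value}"
end

nil_filter(:category_id)                # => "-category_id:[* TO *]"
nil_filter(:category_id, negated: true) # => "category_id:[* TO *]"
```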
Faceting
One of Solr’s most powerful features is faceting, which returns, for a given field, each value stored in that field along with the number of documents that have it. It’s perfect for building drill-down search interfaces. Sunspot now supports it:
search = Sunspot.search(Post) do
  with :blog_id, 2
  facet :category_ids
end
category_ids_facet = search.facet(:category_ids)
category_ids_facet.rows.map { |row| [row.value, row.count] }
#=> [[25, 3], [13, 1]]
The above results indicate that there are 3 documents with blog_id 2 and category_id 25, and 1 document with blog_id 2 and category_id 13. Note that facet counts cover every document that matches the search conditions, not just the current page of results. Note also that Sunspot casts facet values into the appropriate Ruby type for the field, so a time field’s facet values will be Time objects, and so on.
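As an illustration of the casting step, the sketch below assumes the raw counts arrive as a flat list of value strings alternating with counts, and converts each value with a caller-supplied lambda. FacetRow and facet_rows are hypothetical names; Sunspot’s own facet classes may be structured differently.

```ruby
require "time"

# A facet row: the field value (cast to a Ruby type) and its document count.
FacetRow = Struct.new(:value, :count)

# Assumption: raw facet counts look like ["25", 3, "13", 1].
# +cast+ converts each raw value string into the field's Ruby type.
def facet_rows(flat_counts, cast)
  flat_counts.each_slice(2).map { |raw, count| FacetRow.new(cast.call(raw), count) }
end

integer_rows = facet_rows(["25", 3, "13", 1], ->(raw) { Integer(raw) })
integer_rows.map { |row| [row.value, row.count] } # => [[25, 3], [13, 1]]

time_rows = facet_rows(["2009-01-01T00:00:00Z", 2], ->(raw) { Time.iso8601(raw) })
time_rows.first.value.class # => Time
```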
Solr also provides even more powerful query-based faceting (to facet by ranges of values, for instance), which I plan to tackle in a future version of Sunspot.
Explicit commits
Changing data in Solr is a two-step process: first, data is added, updated, or removed; then the changes are committed. When a commit is called, all pending changes are written to disk, and Solr instantiates a new searcher object with the updated index. Changes only appear in search results once they are committed, but a commit is a fairly expensive operation; so if you are making multiple updates as part of one operation, it’s highly advisable to commit once, after making all of the changes. Earlier versions of Sunspot automatically committed after each change; now a commit method is exposed, along with bang (!) versions of the update methods, which commit immediately. So:
Sunspot.index(my_document)
Sunspot.commit
# does the same thing as
Sunspot.index!(my_document)
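To make the batching advice concrete, here is a toy in-memory stand-in for a Solr index. FakeIndex is purely hypothetical and only models visibility: indexed documents stay pending until commit, and index! is simply index followed by commit.

```ruby
# Hypothetical in-memory stand-in for a Solr index (not Sunspot code).
class FakeIndex
  attr_reader :commit_count

  def initialize
    @pending = []     # changes sent but not yet committed
    @searchable = []  # what a search can currently see
    @commit_count = 0
  end

  # Queues documents; they are not searchable until the next commit.
  def index(*documents)
    @pending.concat(documents)
  end

  # Makes all pending changes visible (the expensive step in real Solr).
  def commit
    @searchable.concat(@pending)
    @pending.clear
    @commit_count += 1
  end

  # Bang version: index, then commit immediately.
  def index!(*documents)
    index(*documents)
    commit
  end

  def search
    @searchable.dup
  end
end

solr = FakeIndex.new
%w[a b c].each { |doc| solr.index(doc) } # three updates...
solr.search       # => [] (nothing committed yet)
solr.commit       # ...and a single commit
solr.search       # => ["a", "b", "c"]
solr.commit_count # => 1
```

Calling index! three times instead would have produced the same search results at the cost of three commits.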
Boolean field type
boolean is now an available field type. It works pretty much as you’d expect. Note that in order for a false value to be indexed, it has to be explicitly false; nil will not be indexed at all. Anything else is indexed as true.
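The boolean rules above fit in a few lines. This is a hypothetical helper, not Sunspot code, showing how each Ruby value would map to what gets indexed:

```ruby
# Hypothetical conversion mirroring the rules described above.
def boolean_index_value(value)
  case value
  when nil   then nil     # not indexed at all
  when false then "false" # only an explicit false is indexed as false
  else            "true"  # anything else counts as true
  end
end

boolean_index_value(nil)   # => nil
boolean_index_value(false) # => "false"
boolean_index_value(0)     # => "true" (0 is truthy in Ruby)
```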
Attribute field flexibility
You can now tell an attribute field to pull data from an attribute other than the one named by the field. For example:
Sunspot.setup(Post) do
  float :average_rating, :using => :ratings_average
end
This would index a field called average_rating, pulling data from the ratings_average method.
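Conceptually, :using just changes which method gets called on the instance. Here is a minimal sketch with a hypothetical AttributeField class:

```ruby
# Hypothetical attribute field: reads from the :using method when given,
# otherwise from the method with the same name as the field.
class AttributeField
  attr_reader :name

  def initialize(name, options = {})
    @name = name
    @source = options.fetch(:using, name)
  end

  def value_for(instance)
    instance.public_send(@source)
  end
end

Post = Struct.new(:ratings_average)
field = AttributeField.new(:average_rating, :using => :ratings_average)
field.name                     # => :average_rating
field.value_for(Post.new(3.5)) # => 3.5
```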
Virtual field evaluation flexibility
If the block specified for a virtual field takes an argument, the block will be passed the instance, rather than being evaluated in its context:
Sunspot.setup(Post) do
  string :sort_title do
    title.downcase
  end
  # is the same as
  string :sort_title do |post|
    post.title.downcase
  end
end
Use whichever one feels more natural for the given object - for instance, I would use the first form for a model I had written, but the second form for File objects.
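The two forms can be told apart by the block’s arity. Here is a minimal sketch of that dispatch, using a hypothetical evaluate_virtual_field helper rather than Sunspot’s internals:

```ruby
# Hypothetical evaluator: a block that declares an argument receives the
# instance; a block with no arguments is evaluated in the instance's context.
def evaluate_virtual_field(instance, &block)
  if block.arity > 0
    block.call(instance)
  else
    instance.instance_eval(&block)
  end
end

Post = Struct.new(:title)
post = Post.new("Sunspot 0.7 Is Out")

evaluate_virtual_field(post) { title.downcase }       # => "sunspot 0.7 is out"
evaluate_virtual_field(post) { |p| p.title.downcase } # => "sunspot 0.7 is out"
```

The argument form is handy for classes like File, where instance_eval would make it easy to confuse the file’s methods with your own.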
Order by multiple fields
You can now call order inside the search DSL more than once. Earlier calls get higher precedence.
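Multiple orderings behave like a chain of tie-breakers. The sketch below, with hypothetical helper names, sorts in-memory hashes that way and also builds the comma-separated sort parameter Solr accepts (for example "average_rating desc,title asc"):

```ruby
# Compares two documents field by field; later orderings only break ties
# left by earlier ones. (Hypothetical helpers, not Sunspot code.)
def compare(a, b, orders)
  orders.each do |field, direction|
    cmp = a[field] <=> b[field]
    cmp = -cmp if direction == :desc
    return cmp unless cmp.zero?
  end
  0
end

def multi_sort(documents, orders)
  documents.sort { |a, b| compare(a, b, orders) }
end

# Builds a Solr-style sort parameter from the same list of orderings.
def solr_sort_param(orders)
  orders.map { |field, direction| "#{field} #{direction}" }.join(",")
end

posts = [
  { average_rating: 4, title: "b" },
  { average_rating: 5, title: "c" },
  { average_rating: 4, title: "a" }
]
orders = [[:average_rating, :desc], [:title, :asc]]

multi_sort(posts, orders).map { |post| post[:title] } # => ["c", "a", "b"]
solr_sort_param(orders) # => "average_rating desc,title asc"
```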
New adapter API
Sunspot is intended to be flexible in what it can index and search; to that end, it provides a pluggable adapter architecture. Sunspot 0.7 makes several changes to the adapter API; this should be the final API that goes into the 1.0 release.
Instead of building a single adapter module that contains two classes with preset names, the new API allows (and indeed requires) the two classes to be registered separately. An adapter should consist of two classes: a subclass of Sunspot::Adapters::InstanceAdapter, and a subclass of Sunspot::Adapters::DataAccessor. Check out the RDoc for information on what methods each should and can implement. Here’s an example of how one might build an adapter for File objects:
class FileInstanceAdapter < Sunspot::Adapters::InstanceAdapter
  def id
    File.expand_path(@instance.path)
  end
end

class FileDataAccessor < Sunspot::Adapters::DataAccessor
  def load(id)
    @clazz.open(id)
  end
end

Sunspot::Adapters::InstanceAdapter.register(FileInstanceAdapter, File)
Sunspot::Adapters::DataAccessor.register(FileDataAccessor, File)
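Registration amounts to a lookup table from indexed classes to adapter classes. A rough sketch of such a registry follows. AdapterRegistry is hypothetical, and the ancestor fallback (so a subclass of File would pick up the File adapter) is an assumption about how lookup might work, not a description of Sunspot’s code.

```ruby
# Hypothetical adapter registry (not Sunspot's implementation).
class AdapterRegistry
  def initialize
    @adapters = {}
  end

  # Registers +adapter_class+ for each of the given indexed classes.
  def register(adapter_class, *classes)
    classes.each { |klass| @adapters[klass] = adapter_class }
  end

  # Assumption: falls back to the nearest registered ancestor.
  def adapter_for(klass)
    klass.ancestors.each do |ancestor|
      return @adapters[ancestor] if @adapters.key?(ancestor)
    end
    raise ArgumentError, "no adapter registered for #{klass}"
  end
end

class FileAdapterStub; end  # stand-in for an adapter class like the one above
class LogFile < File; end   # hypothetical subclass of an indexed class

registry = AdapterRegistry.new
registry.register(FileAdapterStub, File)
registry.adapter_for(File)    # => FileAdapterStub
registry.adapter_for(LogFile) # => FileAdapterStub (found via File)
```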
Goodbye Builder API
The last release of Sunspot attempted to provide a framework for using the Builder pattern to convert external parameters into searches. Upon further reflection, I found the API and implementation rather awkward, and it wasn’t really a core part of what Sunspot was trying to do. The search method can still accept a hash of parameters, but you really shouldn’t use that because the DSL is way better.
This also means that, for the moment, Search objects don’t provide access to the parameters passed in. That’s probably not a good thing, so I’ll try and find a clean, intuitive way to provide that access in a new version.
Goodbye extlib
Extlib is cool, but Sunspot was only using three methods from it, and it was causing some sort of weird error when I tried to run the tests on the installed gem. So I implemented the three methods myself and got rid of the extlib dependency.
That’s all, folks
That should just about cover all of the external-facing changes in the new version. There’s also lots of internal refactoring and simplification that should make the code leaner, faster, and more maintainable. This is the first version that I would feel comfortable putting into a production environment - if you’re using it, I’d love to know.