out of time
mat brown on programming, politics, cooking, and assorted nerdiness
-
Sunspot 1.2 released (finally)
After a firmly ridiculous amount of time in release candidate status (mainly owing to lack of time on my part), Sunspot 1.2 final is out. Here’s the inside scoop.
Upgrading
First, if you’re using Sunspot::Rails, you no longer need to explicitly load the ‘sunspot/rails’ source file (in fact, if you do, things won’t work right). So if you’re using Rails 3 (or bundler with Rails 2), your Gemfile just needs:

    gem 'sunspot_rails'

And if you’re using Rails 2 without Bundler, it’s just:

    config.gem 'sunspot_rails'

The other major change is in spatial search: Sunspot 1.2 has a complete rewrite of spatial search functionality, and both the API and the underlying implementation are quite different. I’ll go into quite a bit of depth a little later in this post, but for now, here’s a quick before-and-after on the API.
Previously, you configured a (the) spatial field like this:

    coordinates :coordinates

In this case, `:coordinates` is just a method that’s used to return the coordinate information, which could be either a two-element array, or an object that responded to `#lat` and `#lng`, or some other variants on those attribute names. Each document got exactly one set of coordinates, so there was no explicit field name associated with the information.
Now, you set it up like this:

    location :coordinates

Seems pretty similar, but there are a couple of crucial differences. First, `:coordinates` is an actual field name, with a `Location` type. You can think of it like any other field, and you can pass all the usual options in. You can also specify more than one location field:

    location :hq_coordinates
    location :field_office_coordinates, :multiple => true

Also, Sunspot 1.2 is stricter about the data that’s used to populate location fields: it has to be an object (or array of objects, if it’s a multi-valued field) that responds to `#lat` and `#lng`. You can use `Sunspot::Util::Coordinates` if you’re not working with objects that already fit the bill.
OK, now on to performing geo search. Before, you’d do this:

    near [40.0, -70.0], :distance => 5

Now you’ll do something like:

    with(:hq_coordinates).near 40.0, -70.0, :precision => 8

Don’t worry too much about what that `:precision` option means right now; we’ll get into that later.
What’s new in Sunspot 1.2
Spatial search with GeoHash
By far the biggest change in Sunspot 1.2 is a complete rewrite of the spatial search component. Instead of relying on solr-spatial-light, a Solr plugin I wrote, to perform spatial search, Sunspot now uses a geohash-based spatial search strategy that is implemented completely in Sunspot itself; no special functionality is needed from Solr. This has some major advantages, but it also has some disadvantages.
The good:
- Performs well at large scale, since under the hood it is executing what amounts to a relatively simple fulltext search in Solr. Contrast with solr-spatial-light, which has severe performance problems at scale.
- Allows multiple location fields in a single document, and also multi-valued location fields.
- Allows searches to incorporate both fulltext relevance and spatial proximity when calculating result score, resulting in a very “natural” default result ordering when the search contains both fulltext and spatial components.
The bad:
- Control over search “radius” is severely constrained: Sunspot only provides ten precision levels, ranging from 389 miles (precision 3) to 8 feet (precision 12).
- Proximity search matches locations that inhabit the same square on a fixed grid on the globe as the search origin; if the origin is near the edge of the square, then nearby documents will be missed, whereas more distant documents will be matched.
How it works:
When locations are indexed, they are converted into GeoHash values. GeoHash is a clever algorithm that encodes coordinates on the globe into a single string which has the property that, on average, the shared prefix of two geohashes increases as the distance between the locations decreases. Thus, proximity search can be performed very efficiently by simply searching for documents which share a prefix with the origin point, and closer documents can be given higher relevance by boosting matches with longer shared prefixes.
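To make the prefix property concrete, here is a minimal sketch of the standard GeoHash encoding in plain Ruby. It is not Sunspot’s own implementation; the method names `geohash` and `shared_prefix_length` are made up for this illustration, and the sample coordinates are arbitrary.

```ruby
# Standard geohash base-32 alphabet (no a, i, l, o).
GEOHASH_ALPHABET = '0123456789bcdefghjkmnpqrstuvwxyz'.freeze

# Encode a latitude/longitude pair as a geohash string of the given length.
# Illustrative helper only; not part of Sunspot's API.
def geohash(lat, lng, precision = 12)
  lat_range = [-90.0, 90.0]
  lng_range = [-180.0, 180.0]
  hash = String.new
  bits = 0
  bit_count = 0
  use_lng = true # bits alternate between longitude and latitude, longitude first

  while hash.length < precision
    range, value = use_lng ? [lng_range, lng] : [lat_range, lat]
    mid = (range[0] + range[1]) / 2
    if value >= mid
      bits = (bits << 1) | 1
      range[0] = mid # keep the upper half of the interval
    else
      bits <<= 1
      range[1] = mid # keep the lower half of the interval
    end
    use_lng = !use_lng
    bit_count += 1

    # Every five bits become one base-32 character.
    if bit_count == 5
      hash << GEOHASH_ALPHABET[bits]
      bits = 0
      bit_count = 0
    end
  end
  hash
end

# Number of leading characters two geohashes have in common.
def shared_prefix_length(a, b)
  a.chars.zip(b.chars).take_while { |x, y| x == y }.length
end

origin = geohash(40.7128, -74.0060)   # lower Manhattan
nearby = geohash(40.7130, -74.0062)   # a few dozen metres away
far    = geohash(34.0522, -118.2437)  # Los Angeles

puts "nearby shares #{shared_prefix_length(origin, nearby)} characters"
puts "far shares #{shared_prefix_length(origin, far)} characters"
```

Under this scheme, a search at precision 8 would presumably amount to matching documents whose stored hash starts with `origin[0, 8]`, which is also why two points on either side of a grid line can be close together yet share only a short prefix.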
For complete documentation on using Sunspot’s new GeoHash spatial search, see the API documentation.
Rails 3 Compatibility
Sunspot::Rails 1.2 is fully compatible with Rails 3. There’s not much more to say about that — just include it in your Gemfile, and it’ll work. Sunspot::Rails is still tied to ActiveRecord, though; I’d like to make it more ORM-agnostic in the future, much as Rails 3 is today.
Support legacy Solr schemas with :as
If you’ve got a legacy Solr schema where the field names don’t follow Sunspot’s naming conventions, you can now explicitly tell Sunspot what a field’s name in Solr is using the `:as` option:

    string :title, :as => :legacy_title

As well as supporting legacy schemas, this option can be useful if you want to set up new field types in your Solr schema.
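As a rough mental model, the option swaps a convention-derived name for an explicit one. The sketch below assumes a convention of "field name plus type suffix"; the suffix table and the `solr_field_name` helper are hypothetical and are not taken from Sunspot’s source.

```ruby
# Hypothetical type suffixes, assumed for illustration only.
TYPE_SUFFIXES = { :string => 's', :integer => 'i', :text => 'text' }.freeze

# Resolve the name a field would be indexed under.
# An explicit :as option wins over the naming convention.
def solr_field_name(name, type, options = {})
  return options[:as].to_s if options[:as]
  "#{name}_#{TYPE_SUFFIXES.fetch(type)}"
end

puts solr_field_name(:title, :string)
puts solr_field_name(:title, :string, :as => :legacy_title)
```

The point of the design is that the rest of the application keeps referring to `:title`, while only the mapping to the index knows about the legacy name.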
Other enhancements
- The `SilentFailSessionProxy` will swallow errors that occur during write operations, useful if you’ve got an unreliable Solr instance and don’t want to throw application errors when a non-critical Solr write fails.
- You can now include documents by identity, e.g. `with(some_instance)`. Presumably the primary use for this would be inside a disjunction.
- You can call `Sunspot.optimize` to trigger a Solr optimize from inside your application.
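The idea behind a silent-fail proxy can be sketched in a few lines of plain Ruby. This is only an illustration of the pattern under assumed names: `SilentWriteProxy`, `FlakySession`, and the list of write methods are hypothetical and do not mirror Sunspot’s actual class.

```ruby
# Wraps a session object and swallows errors raised by write operations,
# while letting read operations (here: search) fail loudly.
class SilentWriteProxy
  # Assumed set of write operations, for illustration.
  WRITE_METHODS = [:index, :remove, :commit].freeze

  attr_reader :swallowed

  def initialize(session)
    @session = session
    @swallowed = []
  end

  WRITE_METHODS.each do |name|
    define_method(name) do |*args|
      begin
        @session.public_send(name, *args)
      rescue StandardError => e
        @swallowed << e # remember the failure instead of raising it
        nil
      end
    end
  end

  # Reads are passed straight through, so failures still surface.
  def search(*args)
    @session.search(*args)
  end
end

# A stand-in session whose backend is always down.
class FlakySession
  def index(*)
    raise IOError, 'Solr is unreachable'
  end

  def search(*)
    raise IOError, 'Solr is unreachable'
  end
end

proxy = SilentWriteProxy.new(FlakySession.new)
proxy.index(:some_document) # does not raise
puts "swallowed #{proxy.swallowed.length} write error(s)"
```

Keeping the swallowed exceptions around (or logging them) is a design choice in this sketch; silently discarding them would make an outage invisible.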
That’s all, folks!
Well, that’s all for this release, friends. But we’ve got some big, big plans for Sunspot 1.3, due out in January 2011:
- Support for [Field Collapsing](http://wiki.apache.org/solr/FieldCollapsing) (group results by field value)
- Add NGram and EdgeNGram field types for easy prefix/substring search
- Improve edge-case spatial matching by searching proximate n+1-precision geohash boxes
- More stuff that you want!
Thank you!
Each release of Sunspot has been less of an individual effort and more of a community effort than the last, and 1.2 is by far the biggest example of that yet. From now on, it’s my official policy to give committer rights to anyone who submits a good, robust patch; I’d like for ongoing Sunspot development to be entirely community-driven, and we’re already a lot of the way there. Big thanks to everyone who has contributed to the library so far and thanks in advance to those whose patches are still to come.
-
Sunspot 0.8 is out
On Friday, I released the next milestone in Sunspot, version 0.8. This version doesn’t add to or change any of the basic functionality, but does add some advanced features which the app I work on for my day job happens to demand. Here’s a rundown:
Direct access to the Query API
Users of Sunspot will doubtless be familiar with Sunspot’s search DSL, which gives an English-like interface for constructing search parameters. In some cases, however, such a DSL is actually counterproductive, particularly when searches are being built by an intermediate object, and thus not necessarily all in one place. So, the new methods `Sunspot.new_search()` and `Search#query()` are exposed, and the `Sunspot::Query` class itself is now part of the public API. What I have in mind in particular here is an application of the Go4 Builder pattern, along with ActiveRecord’s hash-initializer pattern, to elegantly translate web query parameters into a Sunspot search. Here’s a stripped-down example of what I think the code will look like to do that:

    class EventSearchBuilder
      attr_reader :search

      def initialize(options = {})
        @search = Sunspot.new_search(Event)
        # Route each recognized option to its writer method below.
        options.each_pair do |attr, value|
          if respond_to?("#{attr}=")
            send("#{attr}=", value)
          end
        end
      end

      def when=(day_string)
        case day_string
        when 'future'
          @search.query.add_restriction(:start_time, :greater_than, Time.now)
        when 'past'
          @search.query.add_restriction(:start_time, :less_than, Time.now)
        else
          date_time = Date.parse(day_string).to_time
          @search.query.add_restriction(:start_time, :between, date_time..(date_time + 1.day))
        end
      end

      def page=(page)
        @search.query.paginate(page)
      end

      def sort=(field)
        @search.query.order_by(field)
      end
    end

Then in controller code, it’s as simple as:

    def search
      @search = EventSearchBuilder.new(params).search
      @search.execute!
    end

Dynamic Fields
I wouldn’t be surprised if I’m the only person who ever uses this feature of Sunspot, but just in case, let’s look at a real-world example. Let’s say part of my data model uses free-form key-value pairs, which use a constrained (but user-definable) set of keys and free-form values. I’ll call my model `KeyValuePairs`.
The trick I would like to pull here is that I would like to treat each key as a separate field in search, so that I can constrain, order, facet, etc. on the values for one key without them being affected by other keys. Since the keys are user-defined, I can’t just set up normal fields at build time; they need to be defined at index time. Enter Sunspot’s dynamic fields (we’ll use Sunspot::Rails’s wrapper API here):

    class Business < ActiveRecord::Base
      has_many :key_value_pairs

      searchable do
        dynamic_string :key_value_pairs do
          key_value_pairs.inject({}) do |hash, pair|
            hash.merge(pair.key.to_sym => pair.value)
          end
        end
      end
    end

This sets up a dynamic field which is populated using the given block. What’s important there is that the field is populated using a hash: the keys of the hash become individual dynamic fields, and the values populate those fields in the index. The “base name” of the field is `key_value_pairs`, which is used to namespace the dynamic names that come out of the hash.
Working with dynamic fields is a lot like working with regular ones, except in the query, calls are wrapped in a `dynamic` block:

    Business.search do
      dynamic :key_value_pairs do
        with(:cuisine, 'Sushi')
        facet(:atmosphere)
      end
    end

Naturally, those field names (`:cuisine`, `:atmosphere`) wouldn’t be hard-coded in a real application, since they would not be known at build time.
Dirty Sessions
Sessions now track whether any operations have been performed since the last time a `commit` was issued. The `Session#dirty?` method answers that question, and the `Session#commit_if_dirty` method does exactly what it sounds like. These are useful methods if you want to keep your commits to a minimum (you do) but you may have various parts of the code issuing Sunspot operations without any central knowledge on the part of your application.
That’s all for now
Sunspot 0.9 is up next; the main goal for that version is to replace solr-ruby with RSolr as the low-level Solr interface, which will open the door to more features in future versions (query-based faceting, LocalSolr support, etc.), but probably won’t have much effect on the API for that version (other than supporting use of the faster Curb library for the HTTP communication with Solr).
-
Sunspot 0.0.2 released
Finally, another release of sunspot, my library for awesome interaction with Solr in Ruby. I haven’t had much time to work on it in the past couple of months, so version 0.0.2 isn’t exactly earth-shattering. The good news is that, starting in a couple of weeks, my employers have me slated to spend much of my workday developing Sunspot, with the goal of putting it into production early this spring. So expect more, and bigger, releases soon.
In the meantime, here’s what’s new:
The `sunspot-solr` executable
As promised in the README for version 0.0.1, I have made running a solr instance for development less of a hassle. Now all you’ve got to do is:

    sunspot-solr start

Hey, nice. The executable takes two arguments: `-p` (that’s for port) and `-d` (that’s for data directory). By default, solr runs on port 8983 and saves the index in your `/tmp` directory. So if you’re doing anything other than playing around or running tests, definitely make sure to specify that `-d` option. To play nice with daemons, throw a double-dash before the solr options:

    sunspot-solr start -- -p 8982 -d data/solr/development

The executable takes all the usual daemon commands, such as `sunspot-solr stop` to stop it and `sunspot-solr run` to run in the foreground (this is an especially good idea if solr isn’t starting properly and you’d like to figure out why).
The executable is intended for use in development/testing scenarios; if you’re running Sunspot in production (which, frankly, I wouldn’t do yet), you should set up your own standalone Solr instance.
Build searches with Builder objects
Version 0.0.1 of sunspot allowed you to put together a search like this:

    search = Sunspot.search(Post) do
      keywords 'great pizza'
      with.blog_id 1
    end

Or like this:

    search = Sunspot.search(Post, :keywords => 'great pizza', :conditions => { :blog_id => 1 })

So does version 0.0.2; however, in 0.0.2, the second syntax delegates the interpretation of that args hash to a Builder object (in this case, a `Sunspot::StandardBuilder`). As well as translating the hash into an actual search (internally accessing the same API that the block DSL does), the builder also holds onto the hash itself, allowing access to the search parameters later on in the game:

    search.builder.keywords                      #=> "great pizza"
    search.builder.params[:keywords]             #=> "great pizza"
    search.builder.conditions.blog_id            #=> 1
    search.builder.params[:conditions][:blog_id] #=> 1

The idea here is that search parameters passed from other layers of the application, particularly from user input, can be dropped directly into the search builder, and then accessed where needed (in Rails, for instance, the object interface of the builder lends itself to FormBuilder integration; the hash interface makes generating URLs easy).
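For a feel of how one object can offer both the hash interface and the object interface, here is a small stand-alone sketch. `SketchBuilder` and its nested `Conditions` class are hypothetical names invented for this example; they are not `Sunspot::StandardBuilder`, and they leave out the translation into an actual search.

```ruby
# Holds on to the raw params hash and exposes it through reader methods.
class SketchBuilder
  attr_reader :params

  def initialize(params = {})
    @params = params
  end

  def keywords
    params[:keywords]
  end

  # Object-style access to the conditions hash: builder.conditions.blog_id
  def conditions
    Conditions.new(params.fetch(:conditions, {}))
  end

  # Answers any message that matches a key in the wrapped hash.
  class Conditions
    def initialize(hash)
      @hash = hash
    end

    def method_missing(name, *args)
      @hash.key?(name) ? @hash[name] : super
    end

    def respond_to_missing?(name, include_private = false)
      @hash.key?(name) || super
    end
  end
end

builder = SketchBuilder.new(:keywords => 'great pizza', :conditions => { :blog_id => 1 })
puts builder.keywords
puts builder.conditions.blog_id
puts builder.params[:conditions][:blog_id]
```

Because the original hash is kept untouched, it can be handed back to URL helpers as-is, while the reader methods give form helpers something attribute-like to call.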
One thing I’m looking forward to doing in a future release is opening up the builder API for developers to create and use their own builders, allowing easy and clean encapsulation of rules for translating user input into search parameters (we actually use this approach at Patch now with an older home-grown Solr API that doesn’t explicitly support it, and it works well). Anyway, expect to hear more about that.
For the next Sunspot release I’ll be tackling faceting, which is currently the most glaring omission from Sunspot’s support for Solr’s featureset. Further down the line will be ORM (etc.) adapters (expect ActiveRecord, DataMapper, and File objects); plug-ins for Merb and Rails; and more of the builder stuff. I’m also looking into possibly switching Sunspot to run on top of a newer, more actively maintained fork of solr-ruby.
Well anyway check it out if you want:
sudo gem install outoftime-sunspot --source http://gems.github.com