On Friday, I released the next milestone in Sunspot, version 0.8. This version doesn’t add to or change any of the basic functionality, but does add some advanced features which the app I work on for my day job happens to demand. Here’s a rundown:
Users of Sunspot will doubtless be familiar with Sunspot’s search DSL, which gives an English-like interface for constructing search parameters. In some cases, however, such a DSL is actually counterproductive, particularly when searches are being built by an intermediate object, and thus not necessarily all in one place. So, the new methods Sunspot.new_search() and Search#query() are exposed, and the Sunspot::Query class itself is now part of the public API. What I have in mind in particular here is an application of the Gang of Four Builder pattern, along with ActiveRecord’s hash-initializer pattern, to elegantly translate web query parameters into a Sunspot search. Here’s a stripped-down example of what I think the code will look like to do that:
class EventSearchBuilder
  attr_reader :search
  def initialize(options = {})
    @search = Sunspot.new_search(Event)
    options.each_pair do |attr, value|
      if respond_to?("#{attr}=")
        send("#{attr}=", value)
      end
    end
  end
  def when=(day_string)
    case day_string
    when 'future'
      @search.query.add_restriction(:start_time, :greater_than, Time.now)
    when 'past'
      @search.query.add_restriction(:start_time, :less_than, Time.now)
    else
      date_time = Date.parse(day_string).to_time
      @search.query.add_restriction(:start_time, :between, date_time..(date_time + 1.day))
    end
  end
  def page=(page)
    @search.query.paginate(page)
  end
  def sort=(field)
    @search.query.order_by(field)
  end
end
Then in controller code, it’s as simple as:
def search
  @search = EventSearchBuilder.new(params).search
  @search.execute!
end
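The following minimal sketch shows the hash-initializer dispatch on its own, without Solr. RecordingQuery and SketchSearchBuilder are hypothetical stand-ins and are not part of Sunspot. The fake query only records the calls the builder makes, so you can check which parameters were translated and which were ignored.

```ruby
# Hypothetical stand-in for a query object: it only records calls,
# so the builder can be exercised without Solr.
class RecordingQuery
  attr_reader :calls

  def initialize
    @calls = []
  end

  def paginate(page)
    @calls << [:paginate, page]
  end

  def order_by(field)
    @calls << [:order_by, field]
  end
end

class SketchSearchBuilder
  attr_reader :query

  def initialize(options = {})
    @query = RecordingQuery.new
    # Hash-initializer pattern: dispatch a key only if a public setter
    # with that name exists; every other key is silently ignored.
    options.each_pair do |attr, value|
      setter = "#{attr}="
      public_send(setter, value) if respond_to?(setter)
    end
  end

  def page=(page)
    @query.paginate(page.to_i)
  end

  def sort=(field)
    @query.order_by(field.to_sym)
  end
end

builder = SketchSearchBuilder.new('page' => '2', 'sort' => 'start_time', 'controller' => 'events')
builder.query.calls # => [[:paginate, 2], [:order_by, :start_time]]
```

Only keys with a matching public setter are dispatched. Extra request parameters, such as Rails' controller and action, therefore fall through harmlessly.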
I wouldn’t be surprised if I’m the only person who ever uses the next feature, but just in case, let’s look at a real-world example. Let’s say part of my data model uses free-form key-value pairs, which use a constrained (but user-definable) set of keys and free-form values. I’ll call my model KeyValuePair.
The trick I would like to pull here is that I would like to treat each key as a separate field in search, so that I can constrain, order, facet, etc. on the values for one key without them being affected by other keys. Since the keys are user-defined, I can’t just set up normal fields at build time; they need to be defined at index time. Enter Sunspot’s dynamic fields (we’ll use Sunspot::Rails’s wrapper API here):
class Business < ActiveRecord::Base
  has_many :key_value_pairs
  searchable do
    dynamic_string :key_value_pairs do
      key_value_pairs.inject({}) do |hash, pair|
        hash.merge(pair.key.to_sym => pair.value)
      end
    end
  end
end
This sets up a dynamic field which is populated using the given block. What’s important there is that the field is populated using a hash - the keys of the hash become individual dynamic fields, and the values populate those fields in the index. The “base name” of the field is key_value_pairs, which is used to namespace the dynamic names that come out of the hash.
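Conceptually, the block's hash is flattened into one index field per key, and each field is namespaced under the base name. The sketch below illustrates that mapping in plain Ruby. The "base:key" naming scheme is an assumption made for illustration. The field names Sunspot actually sends to Solr may differ, for instance by carrying a type suffix.

```ruby
# Hypothetical illustration: each key in the hash becomes its own field,
# namespaced under the base name ("<base>:<key>" is an assumed scheme).
def dynamic_field_names(base_name, values)
  values.each_with_object({}) do |(key, value), fields|
    fields["#{base_name}:#{key}"] = value
  end
end

pairs = { :cuisine => 'Sushi', :atmosphere => 'Casual' }
dynamic_field_names(:key_value_pairs, pairs)
# maps "key_value_pairs:cuisine" to "Sushi"
# and "key_value_pairs:atmosphere" to "Casual"
```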
Working with dynamic fields is a lot like working with regular ones, except in the query, calls are wrapped in a dynamic block:
Business.search do
  dynamic :key_value_pairs do
    with(:cuisine, 'Sushi')
    facet(:atmosphere)
  end
end
Naturally, those field names (:cuisine, :atmosphere) wouldn’t be hard-coded in a real application, since they would not be known at build time.
Sessions now track whether any operations have been performed since the last time a commit was issued. The Session#dirty? method answers that question, and Session#commit_if_dirty does exactly what it sounds like. These methods are useful if you want to keep your commits to a minimum (you do) but have various parts of your code issuing Sunspot operations without any central coordination in your application.
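The idea behind dirty tracking fits in a few lines. DirtyTrackingSession below is a hypothetical in-memory model and is not Sunspot's Session class. Every write sets a flag and a commit clears it. commit_if_dirty only issues a commit when something is pending.

```ruby
# Hypothetical sketch of dirty tracking: writes mark the session dirty,
# and a commit clears the flag.
class DirtyTrackingSession
  attr_reader :commit_count

  def initialize
    @dirty = false
    @commit_count = 0
  end

  def index(*objects)
    # A real session would send the documents to Solr here.
    @dirty = true unless objects.empty?
  end

  def remove(*objects)
    @dirty = true unless objects.empty?
  end

  def dirty?
    @dirty
  end

  def commit
    # A real session would issue a Solr commit here.
    @commit_count += 1
    @dirty = false
  end

  def commit_if_dirty
    commit if dirty?
  end
end

session = DirtyTrackingSession.new
session.commit_if_dirty # nothing pending, so no commit is issued
session.index(:post1)
session.index(:post2)
session.commit_if_dirty # one commit covers both updates
session.commit_if_dirty # clean again, so no commit
session.commit_count    # => 1
```

With this in place, unrelated parts of an application can each call commit_if_dirty at the end of their work without paying for redundant commits.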
Sunspot 0.9 is up next; the main goal for that version is to replace solr-ruby with RSolr as the low-level Solr interface, which will open the door to more features in future versions (query-based faceting, LocalSolr support, etc.), but probably won’t have much effect on the API for that version (other than supporting use of the faster Curb library for the HTTP communication with Solr).
As a developer of Ruby libraries and applications, I’d like to make sure my code works in all of the major Ruby implementations, but I’ve also got my “main” Ruby, the one that has been with me through thick and thin and happens to be the version installed on our production servers. The other Rubies need a place on my machine, but I’d like that place to be out of the way and have no chance of conflicting with my main Ruby installation or anything else I’ve got installed.
Fortunately, the omniscient beings who created the Filesystem Hierarchy Standard anticipated this need of mine, and in their wisdom created the /opt directory for this purpose. Unlike a normal package installation, which installs files in various places across your file system - /usr/bin, /usr/lib, /etc, /var, and the like - optional package installations put everything into a single subdirectory of /opt, where they’re fairly isolated from the rest of the system.
So, here’s how I installed YARV and JRuby as optional packages. This should work for anyone using Linux or Mac OS X[1]:
Find a nice directory for downloads.
$ wget ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p129.tar.gz
$ wget http://dist.codehaus.org/jruby/1.2.0/jruby-bin-1.2.0.tar.gz
$ sudo mkdir -pv /opt/ruby-1.9.1-p129
$ tar xzvf ruby-1.9.1-p129.tar.gz
$ cd ruby-1.9.1-p129
$ ./configure --prefix=/opt/ruby-1.9.1-p129
$ make
$ sudo make install
$ sudo tar -C /opt -xzvf jruby-bin-1.2.0.tar.gz
$ sudo rm -v /opt/jruby-1.2.0/bin/*.bat
You can also remove most of the directories in the /opt/jruby-1.2.0/lib/native directory, except the one that corresponds to your architecture. If in doubt, leaving them all in won’t hurt.
Assuming you’ve got RubyGems installed in your main Ruby installation, you don’t need to install it for your other installations - you can simply run the existing gem script using the various binaries, and it’ll work the way you want (installing the gems inside those optional package directories). For example:
$ sudo /opt/ruby-1.9.1-p129/bin/ruby -S gem install rake
$ sudo /opt/jruby-1.2.0/bin/jruby -S gem install rake
Using the small rubies script I covered in a separate post makes the process of installing gems (and doing anything else) in your various Ruby versions considerably less painful.
[1] If you use MacPorts, which you probably do, you’ve got a bunch of software installed in a standard hierarchy inside of the /opt/local directory. That isn’t really the way /opt was intended to be used, but it won’t conflict with the installations covered in this post.
Today’s mission was to get Sunspot working in all the major Ruby implementations (MRI, YARV, JRuby). I personally use MRI 1.8.6p114 and hadn’t needed to install any other Ruby implementations, so I first tried out multiruby, which handles all of the installing and running of different Ruby versions. Alas, it didn’t work very well: the packages didn’t install. After all, package management is a nontrivial problem, and in a sense multiruby does attempt to perform that function.
I also wasn’t a big fan of installing a whole filesystem hierarchy inside a hidden directory in my home directory (some people like this; follow your own path, young Jedi), and it required me to install the entire ZenTest gem, which I otherwise have not found much use for.
So, I went ahead and just installed YARV and JRuby as optional packages, installed gems for them, ran the spec suite under each version, and fixed the bugs that came up. Great! But I did notice that this involved a lot of typing out full paths to the various Ruby binaries, particularly since I will want to be running the specs under all the versions from now on. This is not a difficult problem to solve, I thought to myself. Enter a quick 15-line script:
#!/usr/bin/env ruby
require 'rubygems'
gem 'escape'
require 'escape'
File.open(File.join(ENV['HOME'], '.rubies')) do |file|
  file.each_line do |bin|
    bin.chomp!
    next if bin.empty? # skip blank lines in ~/.rubies
    STDERR.puts("Executing in #{`#{Escape.shell_command([bin, '-v'])}`}")
    fork do
      exec(Escape.shell_command([bin].concat(ARGV)))
    end
    Process.wait
  end
end
Then I just set up my ~/.rubies file, containing full paths to my various Ruby binaries:
/usr/local/bin/ruby
/opt/ruby-1.9.1-p129/bin/ruby
/opt/jruby-1.2.0/bin/jruby
And a quick rubies -S rake runs my suites in all the relevant Ruby versions. Genius? No, but it gives me all the useful functionality of multiruby with the control over installation that I crave. Hopefully it’ll help you too.
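If you'd rather avoid the escape gem, one alternative is to skip the shell entirely. Kernel#system runs a program directly when it is given a [command, argv0] pair followed by separate argument strings, so nothing needs escaping. This is a sketch of that variant, not the script above. run_in_each and read_rubies are hypothetical names, and the sketch prints each binary's path rather than its -v output.

```ruby
# Runs the same arguments under each interpreter in turn and reports
# whether each run succeeded. The [bin, bin] form of Kernel#system execs
# the program directly, never through a shell, so nothing needs escaping.
def run_in_each(interpreters, args)
  interpreters.map do |bin|
    warn "Executing in #{bin}"
    [bin, system([bin, bin], *args) ? true : false]
  end
end

# Reads interpreter paths from a file such as ~/.rubies, one per line,
# ignoring surrounding whitespace and blank lines.
def read_rubies(path)
  File.readlines(path).map(&:strip).reject(&:empty?)
end

# Example: run_in_each(read_rubies(File.join(Dir.home, '.rubies')), ARGV)
```

Because system waits for each child to finish, it also replaces the explicit fork/exec/Process.wait sequence.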
Momentous news: Yesterday I released the 0.7 version of Sunspot, my library for awesome Solr interaction in pure Ruby. This is the first release that I consider basically feature complete, meaning there aren’t any gaping holes in the feature set - not that there aren’t more features in the pipeline! Read on for all the new goodies.
Sunspot is now fully documented. In order to distinguish the public and private APIs, I made liberal use of :nodoc: on classes and methods that are not part of the public API. So, everything in the RDoc is fair game; if you find yourself needing to call methods that aren’t in the RDoc, let me know so I can expose what you need in the public API.
API-private methods are still documented in the code.
In earlier versions, the search DSL looked like this:
Sunspot.search(Post) do
  with.blog_id 1
  with.average_rating.less_than 4
end
Now it looks like this:
Sunspot.search(Post) do
  with :blog_id, 1
  with(:average_rating).less_than 4
end
I find the new syntax more intuitive, and the colleagues I polled unanimously agreed. Sometimes Ruby developers get perhaps a bit too excited about how cleverly one can construct English-like DSLs in the language, and I would plead guilty to that charge where the earlier version is concerned.
The search DSL now provides a without method, the counterpart to with, which negates the restriction. So, you can do the following:
Sunspot.search(Post) do
  without :blog_id, 2
end
That would exclude all posts whose blog_id is 2.
A special use of the without method is excluding specific objects from the search results. This is done by just passing the objects you want to exclude to without. For example, if you have an instance current_post that you don’t want in the search results:
Sunspot.search(Post) do
  without current_post
end
You can now pass nil to an equality restriction, which will restrict the results to documents for which the given field has no value. For example:
Sunspot.search(Post) do
  with :category_id, nil
end
The above would return only documents that do not have a category_id. Conversely, without :category_id, nil returns only documents that do have a value for the field in question. Passing nil to other restriction types is not allowed.
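The semantics of these restrictions can be modeled in plain Ruby. This is a hypothetical sketch that covers single-valued fields only, and it does not reflect how Sunspot or Solr actually evaluates queries. Documents are hashes, and a missing or nil field counts as having no value.

```ruby
# Hypothetical in-memory model: keep documents whose field equals value.
# With value = nil this keeps documents that have no value for the field.
def with_value(docs, field, value)
  docs.select { |doc| doc[field] == value }
end

# The negation: with value = nil it keeps documents that do have a value.
def without_value(docs, field, value)
  docs.reject { |doc| doc[field] == value }
end

posts = [
  { :id => 1, :blog_id => 1, :category_id => 7 },
  { :id => 2, :blog_id => 2, :category_id => nil },
  { :id => 3, :blog_id => 2 }
]
with_value(posts, :category_id, nil).map { |post| post[:id] }    # => [2, 3]
without_value(posts, :category_id, nil).map { |post| post[:id] } # => [1]
without_value(posts, :blog_id, 2).map { |post| post[:id] }       # => [1]
```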
One of Solr’s most powerful features is faceting, which returns all of the values stored for a given field, and the number of documents that have each value. It’s perfect for building drill-down search interfaces. Sunspot now supports it:
search = Sunspot.search(Post) do
  with :blog_id, 2
  facet :category_ids
end
category_ids_facet = search.facet(:category_ids)
category_ids_facet.rows.map { |row| [row.value, row.count] }
#=> [[25, 3], [13, 1]]
The above results indicate that there are 3 documents with blog_id 2 and category_id 25, and 1 document with blog_id 2 and category_id 13. Note that facet results are for the total number of documents that match the search conditions, not just the current result set (which is paginated). Note also that Sunspot casts facets into the appropriate Ruby object for their type - thus a time field’s facet values will be Time objects, etc.
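A field facet's computation can be modeled over an in-memory array. facet_counts below is a hypothetical illustration, not Sunspot's implementation. Over the documents that match the search conditions, it counts how many carry each value of a field, which may be multi-valued. Results come back most frequent first, and this sketch breaks ties by value.

```ruby
# Hypothetical model of field faceting: for each value of the field,
# count the matching documents that carry it, most frequent first.
def facet_counts(docs, field)
  counts = Hash.new(0)
  docs.each do |doc|
    # Array() turns a single value into [value] and nil into [].
    Array(doc[field]).uniq.each { |value| counts[value] += 1 }
  end
  counts.sort_by { |value, count| [-count, value] }
end

posts = [
  { :blog_id => 2, :category_ids => [25] },
  { :blog_id => 2, :category_ids => [25, 13] },
  { :blog_id => 2, :category_ids => [25] },
  { :blog_id => 1, :category_ids => [13] }
]
matching = posts.select { |post| post[:blog_id] == 2 }
facet_counts(matching, :category_ids) # => [[25, 3], [13, 1]]
```

The counts are taken over every matching document, not over a single page of results. This mirrors the note above about facets and pagination.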
Solr also provides even more powerful query-based faceting (to facet by ranges of values, for instance), which I plan to tackle in a future version of Sunspot.
Changing data in Solr is a two-step process - first, data is added, updated, or removed; then the changes are committed. When a commit is called, all pending changes are written to disk, and Solr instantiates a new searcher object with the updated index. In order for changes to appear in search, they must be committed, but the commit is a fairly expensive operation; thus, if you are making multiple updates as part of one operation, it’s highly advisable to commit once, after making all of the changes. Earlier versions of Sunspot automatically committed after each change; now a commit method is exposed, as well as bang (!) versions of the update methods, which perform a commit immediately. So:
Sunspot.index(my_document)
Sunspot.commit
# does the same thing as
Sunspot.index!(my_document)
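The two-step add/commit cycle, and why batching matters, can be shown with an in-memory stand-in. FakeIndex is hypothetical and models only visibility. Added documents stay pending and are invisible to search until commit runs. index! is simply index followed immediately by commit.

```ruby
# Hypothetical in-memory stand-in for a Solr index: it models visibility
# only, so added documents stay pending until commit is called.
class FakeIndex
  attr_reader :commit_count

  def initialize
    @pending = []
    @committed = []
    @commit_count = 0
  end

  # Queue a document; it is not searchable yet.
  def index(doc)
    @pending << doc
  end

  # Bang version: queue the document and commit immediately.
  def index!(doc)
    index(doc)
    commit
  end

  # Make all pending documents visible (the expensive step in real Solr).
  def commit
    @committed.concat(@pending)
    @pending.clear
    @commit_count += 1
  end

  def search
    @committed.dup
  end
end

solr = FakeIndex.new
%w[a b c].each { |doc| solr.index(doc) }
solr.search # => [] because nothing has been committed yet
solr.commit # a single commit for the whole batch
solr.search # => ["a", "b", "c"]
```

Indexing the same three documents with index! would cost three commits instead of one.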
boolean is now an available field type. It works pretty much as you’d expect. Note that in order for a false value to be indexed, it has to explicitly be false — nil will not be indexed at all. Anything else is indexed as true.
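Ruby treats both nil and false as falsy, which makes this three-way rule easy to get wrong. The hypothetical helper below expresses it, and a nil return means "leave the field out of the document". The "true" and "false" strings are an assumption about how the value would be serialized for Solr.

```ruby
# Hypothetical helper for the boolean rule: nil means "omit the field",
# false is indexed as false, and every other value is indexed as true.
def boolean_index_value(value)
  return nil if value.nil?
  value == false ? 'false' : 'true'
end

boolean_index_value(nil)   # => nil (field not indexed)
boolean_index_value(false) # => "false"
boolean_index_value(0)     # => "true" (0 is truthy in Ruby)
```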
You can now tell an attribute field to pull data from an attribute other than the one named by the field. For example:
Sunspot.setup(Post) do
  float :average_rating, :using => :ratings_average
end
This would index a field called average_rating, pulling data from the ratings_average method.
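Put differently, :using only changes which method supplies the value. Here is a minimal sketch of that lookup. The attribute_value helper and the RatedPost struct are hypothetical and stand in for Sunspot's field classes and a real model.

```ruby
# Hypothetical sketch: read the value from the method named by :using if
# given, otherwise from the method with the same name as the field.
def attribute_value(instance, field_name, options = {})
  instance.public_send(options.fetch(:using, field_name))
end

RatedPost = Struct.new(:title, :ratings_average)
post = RatedPost.new('Hello', 3.5)
attribute_value(post, :average_rating, :using => :ratings_average) # => 3.5
attribute_value(post, :title)                                      # => "Hello"
```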
If the block specified for a virtual field takes an argument, the block will be passed the instance, rather than being evaluated in its context:
Sunspot.setup(Post) do
  string :sort_title do
    title.downcase
  end
  # is the same as
  string :sort_title do |post|
    post.title.downcase
  end
end
Use whichever one feels more natural for the given object - for instance, I would use the first form for a model I had written, but the second form for File objects.
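The choice between the two forms depends on the block's arity. The sketch below shows how such a check could work using Proc#arity and instance_eval. virtual_value and TitledPost are hypothetical names, and this isn't claimed to be Sunspot's exact code.

```ruby
# Hypothetical sketch: a block that declares a parameter receives the
# instance; a block without one is evaluated with the instance as self.
def virtual_value(instance, &block)
  if block.arity > 0
    block.call(instance)
  else
    instance.instance_eval(&block)
  end
end

TitledPost = Struct.new(:title)
post = TitledPost.new('Hello World')
virtual_value(post) { title.downcase }         # => "hello world"
virtual_value(post) { |p| p.title.downcase }   # => "hello world"
```

The arity > 0 check accepts parameterless blocks whether Ruby reports their arity as 0 or -1.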
You can now call order inside the search DSL more than once. Earlier calls get higher precedence.
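Precedence works like a multi-key sort, where later orderings only break ties left by earlier ones. Below is a plain-Ruby sketch in which each [field, direction] pair is a hypothetical stand-in for one order call.

```ruby
# Hypothetical sketch of multi-key ordering: the first spec decides,
# and each later spec is consulted only when all earlier ones tie.
def apply_orders(docs, orders)
  docs.sort do |a, b|
    result = 0
    orders.each do |field, direction|
      result = a[field] <=> b[field]
      result = -result if direction == :desc
      break unless result.zero?
    end
    result
  end
end

posts = [
  { :title => 'b', :rating => 3 },
  { :title => 'a', :rating => 3 },
  { :title => 'c', :rating => 5 }
]
apply_orders(posts, [[:rating, :desc], [:title, :asc]]).map { |post| post[:title] }
# => ["c", "a", "b"]
```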
Sunspot is intended to be flexible in what it can index and search; to that end, it provides a pluggable adapter architecture. Sunspot 0.7 makes several changes to the adapter API; this should be the final API that goes into the 1.0 release.
Instead of building a single adapter module that contains two classes with preset names, the new API allows (and indeed requires) the two classes to be registered separately. An adapter should consist of two classes: a subclass of Sunspot::Adapters::InstanceAdapter, and a subclass of Sunspot::Adapters::DataAccessor. Check out the RDoc for information on what methods each should and can implement. Here’s an example of how one might build an adapter for File objects:
class FileInstanceAdapter < Sunspot::Adapters::InstanceAdapter
  def id
    File.expand_path(@instance.path)
  end
end
class FileDataAccessor < Sunspot::Adapters::DataAccessor
  def load(id)
    @clazz.open(id)
  end
end
Sunspot::Adapters::InstanceAdapter.register(FileInstanceAdapter, File)
Sunspot::Adapters::DataAccessor.register(FileDataAccessor, File)
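A registry like this could resolve adapters by walking a class's ancestor chain, so that subclasses inherit their parent's adapter. AdapterRegistry below is a hypothetical sketch of that idea, not Sunspot's internals. The symbols stand in for adapter classes.

```ruby
# Hypothetical class-keyed registry: lookup walks the ancestor chain, so
# a subclass of a registered class resolves to the same adapter.
class AdapterRegistry
  def initialize
    @adapters = {}
  end

  def register(adapter, *classes)
    classes.each { |klass| @adapters[klass] = adapter }
  end

  def adapter_for(klass)
    match = klass.ancestors.find { |ancestor| @adapters.key?(ancestor) }
    raise ArgumentError, "no adapter registered for #{klass}" unless match
    @adapters[match]
  end
end

registry = AdapterRegistry.new
registry.register(:file_adapter, File)
registry.adapter_for(File)            # => :file_adapter
registry.adapter_for(Class.new(File)) # => :file_adapter (inherited)
```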
The last release of Sunspot attempted to provide a framework for using the Builder pattern to convert external parameters into searches. Upon further reflection, I found the API and implementation rather awkward, and it wasn’t really a core part of what Sunspot was trying to do. The search method can still accept a hash of parameters, but you really shouldn’t use that, because the DSL is way better.
This also means that, for the moment, Search objects don’t provide access to the parameters passed in. That’s probably not a good thing, so I’ll try to find a clean, intuitive way to provide that access in a future version.
Extlib is cool, but Sunspot was only using three methods from it, and it was causing some sort of weird error when I tried to run the tests on the installed gem. So I implemented the three methods myself and got rid of the extlib dependency.
That should just about cover all of the external-facing changes in the new version. There’s also lots of internal refactoring and simplification that should make the code leaner, faster, and more maintainable. This is the first version that I would feel comfortable putting into a production environment - if you’re using it, I’d love to know.
The day when we ditch Starling for RabbitMQ will be a good day.
Recently I came across a situation in which I needed to make some fairly major changes to our codebase, and those changes needed to apply to two different branches - both a version branch that’s currently in QA, and the master branch. The changes involved reverting a few commits as well as making new changes that were too big to comfortably fit into one commit. One option would be to simply make all of the commits in one branch, and then cherry-pick them by hand into the other, but that seemed awfully manual for such a powerful tool as git.
Enter git cherry-tree, a little alias I came up with that basically creates a series of patches based on the commit diff between two branches, and pipes the result directly into a third branch. Before I explain how to use it, here’s how to add the alias to your config:
git config --global alias.cherry-tree \!sh\ -c\ \'git\ format-patch\ --stdout\ \$0..\$1\ \|\ git\ am\ -3\'
For the sake of this example, we’ll call our branches qa and master. My goal is to branch qa, make a series of commits, and then apply the changes in those commits to master as well. The result should be equivalent to individually cherry-picking the commits that I’ve made, but easier and more reliable. Here we go:
git checkout qa
git checkout -b qa-big-changes
# make and commit the changes
git checkout master
git checkout -b master-big-changes
git cherry-tree qa qa-big-changes
That last command says, essentially, “Sequentially apply each commit that is in qa-big-changes, but not qa, to the current branch.” Unless you’re a naturally lucky person, you’ll probably have conflicts - when this happens, the process will stop midstream, telling you which files are in conflict. Let’s say config/environment.rb is in conflict:
# open config/environment.rb and fix the conflict
git add config/environment.rb
git am -3 --resolved
Note that, unlike with a normal merge conflict, you don’t want to commit after fixing the conflicts - just add the conflicted files to the index. The last command just says, “OK, problem solved, pick up where you left off.” Note also that, since each commit from your other branch is applied individually, this can happen more than once (of course, with different conflicts, in different commits).
Once the entire patch has run cleanly, master-big-changes will have a series of commits equivalent to the commits that you made to qa-big-changes (they won’t be the same commits, though - just like a cherry-pick). Then you can merge your changes into the respective branches:
git checkout qa
git merge qa-big-changes
git branch -d qa-big-changes
git checkout master
git merge master-big-changes
git branch -d master-big-changes
As a final note, the use of master-big-changes isn’t strictly necessary - you can just do it directly in master. I just feel safer doing this process to a separate branch and then merging it in. However, making your original set of changes in qa-big-changes, rather than directly in qa, is required for this process.
ShellElf is a small daemon that reads shell commands out of a Starling queue and runs them. Great if you need to do non-trivial processing tasks in the background. It’s lightweight but has a few neat features:
If you find yourself having to switch locally between different git branches that have different database schemata (for instance, a master branch and a production branch), one of the biggest hassles is keeping the database in the right state. We wrote this rake task to make the process painless: just specify the branch you’re about to switch to, and it will automatically roll back to the last common database version between that and your current branch. Usage:
rake db:rollback_to_common branch=production
git checkout -m production
It will raise an exception if nothing needs to be done, which doesn’t really bother me, but if it bothers you, feel free to edit - it’s a gist.
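The rake task itself isn't reproduced here, but its core calculation can be sketched. common_rollback_version is a hypothetical helper. It takes the migration versions on the current branch and on the target branch, which in practice might be read from the db/migrate filenames on each branch. It returns the newest version that predates the first migration the target branch lacks, or nil if nothing needs rolling back.

```ruby
# Hypothetical helper: returns the migration version to roll back to before
# switching branches, or nil when the current branch has no migrations the
# target branch lacks. 0 means there is no common version at all.
def common_rollback_version(current_versions, target_versions)
  divergent = current_versions - target_versions
  return nil if divergent.empty?
  first_divergent = divergent.min
  # Newest migration on this branch that predates the first divergent one.
  current_versions.select { |version| version < first_divergent }.max || 0
end

common_rollback_version([1, 2, 3, 5, 6], [1, 2, 3, 4]) # => 3
```

A task built on this could pass the result to rake db:migrate VERSION=..., which migrates down to that version, before the branch is checked out.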