Musings & ramblings of a Pythonista
Riak is an Amazon Dynamo-inspired, masterless key-value store written in Erlang. It is one of those NoSQL databases that is rock solid, production ready, and promises zero downtime. I have been using Riak at work and was blown away by its simplicity (setting up a three-node cluster takes less than ten minutes) and by the support the Riak community provides. It is amazingly fast too. Data retrieval in Riak relies on operations such as Get (by key), MapReduce, and Key Filters.
Riak also supports multiple storage backends for its key-value store, and Google's LevelDB is one of them. One advantage of using LevelDB with Riak is that it supports Secondary Indexes, which let you retrieve data faster when you want an SQL-like query interface. The catch is that Riak only supports single-index queries: you can query only one indexed field at a time.
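To make the single-index limitation concrete, here is a small in-memory sketch. It is not Riak's API: `ToySecondaryIndex` and its methods are hypothetical names invented for illustration, and the index names simply mirror the `name_bin` and `age_int` examples used later in this post. Each query looks at exactly one index and returns object keys, which is the shape of the problem the wrapper has to work around.

```python
from collections import defaultdict


class ToySecondaryIndex:
    """In-memory stand-in for an indexed bucket (hypothetical, not Riak's API)."""

    def __init__(self):
        # index name -> list of (indexed value, object key) pairs
        self._entries = defaultdict(list)

    def add(self, key, index, value):
        """Record that object `key` carries `value` under `index`."""
        self._entries[index].append((value, key))

    def query_eq(self, index, value):
        """Exact-match query against a single index."""
        return sorted(k for v, k in self._entries[index] if v == value)

    def query_range(self, index, low, high):
        """Inclusive range query against a single index."""
        return sorted(k for v, k in self._entries[index] if low <= v <= high)


idx = ToySecondaryIndex()
idx.add('sree', 'name_bin', 'Sreejith')
idx.add('sree', 'age_int', 25)
idx.add('vishnu', 'name_bin', 'Vishnu')
idx.add('vishnu', 'age_int', 31)

# Each call consults one index only; there is no way to say
# "name == 'Vishnu' AND age <= 50" in a single query.
by_name = idx.query_eq('name_bin', 'Vishnu')
by_age = idx.query_range('age_int', 0, 50)
```

Combining `by_name` and `by_age` into a single answer is left to the caller.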
I wrote a Python wrapper that allows multiple index queries using Secondary indexes and MapReduce. The basic steps are
As suggested by Elias Levy, it is better to compute the intersection of the keys returned by all the index queries than to evaluate the filters in the Map phase, because the latter involves parsing the data and validating the filters against every object. That can become very slow when a single index query returns far more keys than the others. The source has been updated to reflect this change.
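The intersection idea can be sketched in plain Python. This is a minimal illustration under assumptions, not the wrapper's actual code: `intersect_index_results` is a hypothetical helper, and the key lists are made-up stand-ins for what each single-index query might return (the key `'anil'` does not appear elsewhere in this post).

```python
def intersect_index_results(key_lists):
    """Return the keys present in every list, sorted for stable output."""
    if not key_lists:
        return []
    # Start from the smallest result set so the working set only shrinks.
    ordered = sorted((set(keys) for keys in key_lists), key=len)
    common = ordered[0]
    for keys in ordered[1:]:
        common = common & keys
        if not common:
            break  # nothing can match once the intersection is empty
    return sorted(common)


# Hypothetical results of two separate single-index queries.
age_keys = ['sree', 'vishnu', 'anil']   # e.g. age < 50
name_keys = ['vishnu']                  # e.g. name == 'Vishnu'

matching = intersect_index_results([age_keys, name_keys])
```

Because only keys are compared, no stored object has to be fetched or parsed until the final set of matching keys is known.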
So the new steps are
Using this, you can write queries like:
    import riak
    # RiakMultiIndexQuery is the wrapper class described in this post;
    # import it from wherever you placed the wrapper source.

    client = riak.RiakClient('localhost', 8091)
    bucket = client.bucket('test_multi_index')

    # Store two objects, each tagged with a binary and an integer index.
    bucket.new('sree', {'name': 'Sreejith', 'age': 25}).\
        add_index('name_bin', 'Sreejith').\
        add_index('age_int', 25).store()

    bucket.new('vishnu', {'name': 'Vishnu', 'age': 31}).\
        add_index('name_bin', 'Vishnu').\
        add_index('age_int', 31).store()

    query = RiakMultiIndexQuery(client, 'test_multi_index')

    # Single filter
    for res in query.filter('name', '==', 'Sreejith').run():
        print(res)

    # Two filters on different indexes
    query.reset()
    for res in query.filter('age', '<', 50).filter('name', '==', 'Vishnu').run():
        print(res)

    # Filter with ordering
    query.reset()
    for res in query.filter('age', '<', 50).order('age', 'ASC').run():
        print(res)

    # Limit only
    query.reset()
    for res in query.limit(1).run():
        print(res)

    # Ordering with offset and limit
    query.reset()
    for res in query.order('age', 'ASC').offset(1).limit(1).run():
        print(res)
You can find the full source code at GitHub.