Back

Using the PostgreSQL hstore extension in Rails

by Robert Johansson - 2013-11-29

Hstore is an extension of PostgreSQL that allows you to store key-value pairs in a single PostgreSQL column. In this article we go through how to set up hstore withing a Rails app. We will also look on some use-cases, see how one can select and group on an hstore column and also have a quick look on the different ways to index them.

Introduction

The hstore in PostgreSQL stores key-value pairs of strings. Currently mySQL does not have any equivalent structure (2013-11-25) so if you choose to use hstore you are locking yourself to PostgreSQL. You can find the PostgreSQL reference for hstore here.

If you just need to serialize a hash you will not get any super gains from using hstore, in fact, you get some extra hassle from the fact that the keys and values are always converted to strings. If you for example want to store boolean values you will need to do some tweaks.

The real win comes if you need to query on the hash values. This would be really tricky if you where just using a serialized hash column but if you use hstore it is super simple. You can also use group statements on the values of a specific key inside the hstore.

How to add an hstore column to your Rails app

The hstore support is built in to Rails 4, if you are using Rails 3, take a look at the gem activerecord-postgres-hstore.

First thing is to add the Hstore extension to your PostgreSQL database, there are a few different ways you can do, I find this database migration works best for me:

class AddHstoreExtension < ActiveRecord::Migration

  def self.up
    execute "CREATE EXTENSION IF NOT EXISTS hstore"
  end

  def self.down
    execute "DROP EXTENSION hstore"
  end

end

The next step is to add model with an hstore attribute, or you could add an hstore attribute to an existing model

class CreateMyModels < ActiveRecord::Migration
  def change
    create_table :my_models do |t|
      # Default to empty string and dont allow null values means you will start with an empty hash instead of nil
      t.hstore :my_field, default: '', null: false
    end
  end
end

class MyModel < ActiveRecord::Base
end

For Rails 4 you don't need to add anything else, your column my_column will behave as a serialized hash, in Rails 3, follow the instructions for activerecord-postgres-hstore . After that you can do things like:

m = MyModel.new
m.my_field = {"my_key" => "my_value"}
One gotcha worth mentioning is the following
m = MyModel.new
m.my_field["new_key"] = "new_value"
m.save
m.my_field["new_key"] # => "new_value"
m.reload
m.my_field["new_key"] # => nil

This is due to the that field does not get marked as changed if you edit the hash in this way, so make sure to assign directly to the field to avoid the problem.

Querying

You can query on the keys and value of the hash, for example:

# All models where the myfield column has the key 'my_key' set to 'my_value'
MyModel.where("my_field -> 'my_key' = ?" , "my_value")
# All models where the myfield column has the key 'my_key' with any value
MyModel.where("my_field ? 'my_key'")

You can also group on the values of the hash. This works like a normal group by but you can group on a specific hash key:

.group( "my_field -> 'my_key'" )

It is also possible to sum the values of the a certain hash key, note that the values are strings so if you for example are storing integers you need to convert them before you sum them.

.select("SUM((my_field->'my_key')::integer) as sum_value")

Things you can't do

It is currently (2013-11-27) not possible to store anything else than simple hashes with string keys and values. That means you cannot store nested hashes, like in for example MongoDB. Also, if you try to store anything else than strings in an hstore, the values will be automatically converted to strings, so you need to handle those cases if you for example need to store integers or booleans.

Using store_accessor

ActiveRecord::Store is a wrapper around a serialized field, such as the hstore, to use it you just add a line to your model.

class MyModel < ActiveRecord::Base
  store_accessor :my_field, [:color, :locale]
end

Then you can use these fields as any other field (but beware) , the content of your fields are still converted to strings, this is done when the object is saved.

m = MyModel.new
m.color = "blue"

Use-cases

I think one of the best use-cases is for storing attributes that vary between different instances. For example in a store-application where products might have a set of different attributes, like size, color and so on. If you want users to be able to add their own custom attributes, then you could use a hstore column. The main advantage compared to using a text column with a serialized hash is the possibility to search on the hash values. A demo of this use-case can be found here. If you add some additional models, you could make something that solves the same type of problems as the EAV (Entity Attribute Value) design.

Indexing an hstore column

You can add indexes to an hstore column to speed up queries. There are a few different options.

If you add a BTREE index to an hstore column you will make the = operator fast, however this is not usually the type of query you would want to do.

It is also possible to add a BTREE index on a specific key of the hstore, that will speed up your queries on that key. This could be useful, but if you are considering this type of index, think about if you should extract this key of the hstore into its own column.

You can also add GIN or GIST indexes on the hstore columns. These indexes support the @>, ?, ?& and ?| operators. So if you add this you can rewrite your queries that check for equality on a specific key like this

# Original
MyModel.where("my_field -> 'my_key' = ?", "my_value")
# Rewritten with better performance if you have added a GIN or GIST index
MyModel.where("my_field @> 'my_key=>my_value'")

According to this page the second way is at least two times faster.

Should one use a GIN or a GIST index

As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. More detailed documentation can be found here.

Links

A nice example of how you can use hstore to store dynamic attributes. Article, video, and code on github available.

Some other code I found, I have not used any of these personally.


Comments

comments powered by Disqus
Back