
Ruby hashes with custom objects as keys

When you store things in a hash, you need a hash function to turn keys into numbers (or memory locations, or whatever), so the hash knows which bucket holds which value. This hash function is nicely defined for Fixnums: two Fixnums with the same value always give the same hash value, which makes sense since Fixnums are immutable, so two objects with the same Fixnum value are pretty much the same object in every way. Strings are mutable, but String#hash always returns the same hash value for two strings with the same contents, even if they're different objects (i.e. have different object_ids), apparently by using the string's length and contents in some way.
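Here is a quick sketch of that behaviour, something you could paste into irb. The variable names are mine, and String.new is used only to guarantee two separate String objects. On newer Rubies, Fixnum has been folded into Integer, but small integers behave the same way.

```ruby
# Two integer variables holding the same value.
a = 123
b = 123
puts a.hash == b.hash            # true
puts a.object_id == b.object_id  # true: they are the very same object

# Two distinct String objects with the same contents.
s1 = String.new('abc')
s2 = String.new('abc')
puts s1.object_id == s2.object_id  # false: different objects
puts s1.hash == s2.hash            # true: hash depends on the contents
puts s1.eql?(s2)                   # true
```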

Things become screwy, though, if you have your own class and you want to use objects of that class as hash keys. From what I can tell, if a class doesn't define its own method called hash, then Object#hash defaults to using an object's object_id as the hash value.

Why would you want to use your own class's objects as hash keys? Well, I got in trouble because Array#uniq happens to use that same hash function to determine uniqueness, and I want two objects with the same values for some subset of their instance variables to be considered duplicates. The default Object#hash doesn't do this.
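A minimal sketch of the problem, using a made-up class called Plain that defines neither hash nor eql?. Two objects holding the same value are still treated as different keys and different array elements:

```ruby
# Hypothetical class for illustration: no custom hash or eql?.
class Plain
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

p1 = Plain.new('123')
p2 = Plain.new('123')

# The default eql? compares object identity, not contents.
puts p1.eql?(p2)  # false

# Both objects end up as separate keys...
keys = { p1 => true, p2 => true }.keys
puts keys.length  # 2

# ...and uniq keeps both of them.
puts([p1, p2].uniq.length)  # 2
```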

It's not as simple as defining your own hash method; the documentation for Object#hash says:

This function must have the property that a.eql?(b) implies a.hash == b.hash
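To make that rule concrete, here is a rough sketch that checks it for a pair of objects. The helper hash_contract_holds? and the class Broken are names I made up for illustration. Broken compares by value in eql? but returns its object_id from hash, which mimics overriding eql? alone under the default described above.

```ruby
# Hypothetical helper: true when "a.eql?(b) implies a.hash == b.hash" holds.
def hash_contract_holds?(a, b)
  !a.eql?(b) || a.hash == b.hash
end

# Hypothetical class that violates the rule on purpose.
class Broken
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Equality by value...
  def eql?(other)
    other.is_a?(Broken) && value.eql?(other.value)
  end

  # ...but hashing by identity, so eql? objects get different hashes.
  def hash
    object_id
  end
end

puts(hash_contract_holds?('abc', String.new('abc')))          # true
puts(hash_contract_holds?(Broken.new('x'), Broken.new('x')))  # false
```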

So Object#eql? is also apparently used by hashes somewhere along the way. The moral of this story is: if you want to use your objects as hash keys, or ever plan to uniq an array containing them, you have to define both a hash and an eql? method. This code illustrates this:

def test(o1,o2)
    h = Hash.new
    h[o1] = true
    h[o2] = true

    puts "o1.object_id: #{o1.object_id}"
    puts "o2.object_id: #{o2.object_id}"
    puts "o1.hash: #{o1.hash}"
    puts "o2.hash: #{o2.hash}"
    puts "o1.eql? o2: #{o1.eql? o2}"
    puts "o1.value: #{o1.value}"
    puts "o2.value: #{o2.value}"
    puts "o1.value.object_id: #{o1.value.object_id}"
    puts "o2.value.object_id: #{o2.value.object_id}"
    puts "o1.value.hash: #{o1.value.hash}"
    puts "o2.value.hash: #{o2.value.hash}"
    puts "h.keys.length: #{h.keys.length}"
    puts "[o1,o2].uniq: #{[o1, o2].uniq}"
    puts "[o1,o2].uniq.length: #{[o1, o2].uniq.length}"
    puts
end

class Foo
    attr_reader :value
    def initialize(value)
        @value = value
    end
end

f1 = Foo.new('123')
f2 = Foo.new('123')

# Neither hash nor eql? is defined yet: two keys, two unique objects.
test(f1,f2)

class Foo
    def hash
        @value.hash
    end
end

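# hash alone is not enough: the hash values now match, but eql? is
# still false, so there are still two keys and two unique objects.
test(f1,f2)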

class Foo
    def eql?(other)
        @value.eql? other.value
    end
end

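# With both hash and eql? defined, the two objects collapse into one
# key and uniq returns a single element.
test(f1,f2)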


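# For comparison: two variables holding the same Fixnum value refer to
# the same object, so both lines print the same object_id.
n1 = 123
n2 = 123

puts n1.object_id
puts n2.object_id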
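
For reference, here is a more compact sketch of the same idea in one place. The class name Bar is made up so it doesn't clash with Foo above. Aliasing == to eql? and hashing [self.class, value] are my own choices, not something the Object#hash documentation requires.

```ruby
# Hypothetical class: equality and hashing are both based on @value.
class Bar
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Two Bars are eql? when their values are eql?.
  def eql?(other)
    other.is_a?(Bar) && value.eql?(other.value)
  end
  alias_method :==, :eql?

  # Including the class keeps a Bar from hashing like its bare value.
  def hash
    [self.class, value].hash
  end
end

b1 = Bar.new('123')
b2 = Bar.new('123')

puts b1.eql?(b2)             # true
puts b1.hash == b2.hash      # true
puts([b1, b2].uniq.length)   # 1

h = { b1 => :first }
h[b2] = :second              # overwrites the entry stored under b1
puts h.length                # 1
puts h[b1]                   # second
```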
October 11, 2006 @ 8:47 AM PDT
Category: Programming
Tags: Ruby