
Python: hash() algorithm

206| Wed, 26 Dec 2007 23:37:00 GMT| benotdejean| Comments (7)
Hi. Is the hash() algorithm standard? Will hash(some_string) always
return the same hash code on every architecture?

I need a checksum-like function, such as md5, but I was also thinking
about hash(), which is obviously simpler. Can I safely rely on hash()
behaviour to generate a ~strong and portable
identifier/checksum?

thank you

Benoît Dejean <bnet...ifrance.com> wrote:
> hi. Is the hash() algorithm standard ? Does hash(some_string) will always
> return the same hash code on every arch ?
> i need to use a ~checksum function, like md5, but i was also thinking
> about hash() which is obviously simpler. So i can safely rely on hash()
> behaviour so i can use it to generate ~strong and portable
> identifier/checksum ?

I'm not an expert, but I believe so. I just tried three machines:

OS X 10.4: (Python 2.3)
>>> hash('test')
1308370872

Solaris: (Python 1.6)
>>> hash('test')
1308370872

FreeBSD 5.2.1: (Python 2.3)
>>> hash('test')
1308370872

> thank you

--
Kristofer Pettijohn
kristofer...cybernetik.net

kristoferpettijohn | Wed, 26 Dec 2007 23:38:00 GMT |

Benoît Dejean <bnet...ifrance.com> writes:

> i need to use a ~checksum function, like md5, but i was also thinking
> about hash() which is obviously simpler. So i can safely rely on hash()
> behaviour so i can use it to generate ~strong and portable
> identifier/checksum ?

I don't believe it's changed since at least 1.5.2, but I'm also pretty
sure there are no guarantees that it will remain the same going forward.

Also, how strong do you want your checksum to be? That is, how much
of a guarantee do you want that you'll be able to detect a change in
the data by a change in the checksum? MD5 will give you a really
strong guarantee, hash() - whether stable/portable or not - will give
you a reasonably weak guarantee since it's not built to be collision
free.

-- David

davidbolen | Wed, 26 Dec 2007 23:39:00 GMT |

[Benoît Dejean]
> hi. Is the hash() algorithm standard ? Does hash(some_string) will always
> return the same hash code on every arch ?

No, and in fact it's almost certain to deliver a different hash on a
32-bit machine than on a 64-bit machine (Python hash codes are the
same size as the native platform C "long" type). Python doesn't
promise to deliver the same hash codes across releases either
(although it usually does anyway).
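
For readers on a current interpreter, here is a minimal sketch of this point. It is not part of the original reply. It assumes CPython 3.7 or later, where string hashes are salted per process unless the PYTHONHASHSEED environment variable pins the seed, and where sys.hash_info.width reports the hash size in bits. The helper names are made up for illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Run hash(text) in a new interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


def fits_native_hash_width(value):
    """Check that hash(value) fits in the signed width this build reports."""
    bits = sys.hash_info.width
    return -(2 ** (bits - 1)) <= hash(value) < 2 ** (bits - 1)


if __name__ == "__main__":
    print("hash width in bits:", sys.hash_info.width)
    # The same seed reproduces the value; a different seed normally does not.
    print("seed 1:", hash_in_fresh_interpreter("test", 1))
    print("seed 2:", hash_in_fresh_interpreter("test", 2))
```

Running the script on a 32-bit and a 64-bit build should print different widths, which is one more reason not to store hash() values or exchange them between machines.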

> i need to use a ~checksum function, like md5, but i was also thinking
> about hash() which is obviously simpler. So i can safely rely on hash()
> behaviour so i can use it to generate ~strong and portable
> identifier/checksum ?

It's not strong. It's easy to find distinct strings with the same
Python hash; it's widely thought to be intractable to do the same wrt
MD5 or SHA hashes.

timpeters | Wed, 26 Dec 2007 23:40:00 GMT |

Benoît Dejean <bnet...ifrance.com> writes:
> hi. Is the hash() algorithm standard ? Does hash(some_string) will always
> return the same hash code on every arch ?

I'd say you should not rely on that.

> i need to use a ~checksum function, like md5, but i was also
> thinking about hash() which is obviously simpler. So i can safely
> rely on hash() behaviour so i can use it to generate ~strong and
> portable identifier/checksum ?

I don't know what you mean by "strong". I'm sure you can find collisions
in hash() without much effort. It's much harder to do that for md5.
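
As a rough illustration of how little effort a collision takes, the sketch below uses integers rather than strings. That is an assumption of convenience: on current CPython, string hashes depend on a per-process seed, while integers are hashed modulo sys.hash_info.modulus, so two distinct integers with the same hash can be written down directly. The function name is made up for illustration.

```python
import sys


def colliding_integers():
    """Return two distinct ints that CPython hashes to the same value."""
    # CPython reduces integer hashes modulo sys.hash_info.modulus,
    # so adding the modulus leaves the hash unchanged.
    modulus = sys.hash_info.modulus
    return 1, 1 + modulus


if __name__ == "__main__":
    a, b = colliding_integers()
    print(a != b, hash(a) == hash(b))
```

Nothing comparable is known for producing an md5 collision from a one-line formula, which is the difference in strength being described here.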

paulrubin | Wed, 26 Dec 2007 23:41:00 GMT |

On Wed, 21 Jul 2004, Benoît Dejean wrote:

> i need to use a ~checksum function, like md5, but i was also thinking
> about hash() which is obviously simpler.

md5 is actually very easy to use in Python:

>>> import md5
>>> chksum=md5.new('apple')
>>> chksum.hexdigest()
'1f3870be274f6c49b3e31a0c6728957f'
>>> int(chksum.hexdigest())
41499123188802761002464065009245263231L

This is a little more verbose than hash(), but it's just as
straightforward, and can more easily be used with large messages (see the
.update() method of the md5 object returned by new()).
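
The md5 module shown above no longer exists in Python 3. A minimal sketch of the same idea with the standard hashlib module follows. It assumes the data arrives as bytes chunks, and the function name and the algorithm parameter are made up for illustration.

```python
import hashlib


def hexdigest_of_chunks(chunks, algorithm="md5"):
    """Feed an iterable of bytes chunks to a hashlib digest and return hex."""
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        # update() can be called repeatedly, so large messages never
        # need to be held in memory in one piece.
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(hexdigest_of_chunks([b"apple"]))
    # Splitting the message across update() calls gives the same digest.
    print(hexdigest_of_chunks([b"ap", b"ple"]))
    print(hexdigest_of_chunks([b"apple"], algorithm="sha256"))
```

Unlike hash(), the digest depends only on the bytes fed in, so it should match across machines, word sizes and Python releases.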

christophertking | Wed, 26 Dec 2007 23:42:00 GMT |

Christopher T King <squirrel...WPI.EDU> writes:
> >>>
> 41499123188802761002464065009245263231L

How'd you do that? You should need to say
int(chksum.hexdigest(), 16)
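
A short sketch of the correction, written against hashlib so that it runs on Python 3 (an assumption, since the thread itself uses the old md5 module). The function name is made up for illustration.

```python
import hashlib


def md5_as_int(data):
    """Return the md5 digest of a bytes object as a Python integer."""
    hex_text = hashlib.md5(data).hexdigest()
    # The digest is hexadecimal text, so the base must be given explicitly.
    return int(hex_text, 16)


if __name__ == "__main__":
    print(md5_as_int(b"apple"))
    try:
        # Without a base, int() parses decimal and rejects the hex letters.
        int(hashlib.md5(b"apple").hexdigest())
    except ValueError as exc:
        print("ValueError:", exc)
```

int.from_bytes(hashlib.md5(data).digest(), "big") yields the same integer without going through hex text.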

paulrubin | Wed, 26 Dec 2007 23:43:00 GMT |

On Wed, 21 Jul 2004 16:34:16 -0400, David Bolen wrote:

> Benoît Dejean <bnet...ifrance.com> writes:
> I don't believe it's changed since at least 1.5.2, but I'm also pretty
> sure there are no guarantees that it will remain the same going forward.

ok

> Also, how strong do you want your checksum to be? That is, how much
> of a guarantee do you want that you'll be able to detect a change in
> the data by a change in the checksum? MD5 will give you a really
> strong guarantee, hash() - whether stable/portable or not - will give
> you a reasonably weak guarantee since it's not built to be collision
> free.

i know this. i've been using md5 for a long time, i was just wondering if
... thank you all.

benotdejean | Wed, 26 Dec 2007 23:44:00 GMT |
