Python: hash() algorithm

  • benotdejean / 206 / Fri, 27 Mar 2009 05:35:00 GMT / Comments (7)
  • Hi. Is the hash() algorithm standard? Will hash(some_string) always
    return the same hash code on every architecture?

    I need a checksum-like function, such as md5, but I was also thinking
    about hash(), which is obviously simpler. Can I safely rely on hash()'s
    behaviour to generate a reasonably strong and portable
    identifier/checksum?

    Thank you.
  • Keywords:

    hash, algorithm, python

  • http://programming.itags.org/python/32371/
    1. Benoît Dejean <bnet...ifrance.com> wrote:
      > hi. Is the hash() algorithm standard ? Does hash(some_string) will always
      > return the same hash code on every arch ?
      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler. So i can safely rely on hash()
      > behaviour so i can use it to generate ~strong and portable
      > identifier/checksum ?

      I'm not an expert, but I believe so. I just tried three machines:

      OS X 10.4: (Python 2.3)
      >>> hash('test')
      1308370872

      Solaris: (Python 1.6)
      >>> hash('test')
      1308370872

      FreeBSD 5.2.1: (Python 2.3)
      >>> hash('test')
      1308370872

      > thank you

      --
      Kristofer Pettijohn
      kristofer...cybernetik.net

      kristoferpettijohn | Wed, 26 Dec 2007 23:38:00 GMT |

    2. Benoît Dejean <bnet...ifrance.com> writes:

      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler. So i can safely rely on hash()
      > behaviour so i can use it to generate ~strong and portable
      > identifier/checksum ?

      I don't believe it's changed since at least 1.5.2, but I'm also pretty
      sure there are no guarantees that it will remain the same going forward.

      Also, how strong do you want your checksum to be? That is, how much
      of a guarantee do you want that you'll be able to detect a change in
      the data by a change in the checksum? MD5 will give you a really
      strong guarantee, hash() - whether stable/portable or not - will give
      you a reasonably weak guarantee since it's not built to be collision
      free.

      -- David

      davidbolen | Wed, 26 Dec 2007 23:39:00 GMT |

    3. [Benoît Dejean]
      > hi. Is the hash() algorithm standard ? Does hash(some_string) will always
      > return the same hash code on every arch ?

      No, and in fact it's almost certain to deliver a different hash on a
      32-bit machine than on a 64-bit machine (Python hash codes are the
      same size as the native platform C "long" type). Python doesn't
      promise to deliver the same hash codes across releases either
      (although it usually does anyway).

      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler. So i can safely rely on hash()
      > behaviour so i can use it to generate ~strong and portable
      > identifier/checksum ?

      It's not strong. It's easy to find distinct strings with the same
      Python hash; it's widely thought to be intractable to do the same wrt
      MD5 or SHA hashes.

      timpeters | Wed, 26 Dec 2007 23:40:00 GMT |
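The point above (hash() width follows the platform, and values are not promised across releases) can be sketched as follows. This is a minimal illustration, not code from the thread: `stable_id` is a hypothetical helper name, and SHA-256 is an assumed choice among the "MD5 or SHA" family mentioned above, picked because practical MD5 collisions have been published since this thread was written. Note also that current Python 3 releases randomize str hashes per process by default unless PYTHONHASHSEED is set, so hash('test') may differ even between two runs on the same machine.

```python
import hashlib
import sys


def stable_id(text):
    """Return a hex identifier that depends only on the UTF-8 bytes of text.

    Hypothetical helper for illustration; SHA-256 is an assumed choice.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Width in bits of hash() results on this interpreter build.
    print("hash() width:", sys.hash_info.width)
    # May differ between platforms, releases, and processes.
    print("hash('test'):", hash("test"))
    # Depends only on the input bytes, not on the interpreter.
    print("stable_id('test'):", stable_id("test"))
```

The digest is longer than a hash() value, but it can be stored or compared on any machine without caring which interpreter produced it.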

    4. Benoît Dejean <bnet...ifrance.com> writes:
      > hi. Is the hash() algorithm standard ? Does hash(some_string) will always
      > return the same hash code on every arch ?

      I'd say you should not rely on that.

      > i need to use a ~checksum function, like md5, but i was also
      > thinking about hash() which is obviously simpler. So i can safely
      > rely on hash() behaviour so i can use it to generate ~strong and
      > portable identifier/checksum ?

      I don't know what you mean by "strong". I'm sure you can find collisions
      in hash() without much effort. It's much harder to do that for md5.

      paulrubin | Wed, 26 Dec 2007 23:41:00 GMT |
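As a small illustration of how little effort a hash() collision can take, here is a sketch that relies on a CPython implementation detail: -1 is reserved as an internal error code, so hash(-1) is mapped to -2. It uses integers rather than the strings discussed in the thread, purely because this collision is easy to reproduce; `md5_hex` is a hypothetical helper name.

```python
import hashlib


def md5_hex(data):
    """Return the MD5 hex digest of a bytes object (illustrative helper)."""
    return hashlib.md5(data).hexdigest()


# Two distinct values; on CPython both hash to -2.
a, b = -1, -2

# hash() collides for these two distinct integers (CPython detail).
print("hash() equal:", hash(a) == hash(b))

# MD5 of their decimal text representations does not collide.
print("md5 equal:", md5_hex(str(a).encode()) == md5_hex(str(b).encode()))
```

hash() is built to spread keys across dictionary buckets, where occasional collisions are harmless; it makes no attempt to be hard to collide on purpose.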

    5. On Wed, 21 Jul 2004, Benoît Dejean wrote:

      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler.

      md5 is actually very easy to use on Python:

      >>> import md5
      >>> chksum=md5.new('apple')
      >>> chksum.hexdigest()
      '1f3870be274f6c49b3e31a0c6728957f'
      >>> int(chksum.hexdigest())
      41499123188802761002464065009245263231L

      This is a little more verbose than hash(), but it's just as
      straightforward, and can more easily be used with large messages (see the
      .update() method of the md5 object returned by new()).

      christophertking | Wed, 26 Dec 2007 23:42:00 GMT |

    6. Christopher T King <squirrel...WPI.EDU> writes:
      > >>> int(chksum.hexdigest())
      > 41499123188802761002464065009245263231L

      How'd you do that? You should need to say
      int(chksum.hexdigest(), 16)

      paulrubin | Wed, 26 Dec 2007 23:43:00 GMT |
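For readers on Python 3, where the standalone md5 module no longer exists, the example and the base-16 correction above can be sketched with hashlib. The helper names `md5_int` and `md5_hex_chunked` are hypothetical, chosen only for this illustration; the second one shows the .update() usage mentioned for large messages.

```python
import hashlib


def md5_int(data):
    """Return the MD5 digest of a bytes object as a Python integer."""
    hex_digest = hashlib.md5(data).hexdigest()
    # The digest is hexadecimal text, so the base must be given explicitly.
    return int(hex_digest, 16)


def md5_hex_chunked(chunks):
    """Feed an iterable of bytes chunks to MD5 incrementally via update()."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(hashlib.md5(b"apple").hexdigest())
    print(md5_int(b"apple"))
    # Feeding the same bytes in pieces gives the same digest.
    print(md5_hex_chunked([b"ap", b"ple"]))
```

Note that hashlib works on bytes, so text has to be encoded first, for example with some_string.encode("utf-8").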

    7. On Wed, 21 Jul 2004 16:34:16 -0400, David Bolen wrote:

      > Benoît Dejean <bnet...ifrance.com> writes:
      > I don't believe it's changed since at least 1.5.2, but I'm also pretty
      > sure there are no guarantees that it will remain the same going forward.

      ok

      > Also, how strong do you want your checksum to be? That is, how much
      > of a guarantee do you want that you'll be able to detect a change in
      > the data by a change in the checksum? MD5 will give you a really
      > strong guarantee, hash() - whether stable/portable or not - will give
      > you a reasonably weak guarantee since it's not built to be collision
      > free.

      I know this. I've been using md5 for a long time; I was just wondering if
      ... Thank you all.

      benotdejean | Wed, 26 Dec 2007 23:44:00 GMT |
