Python: hash() algorithm

  • benot_dejean / 200 / Fri, 27 Mar 2009 05:34:00 GMT / Comments (7)
  • Hi. Is the hash() algorithm standard? Will hash(some_string) always
    return the same hash code on every architecture?

    I need a checksum-like function such as md5, but I was also thinking
    about hash(), which is obviously simpler. Can I safely rely on hash()'s
    behaviour and use it to generate a reasonably strong and portable
    identifier/checksum?

    Thank you.
  • http://programming.itags.org/python/32370/
    1. Benoît Dejean <bnet...ifrance.com> wrote:
      > hi. Is the hash() algorithm standard ? Does hash(some_string) will always
      > return the same hash code on every arch ?
      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler. So i can safely rely on hash()
      > behaviour so i can use it to generate ~strong and portable
      > identifier/checksum ?


      I'm not an expert, but I believe so. I just tried three machines:

      OS X 10.4: (Python 2.3)
      1308370872

      Solaris: (Python 1.6)
      1308370872

      FreeBSD 5.2.1: (Python 2.3)
      1308370872

      > thank you

      Kristofer Pettijohn
      kristofer...cybernetik.net

      kristofer_pettijohn | Tues, 29 Apr 2008 19:11:00 GMT |

    2. Benoît Dejean <bnet...ifrance.com> writes:

      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler. So i can safely rely on hash()
      > behaviour so i can use it to generate ~strong and portable
      > identifier/checksum ?


      I don't believe it's changed since at least 1.5.2, but I'm also pretty
      sure there are no guarantees that it will remain the same going forward.

      Also, how strong do you want your checksum to be? That is, how much
      of a guarantee do you want that you'll be able to detect a change in
      the data by a change in the checksum? MD5 will give you a really
      strong guarantee, hash() - whether stable/portable or not - will give
      you a reasonably weak guarantee since it's not built to be collision
      free.

      -- David

      davidbolen | Tues, 29 Apr 2008 19:12:00 GMT |

    3. [Benoît Dejean]
      > hi. Is the hash() algorithm standard ? Does hash(some_string) will always
      > return the same hash code on every arch ?


      No, and in fact it's almost certain to deliver a different hash on a
      32-bit machine than on a 64-bit machine (Python hash codes are the
      same size as the native platform C "long" type). Python doesn't
      promise to deliver the same hash codes across releases either
      (although it usually does anyway).

      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler. So i can safely rely on hash()
      > behaviour so i can use it to generate ~strong and portable
      > identifier/checksum ?


      It's not strong. It's easy to find distinct strings with the same
      Python hash; it's widely thought to be intractable to do the same wrt
      MD5 or SHA hashes.

      timpeters | Tues, 29 Apr 2008 19:13:00 GMT |
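
      A minimal sketch of a portable alternative, assuming a modern Python 3
      interpreter rather than the versions discussed in this thread. Since
      Python 3.3, str hashes are also salted per process unless
      PYTHONHASHSEED is set, so hash() values should not be stored or shared
      as identifiers. The helper name stable_id and the choice of SHA-256 are
      illustrative assumptions; the thread itself only mentions MD5 and SHA.

```python
import hashlib
import sys


def stable_id(text: str) -> str:
    """Return a hex identifier that is the same on every platform and run.

    Hypothetical helper: SHA-256 over the UTF-8 encoding of the text.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Width of hash() results in bits; it follows the interpreter build.
    print("hash() width:", sys.hash_info.width)
    # Built-in hash: may differ between platforms, releases, and processes.
    print("hash('test'):", hash("test"))
    # Digest-based identifier: depends only on the input bytes.
    print("stable_id('test'):", stable_id("test"))
```

      The digest is longer than a hash() value, but it depends only on the
      bytes of the input, which is the property a portable identifier needs.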

    4. Benoît Dejean <bnet...ifrance.com> writes:
      > hi. Is the hash() algorithm standard ? Does hash(some_string) will always
      > return the same hash code on every arch ?


      I'd say you should not rely on that.

      > i need to use a ~checksum function, like md5, but i was also
      > thinking about hash() which is obviously simpler. So i can safely
      > rely on hash() behaviour so i can use it to generate ~strong and
      > portable identifier/checksum ?


      I don't know what you mean by "strong". I'm sure you can find collisions
      in hash() without much effort. It's much harder to do that for md5.

      paulrubin | Tues, 29 Apr 2008 19:14:00 GMT |
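
      One collision that needs no searching at all, offered as an
      illustration of the point above. It is a CPython implementation detail
      and concerns integers rather than strings: -1 is reserved internally as
      an error code, so hash(-1) is reported as -2. The function name below
      is hypothetical.

```python
def shares_hash(a, b) -> bool:
    """True when two unequal objects produce the same built-in hash."""
    return a != b and hash(a) == hash(b)


if __name__ == "__main__":
    # CPython detail: hash(-1) cannot be -1, so it collides with hash(-2).
    print(shares_hash(-1, -2))
```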

    5. On Wed, 21 Jul 2004, Benoît Dejean wrote:

      > i need to use a ~checksum function, like md5, but i was also thinking
      > about hash() which is obviously simpler.


      md5 is actually very easy to use on Python:

      '1f3870be274f6c49b3e31a0c6728957f'
      41499123188802761002464065009245263231L

      This is a little more verbose than hash(), but it's just as
      straightforward, and can more easily be used with large messages (see the
      .update() method of the md5 object returned by new()).

      christopher_tking | Tues, 29 Apr 2008 19:15:00 GMT |
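
      For readers on current Python: the standalone md5 module and its new()
      function were superseded by hashlib (Python 2.5 and later), and the md5
      module no longer exists in Python 3. Below is a sketch of the same idea
      using the update() method mentioned above; the helper name and the
      chunked input are assumptions made for illustration.

```python
import hashlib
from typing import Iterable


def md5_hexdigest(chunks: Iterable[bytes]) -> str:
    """Feed a large message to MD5 piece by piece and return the hex digest."""
    checksum = hashlib.md5()
    for chunk in chunks:
        checksum.update(chunk)  # same result as hashing the concatenation
    return checksum.hexdigest()


if __name__ == "__main__":
    print(md5_hexdigest([b"large ", b"message ", b"in pieces"]))
```

      Because update() can be called repeatedly, a file can be hashed block
      by block without loading it into memory.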

    6. Christopher T King <squirrel...WPI.EDU> writes:
      > 41499123188802761002464065009245263231L

      How'd you do that? You should need to say
      int(chksum.hexdigest(), 16)

      paulrubin | Tues, 29 Apr 2008 19:16:00 GMT |
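
      The conversion suggested above, sketched for Python 3, where int has
      arbitrary precision and the trailing L no longer appears. The function
      name is hypothetical and hashlib stands in for the old md5 module.

```python
import hashlib


def md5_as_int(data: bytes) -> int:
    """Return the MD5 digest of data as a 128-bit integer."""
    chksum = hashlib.md5(data)
    return int(chksum.hexdigest(), 16)


if __name__ == "__main__":
    print(md5_as_int(b"some message"))
```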

    7. On Wed, 21 Jul 2004 16:34:16 -0400, David Bolen wrote:

      > Benoît Dejean <bnet...ifrance.com> writes:
      > I don't believe it's changed since at least 1.5.2, but I'm also pretty
      > sure there are no guarantees that it will remain the same going forward.


      ok

      > Also, how strong do you want your checksum to be? That is, how much
      > of a guarantee do you want that you'll be able to detect a change in
      > the data by a change in the checksum? MD5 will give you a really
      > strong guarantee, hash() - whether stable/portable or not - will give
      > you a reasonably weak guarantee since it's not built to be collision
      > free.


      i know this. i've been using md5 for a long time, i was just wondering if
      ... thank you all.

      benot_dejean | Tues, 29 Apr 2008 19:17:00 GMT |