Python: hash() yields different results for different platforms

  • paul_rubin / 200 / Fri, 27 Mar 2009 05:42:00 GMT
  • "Qiangning Hong" <hongqn...gmail.com> writes:
    > However, when I came to Python's built-in hash() function, I found it
    > produces different values on my two computers! On a Pentium 4,
    > hash('a') -> -468864544; on an amd64, hash('a') -> 12416037344. Does
    > the hash function depend on the machine's word length?


    The hash function is unspecified and can depend on anything the
    implementers feel like. It may(?) even be permitted to differ from
    one run of the interpreter to another (I haven't checked the spec for
    this). Don't count on it being consistent from machine to machine.
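
For a value that stays the same across machines and interpreter runs, one option is to derive it from a cryptographic digest rather than from hash(). The following is a minimal sketch in Python 3 syntax, not something from the original thread: the helper name stable_hash, the choice to keep only the first 8 bytes, and the example URL are all assumptions made for illustration. Note that in Python 3.3 and later, str hashes are randomized per process unless PYTHONHASHSEED is set, so the built-in hash() does differ between runs there.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Return a 64-bit integer derived from the MD5 digest of `text`.

    Hypothetical helper: the result depends only on the UTF-8 bytes of
    the input, not on the platform word size or the interpreter run.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()
    # Keep the first 8 bytes and read them as a big-endian unsigned integer.
    return int.from_bytes(digest[:8], "big")


url = "http://example.com/"  # made-up URL for illustration
print(stable_hash(url))

# The full hex digest is also platform-independent.
print(hashlib.md5(b"a").hexdigest())  # 0cc175b9c0f1b6a831c399e269772661
```

Truncating to 64 bits keeps the value small enough for an integer column but raises the collision probability compared with the full 128-bit digest; storing the full hex digest avoids that trade-off.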

    > If it does, I must consider another hash algorithm because the spider
    > will run concurrently on several computers, some 32-bit and some
    > 64-bit. Is md5 a good choice? Will it be so slow that I gain no
    > performance over using the "url" column directly as the unique key?


    If you're going to accept the overhead of an SQL database you might as
    well enjoy the abstraction it gives you, rather than implementing what
    amounts to your own form of indexing instead of letting the db take
    care of it. But md5(url) is certainly very fast
    compared with processing the outgoing http connection that you
    presumably plan to open for each url.

    > I will do some benchmarking to find it out.


    That's the right way to answer questions like this.
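
A benchmark along these lines could be sketched with the standard timeit module. This is only an illustration in Python 3 syntax: the URL list is made up, the function name md5_all is hypothetical, and the numbers it prints will vary by machine, so no expected timing is given.

```python
import hashlib
import timeit

# Made-up URLs standing in for the spider's real workload.
urls = [f"http://example.com/page/{i}" for i in range(10_000)]


def md5_all():
    """Compute the MD5 hex digest of every URL in the sample list."""
    return [hashlib.md5(u.encode("utf-8")).hexdigest() for u in urls]


runs = 10
seconds = timeit.timeit(md5_all, number=runs)

# Average cost of hashing a single URL, in microseconds.
per_url_us = seconds / (runs * len(urls)) * 1e6
print(f"md5 per URL: {per_url_us:.2f} microseconds")
```

Comparing that per-URL figure with the time of one HTTP request, or of one database insert keyed on the raw "url" column, gives a concrete basis for the decision.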
  • http://programming.itags.org/python/32378/