Posted to user@spark.apache.org by Paul Tremblay <pa...@gmail.com> on 2017/04/01 19:43:29 UTC
pyspark bug with PYTHONHASHSEED
When I try to do a groupByKey() in my Spark environment, I get the error
described here:
http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
In order to attempt to fix the problem, I set up my ipython environment
with the additional line:
PYTHONHASHSEED=1
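Concretely, what I did was roughly this (the exact place I put the export is
my own setup, not anything Spark-specific):

```shell
# Export PYTHONHASHSEED before launching the interpreter so Python 3's
# string-hash randomization is seeded to a fixed value.
export PYTHONHASHSEED=1
# With the seed fixed, hash("foo") is the same value on every run:
python3 -c 'print(hash("foo"))'
python3 -c 'print(hash("foo"))'
```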
When I fire up my ipython shell, and do:
In [7]: hash("foo")
Out[7]: -2457967226571033580
In [8]: hash("foo")
Out[8]: -2457967226571033580
So the hash function is now seeded and returns consistent values. But when
I do a groupByKey(), I still get the same error:
Exception: Randomness of hash of string should be disabled via
PYTHONHASHSEED
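My guess is that the check raising this exception runs inside the Python
worker processes, which wouldn't inherit a variable I only set in my driver
shell. Here is a rough sketch of what I understand the guard to be doing
(paraphrased; the function name `portable_hash_check` is mine, not PySpark's
actual API):

```python
import os
import sys

def portable_hash_check(obj):
    # Sketch (my paraphrase): on Python 3, refuse to hash unless
    # PYTHONHASHSEED is present in *this* process's environment.
    # If the check runs in a Spark worker, setting the variable only
    # in the driver's shell would not satisfy it.
    if sys.version_info >= (3, 3) and "PYTHONHASHSEED" not in os.environ:
        raise Exception(
            "Randomness of hash of string should be disabled via PYTHONHASHSEED"
        )
    return hash(obj)
```

If that reading is right, the variable would need to reach the executors'
environments too, not just my ipython session.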
Does anyone know how to fix this problem in Python 3.4?
Thanks
Henry
--
Paul Henry Tremblay
Robert Half Technology