Posted to user@spark.apache.org by Paul Tremblay <pa...@gmail.com> on 2017/04/01 19:54:17 UTC

bug with PYTHONHASHSEED

When I try to do a groupByKey() in my spark environment, I get the error
described here:

http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh

In order to attempt to fix the problem, I set up my ipython environment
with the additional line:

PYTHONHASHSEED=1

When I fire up my ipython shell, and do:

In [7]: hash("foo")
Out[7]: -2457967226571033580

In [8]: hash("foo")
Out[8]: -2457967226571033580

So my hash function is now seeded so it returns consistent values. But when
I do a groupByKey(), I get the same error:


Exception: Randomness of hash of string should be disabled via
PYTHONHASHSEED

Anyone know how to fix this problem in python 3.4?
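[Editor's note: the hash() check above only proves the driver's ipython process is seeded; PySpark raises this exception when an executor's Python process starts without PYTHONHASHSEED in its environment. A minimal stdlib-only sketch of the per-process behaviour (helper name is illustrative, not from this thread):]

```python
import os
import subprocess
import sys

def hash_in_new_process(env_extra):
    # Each fresh Python 3 interpreter draws its own hash seed unless
    # PYTHONHASHSEED is set in *that* process's environment -- which is
    # exactly what Spark's check guards against on the executors.
    env = dict(os.environ, **env_extra)
    out = subprocess.run(
        [sys.executable, "-c", 'print(hash("foo"))'],
        env=env, capture_output=True, text=True,
    )
    return out.stdout.strip()

# With randomized hashing, two fresh interpreters disagree:
a = hash_in_new_process({"PYTHONHASHSEED": "random"})
b = hash_in_new_process({"PYTHONHASHSEED": "random"})

# With the seed pinned, every process agrees:
c = hash_in_new_process({"PYTHONHASHSEED": "1"})
d = hash_in_new_process({"PYTHONHASHSEED": "1"})
print(c == d)  # True
```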

Thanks

Henry

-- 
Paul Henry Tremblay
Robert Half Technology

Re: bug with PYTHONHASHSEED

Posted by Paul Tremblay <pa...@gmail.com>.
I saw the bug fix. I am using the latest Spark available on AWS EMR, which I
think is 2.0.1. I am at work and can't check my home config. I don't think
AWS has merged in this fix.

Henry

On Tue, Apr 4, 2017 at 4:42 PM, Jeff Zhang <zj...@gmail.com> wrote:

>
> It is fixed in https://issues.apache.org/jira/browse/SPARK-13330
>
>
>
> Holden Karau <ho...@pigscanfly.ca> wrote on Wed, Apr 5, 2017 at 12:03 AM:
>
>> Which version of Spark is this (or is it a dev build)? We've recently
>> made some improvements with PYTHONHASHSEED propagation.
>>
>> On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@sevenval.com>
>> wrote:
>>
>> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <pa...@gmail.com>:
>>
>> When I try to do a groupByKey() in my spark environment, I get the
>> error described here:
>>
>> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>>
>> In order to attempt to fix the problem, I set up my ipython environment
>> with the additional line:
>>
>> PYTHONHASHSEED=1
>>
>> When I fire up my ipython shell, and do:
>>
>> In [7]: hash("foo")
>> Out[7]: -2457967226571033580
>>
>> In [8]: hash("foo")
>> Out[8]: -2457967226571033580
>>
>> So my hash function is now seeded so it returns consistent values. But
>> when I do a groupByKey(), I get the same error:
>>
>>
>> Exception: Randomness of hash of string should be disabled via
>> PYTHONHASHSEED
>>
>> Anyone know how to fix this problem in python 3.4?
>>
>>
>> Independent of the python version, you have to ensure that Python on
>> spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
>> adding it to the environment of the spark processes.
>>
>> Best
>>
>> Eike
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>


-- 
Paul Henry Tremblay
Robert Half Technology

Re: bug with PYTHONHASHSEED

Posted by Jeff Zhang <zj...@gmail.com>.
It is fixed in https://issues.apache.org/jira/browse/SPARK-13330



Holden Karau <ho...@pigscanfly.ca> wrote on Wed, Apr 5, 2017 at 12:03 AM:

> Which version of Spark is this (or is it a dev build)? We've recently made
> some improvements with PYTHONHASHSEED propagation.
>
> On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@sevenval.com>
> wrote:
>
> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <pa...@gmail.com>:
>
> When I try to do a groupByKey() in my spark environment, I get the
> error described here:
>
>
> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>
> In order to attempt to fix the problem, I set up my ipython environment
> with the additional line:
>
> PYTHONHASHSEED=1
>
> When I fire up my ipython shell, and do:
>
> In [7]: hash("foo")
> Out[7]: -2457967226571033580
>
> In [8]: hash("foo")
> Out[8]: -2457967226571033580
>
> So my hash function is now seeded so it returns consistent values. But
> when I do a groupByKey(), I get the same error:
>
>
> Exception: Randomness of hash of string should be disabled via
> PYTHONHASHSEED
>
> Anyone know how to fix this problem in python 3.4?
>
>
> Independent of the python version, you have to ensure that Python on
> spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
> adding it to the environment of the spark processes.
>
> Best
>
> Eike
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>

Re: bug with PYTHONHASHSEED

Posted by Holden Karau <ho...@pigscanfly.ca>.
Which version of Spark is this (or is it a dev build)? We've recently made
some improvements with PYTHONHASHSEED propagation.

On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@sevenval.com>
wrote:

2017-04-01 21:54 GMT+02:00 Paul Tremblay <pa...@gmail.com>:

When I try to do a groupByKey() in my spark environment, I get the error
described here:

http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh

In order to attempt to fix the problem, I set up my ipython environment
with the additional line:

PYTHONHASHSEED=1

When I fire up my ipython shell, and do:

In [7]: hash("foo")
Out[7]: -2457967226571033580

In [8]: hash("foo")
Out[8]: -2457967226571033580

So my hash function is now seeded so it returns consistent values. But when
I do a groupByKey(), I get the same error:


Exception: Randomness of hash of string should be disabled via
PYTHONHASHSEED

Anyone know how to fix this problem in python 3.4?


Independent of the python version, you have to ensure that Python on
spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
adding it to the environment of the spark processes.

Best

Eike

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: bug with PYTHONHASHSEED

Posted by Paul Tremblay <pa...@gmail.com>.
So that means I have to pass that bash variable to the EMR clusters when I
spin them up, not afterwards. I'll give that a go.
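[Editor's note: on EMR the environment variable can be set at cluster creation time through a configuration classification. A hypothetical example of the JSON passed to the cluster-creation call (the classification names are EMR's `spark-env`/`export` convention; the seed value is illustrative):]

```json
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYTHONHASHSEED": "1"
        }
      }
    ]
  }
]
```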

Thanks!

Henry

On Tue, Apr 4, 2017 at 7:49 AM, Eike von Seggern <ei...@sevenval.com>
wrote:

> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <pa...@gmail.com>:
>
>> When I try to do a groupByKey() in my spark environment, I get the
>> error described here:
>>
>> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>>
>> In order to attempt to fix the problem, I set up my ipython environment
>> with the additional line:
>>
>> PYTHONHASHSEED=1
>>
>> When I fire up my ipython shell, and do:
>>
>> In [7]: hash("foo")
>> Out[7]: -2457967226571033580
>>
>> In [8]: hash("foo")
>> Out[8]: -2457967226571033580
>>
>> So my hash function is now seeded so it returns consistent values. But
>> when I do a groupByKey(), I get the same error:
>>
>>
>> Exception: Randomness of hash of string should be disabled via
>> PYTHONHASHSEED
>>
>> Anyone know how to fix this problem in python 3.4?
>>
>
> Independent of the python version, you have to ensure that Python on
> spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
> adding it to the environment of the spark processes.
>
> Best
>
> Eike
>



-- 
Paul Henry Tremblay
Robert Half Technology

Re: bug with PYTHONHASHSEED

Posted by Eike von Seggern <ei...@sevenval.com>.
2017-04-01 21:54 GMT+02:00 Paul Tremblay <pa...@gmail.com>:

> When I try to do a groupByKey() in my spark environment, I get the
> error described here:
>
> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>
> In order to attempt to fix the problem, I set up my ipython environment
> with the additional line:
>
> PYTHONHASHSEED=1
>
> When I fire up my ipython shell, and do:
>
> In [7]: hash("foo")
> Out[7]: -2457967226571033580
>
> In [8]: hash("foo")
> Out[8]: -2457967226571033580
>
> So my hash function is now seeded so it returns consistent values. But
> when I do a groupByKey(), I get the same error:
>
>
> Exception: Randomness of hash of string should be disabled via
> PYTHONHASHSEED
>
> Anyone know how to fix this problem in python 3.4?
>

Independent of the python version, you have to ensure that Python on
spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
adding it to the environment of the spark processes.
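[Editor's note: for a standalone or YARN PySpark job, one way to do this is Spark's built-in `spark.executorEnv.*` setting plus an export for the driver. A configuration sketch (job name and seed value are illustrative, not from this thread):]

```shell
# Pin the hash seed for the driver's Python process...
export PYTHONHASHSEED=1

# ...and for every executor's Python process, via spark.executorEnv.<NAME>,
# which sets an environment variable on the executors.
spark-submit \
  --conf spark.executorEnv.PYTHONHASHSEED=1 \
  my_job.py

# Equivalent cluster-wide setting: add to conf/spark-env.sh on each node:
# export PYTHONHASHSEED=1
```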

Best

Eike