Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/09/04 01:44:46 UTC

[jira] [Assigned] (SPARK-6931) python: struct.pack('!q', value) in write_long(value, stream) in serializers.py requires int (but doesn't raise exceptions in common cases)

     [ https://issues.apache.org/jira/browse/SPARK-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-6931:
-----------------------------------

    Assignee: Apache Spark

> python: struct.pack('!q', value) in write_long(value, stream) in serializers.py requires int (but doesn't raise exceptions in common cases)
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6931
>                 URL: https://issues.apache.org/jira/browse/SPARK-6931
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.0
>            Reporter: Chunxi Zhang
>            Assignee: Apache Spark
>            Priority: Critical
>              Labels: easyfix
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When I map my own feature calculation module's function, Spark raises:
> Traceback (most recent call last):
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 162, in manager
>     code = worker(sock)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 60, in worker
>     worker_main(infile, outfile)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 115, in main
>     report_times(outfile, boot_time, init_time, finish_time)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 40, in report_times
>     write_long(1000 * boot, outfile)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/serializers.py", line 518, in write_long
>     stream.write(struct.pack("!q", value))
> DeprecationWarning: integer argument expected, got float
> So I opened serializers.py and printed the value out: it is a float, coming from 1000 * time.time().
> When I removed my lib, or added an rdd.count() before mapping my lib, this bug did not appear.
> So I edited the function to:
> def write_long(value, stream):
>     stream.write(struct.pack("!q", int(value)))  # added int(value)
> and everything seems fine…
> According to Note (3) of Python's struct documentation (https://docs.python.org/2/library/struct.html), the value should be an int (for 'q'); if it's a float, struct first tries __index__(), then falls back to __int__(), but since that fallback is deprecated it raises a DeprecationWarning. A float doesn't have __index__ but does have __int__, so the warning should be raised every time.
> But, as you can see, it isn't raised in normal cases: the code works perfectly, and executing struct.pack('!q', 111.1) in a console or in a clean file won't raise anything… I can hardly tell how my lib could affect a time.time() value passed to struct.pack(); it might be a Python bug or something else.
> Anyway, this value should be an int, so add an int() to it.
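For reference, the proposed fix can be sketched as a self-contained example. The timestamp value below is illustrative (standing in for 1000 * time.time()), not taken from the original report:

```python
import io
import struct

def write_long(value, stream):
    # Coerce to int before packing: struct's "q" format requires an
    # integer. Passing a float was only a DeprecationWarning in older
    # Python 2.x, but is a hard struct.error in modern Python.
    stream.write(struct.pack("!q", int(value)))

buf = io.BytesIO()
write_long(1000 * 1428.75, buf)  # a float, like 1000 * time.time()
packed = buf.getvalue()
print(len(packed))                     # 8 bytes for a signed 64-bit int
print(struct.unpack("!q", packed)[0])  # round-trips as 1428750
```

The coercion truncates the fractional milliseconds, which is acceptable here since the value is only used for timing reports.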



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org