Posted to reviews@spark.apache.org by vamaral1 <gi...@git.apache.org> on 2018/06/26 17:48:30 UTC

[GitHub] spark issue #21397: [SPARK-24334] Fix race condition in ArrowPythonRunner ca...

GitHub user vamaral1 commented on the issue:

    https://github.com/apache/spark/pull/21397
  
    Thanks for the fix. I was hitting the memory leak described in [JIRA](https://issues.apache.org/jira/browse/SPARK-24334) when working with pandas UDFs, and upgrading my Spark version to pick up the patch resolved it. However, I'm now getting an issue related to the serializer, and I'm having trouble debugging and understanding the stack trace. Any ideas?
    
    ```
    INFO TaskSetManager: Lost task [...] org.apache.spark.api.python.PythonException (Traceback (most recent call last):
      File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 230, in main
        process()
      File "/home/spark-current/python/lib/pyspark.zip/pyspark/worker.py", line 225, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 260, in dump_stream
        for series in iterator:
      File "/home/spark-current/python/lib/pyspark.zip/pyspark/serializers.py", line 279, in load_stream
        for batch in reader:
      File "ipc.pxi", line 268, in __iter__
      File "ipc.pxi", line 284, in pyarrow.lib._RecordBatchReader.read_next_batch
      File "error.pxi", line 79, in pyarrow.lib.check_status
    pyarrow.lib.ArrowIOError: read length must be positive or -1
    ```
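    
    For context, this is roughly the shape of the workload that hits the error. It's a minimal sketch with illustrative column names and toy data, not the actual job:
    
    ```python
    # Minimal pandas UDF sketch; names and data are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import DoubleType
    
    spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()
    
    # Toy input: Arrow record batches of this column are what the
    # serializer streams between the JVM and the Python worker.
    df = spark.range(0, 1000).selectExpr("cast(id as double) as value")
    
    # Scalar pandas UDF: receives a pandas Series per Arrow record batch
    # and must return a Series of the same length.
    @pandas_udf(DoubleType())
    def plus_one(v):
        return v + 1
    
    df.select(plus_one(df["value"])).show()
    ```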


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org