Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/18 12:35:55 UTC
[GitHub] [spark] gaborgsomogyi commented on issue #24403: [SPARK-23014][SS] Fully remove V1 memory sink.

gaborgsomogyi commented on issue #24403: [SPARK-23014][SS] Fully remove V1 memory sink.
URL: https://github.com/apache/spark/pull/24403#issuecomment-484487619
 
 
   Just to explain why the Python adaptation was needed.
   With the V1 memory sink, the following exception was raised:
   ```
   Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
     File "/Users/gaborsomogyi/spark/python/lib/pyspark.zip/pyspark/worker.py", line 428, in main
       process()
     File "/Users/gaborsomogyi/spark/python/lib/pyspark.zip/pyspark/worker.py", line 423, in process
       serializer.dump_stream(func(split_index, iterator), outfile)
     File "/Users/gaborsomogyi/spark/python/pyspark/serializers.py", line 457, in dump_stream
       self.serializer.dump_stream(self._batched(iterator), stream)
     File "/Users/gaborsomogyi/spark/python/pyspark/serializers.py", line 141, in dump_stream
       for obj in iterator:
     File "/Users/gaborsomogyi/spark/python/pyspark/serializers.py", line 446, in _batched
       for item in iterator:
     File "<string>", line 1, in <lambda>
     File "/Users/gaborsomogyi/spark/python/lib/pyspark.zip/pyspark/worker.py", line 86, in <lambda>
       return lambda *a: f(*a)
     File "/Users/gaborsomogyi/spark/python/pyspark/util.py", line 99, in wrapper
       return f(*args, **kwargs)
     File "/Users/gaborsomogyi/spark/python/pyspark/sql/tests/test_streaming.py", line 217, in <lambda>
       bad_udf = udf(lambda x: 1 / 0)
   ZeroDivisionError: integer division or modulo by zero
   
       at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:453)
       at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:81)
       at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:64)
       at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
       at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
       at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
       at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(generated.java:25)
       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:748)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:255)
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:852)
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:852)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:121)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:425)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1349)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:428)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   
   Driver stacktrace:
   === Streaming Query ===
   Identifier: this_query [id = fcda74c8-05e0-4c74-ac45-3444f3e50c63, runId = 7534a374-5d15-4986-bae1-8fa0664c647b]
   Current Committed Offsets: {}
   Current Available Offsets: {FileStreamSource[file:/Users/gaborsomogyi/spark/python/test_support/sql/streaming]: {"logOffset":0}}
   
   Current State: ACTIVE
   Thread State: RUNNABLE
   
   Logical Plan:
   Project [<lambda>(value#4) AS <lambda>(value)#17]
   +- StreamingExecutionRelation FileStreamSource[file:/Users/gaborsomogyi/spark/python/test_support/sql/streaming], [value#4]
   ```
   With the V2 memory sink, the following exception was raised (here the additional exceptions, for instance `ZeroDivisionError`, are linked to the top-level one instead of appearing in its message):
   ```
   Writing job aborted.
   === Streaming Query ===
   Identifier: this_query [id = d37ca2ea-edb7-4670-b4cc-75b0d704496e, runId = 587d62da-8355-4f66-a9d8-708c154ef953]
   Current Committed Offsets: {}
   Current Available Offsets: {FileStreamSource[file:/Users/gaborsomogyi/spark/python/test_support/sql/streaming]: {"logOffset":0}}
   
   Current State: ACTIVE
   Thread State: RUNNABLE
   
   Logical Plan:
   WriteToMicroBatchDataSource org.apache.spark.sql.execution.streaming.sources.MemoryStreamingWrite@1a3691ef
   +- Project [<lambda>(value#18) AS <lambda>(value)#32]
      +- StreamingExecutionRelation FileStreamSource[file:/Users/gaborsomogyi/spark/python/test_support/sql/streaming], [value#18]
   ```
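
To illustrate why the Python test has to change, here is a minimal sketch of searching a chain of linked exceptions for a message instead of checking only the top-level text. `LinkedError`, its `desc` and `cause` attributes, and `chain_contains` are hypothetical names made up for this example; they are not the PySpark API, and the two-level chain is only an assumed shape for the V2 case.

```python
class LinkedError(Exception):
    """Hypothetical stand-in: an error with a description and an optional linked cause."""

    def __init__(self, desc, cause=None):
        super().__init__(desc)
        self.desc = desc
        self.cause = cause


def chain_contains(exc, msg):
    """Return True if msg occurs in the description of exc or of any linked cause."""
    current = exc
    while current is not None:
        if msg in current.desc:
            return True
        current = current.cause
    return False


# V1-like shape (assumed): the Python error text is part of the top-level message.
v1_style = LinkedError(
    "Job aborted due to stage failure: ... "
    "ZeroDivisionError: integer division or modulo by zero"
)

# V2-like shape (assumed): the top-level message is generic and the
# Python error is only reachable through the linked cause.
v2_style = LinkedError(
    "Writing job aborted.",
    cause=LinkedError("ZeroDivisionError: integer division or modulo by zero"),
)

# A check on the top-level message alone only works for the V1-like shape.
top_level_only_v1 = "ZeroDivisionError" in v1_style.desc
top_level_only_v2 = "ZeroDivisionError" in v2_style.desc

# Walking the chain works for both shapes.
found_v1 = chain_contains(v1_style, "ZeroDivisionError")
found_v2 = chain_contains(v2_style, "ZeroDivisionError")
```

A test that asserts on the top-level message would pass with the V1-like shape and fail with the V2-like one, whereas walking the chain handles both.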
   
