Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/12/03 04:19:00 UTC
[jira] [Commented] (SPARK-33620) Task not started after filtering
[ https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242899#comment-17242899 ]
Hyukjin Kwon commented on SPARK-33620:
--------------------------------------
For questions, let's interact on the mailing list first before filing an issue. See also https://spark.apache.org/community.html
> Task not started after filtering
> --------------------------------
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 2.4.7
> Reporter: Vladislav Sterkhov
> Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello, I have a problem with high memory usage: the input is roughly 2000 GB on HDFS. With a 300 GB input the tasks start and complete, but we need to handle an unbounded amount of data. Please help.
>
> !VlwWJ.png|width=644,height=150!
>
> !mgg1s.png|width=651,height=182!
>
> This is my code:
> {code}
> var filteredRDD = sparkContext.emptyRDD[String]
> for (path <- pathBuffer) {
>   val someRDD = sparkContext.textFile(path)
>   if (isValidRDD(someRDD))
>     filteredRDD = filteredRDD.++(someRDD.filter(row => {...}))
> }
> hiveService.insertRDD(filteredRDD.repartition(10), outTable)
> {code}
>
> There is another symptom: after many iterations, Spark throws a StackOverflowError:
>
> {code}
> java.lang.StackOverflowError
> at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2303)
> at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2596)
> at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2606)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> {code}
> How should I structure my code with repartition and persist/coalesce so that the nodes do not crash?
> I tried to restructure the program in different ways, moving the repartitioning and the in-memory/on-disk persistence inside the loop, and set a large number of partitions (200).
> The program either hangs at the "repartition" stage or dies with exit code 143 (out of memory), oddly also throwing a StackOverflowError.
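The stack trace above is the usual symptom of an RDD lineage that grows with every `++` in the loop: each iteration wraps the previous UnionRDD in another one, and serializing that chain recurses once per level. A minimal sketch of the common restructuring, not from this thread and with `pathBuffer`, `isValidRDD`, `rowIsWanted`, and the sink all placeholders standing in for the reporter's code, is to collect the per-path RDDs first and union them in a single call:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hedged sketch only: isValidRDD and rowIsWanted mirror the reporter's
// helpers and are assumptions, not real Spark API.
object UnionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("union-sketch"))
    val pathBuffer: Seq[String] = args.toSeq

    def isValidRDD(rdd: RDD[String]): Boolean = !rdd.isEmpty() // placeholder check
    def rowIsWanted(row: String): Boolean = row.nonEmpty       // placeholder filter

    // Build the list of filtered RDDs without chaining ++ : the driver-side
    // Seq grows, but no RDD lineage is built in this loop.
    val validRDDs: Seq[RDD[String]] = pathBuffer.flatMap { path =>
      val someRDD = sc.textFile(path)
      if (isValidRDD(someRDD)) Some(someRDD.filter(rowIsWanted)) else None
    }

    // One SparkContext.union call produces a single, flat UnionRDD instead of
    // a chain as deep as pathBuffer.size, so serializing the lineage no
    // longer recurses once per input path.
    val filteredRDD: RDD[String] = sc.union(validRDDs)

    // hiveService.insertRDD(filteredRDD.repartition(10), outTable)  // reporter's sink
    sc.stop()
  }
}
```

For genuinely iterative pipelines where the lineage must grow, calling `rdd.checkpoint()` (with `sc.setCheckpointDir` configured) every N iterations truncates the chain instead.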
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org