You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/01/09 02:46:00 UTC
[jira] [Commented] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed
[ https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471273#comment-17471273 ]
Hyukjin Kwon commented on SPARK-37821:
--------------------------------------
Can we switch it to long?
> spark thrift server RDD ID overflow lead sql execute failed
> -----------------------------------------------------------
>
> Key: SPARK-37821
> URL: https://issues.apache.org/jira/browse/SPARK-37821
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: muhong
> Priority: Major
>
> this problem will happen in long run spark application,such as thrift server;
> as only one SparkContext instance in thrift server driver size,so if the concurrency of sql request is large or the sql is too complicate(this will create a lot of rdd), the rdd will be generate too fast , the rdd id (SparkContext.scala#nextRddId:[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala] )will be consume fast, after a few months the nextRddId will overflow。the newRddId will be negative number,but the rdd's block id need to be positive, so this will lead a exception"Failed to parse rdd_-2123452330_2 into block ID"(rdd block id formate“val RDD = "rdd_([0-9]{+})_([0-9]{+})".r”:[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]),so can not exchange data during sql execution, and lead sql execute failed
> if rddId overflow , when rdd.MapPartition execute , error will occur, the error is occur on driver side, when driver deserialize block id from "block message" inputstream
> when executor invoke rdd.MapPartition, it will call block manager to report block status, the the block id is negative,when the message send back to driver , the driver regex will failed match and throw an exception
>
> how to fix the problem???
> SparkContext.scala
>
> {code:java}
> ...
> ...
> private val nextShuffleId = new AtomicInteger(0)
> private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()
> private var nextRddId = new AtomicInteger(0) // change happen
> /** Register a new RDD, returning its RDD ID */
> // change happen
> private[spark] def newRddId(): Int = {
> var id = nextRddId.getAndIncrement()
> if (id > 0) {
> return id
> }
> this.synchronized {
> id = nextRddId.getAndIncrement()
> if (id < 0) {
> nextRddId = new AtomicInteger(0)
> id = nextRddId.getAndIncrement()
> }
> }
> id
> }
> ...
> ...{code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org