You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/01/09 02:46:00 UTC
[jira] [Commented] (SPARK-37821) spark thrift server RDD ID overflow lead sql execute failed

    [ https://issues.apache.org/jira/browse/SPARK-37821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471273#comment-17471273 ] 

Hyukjin Kwon commented on SPARK-37821:
--------------------------------------

Can we switch it to long?

> spark thrift server RDD ID overflow lead sql execute failed
> -----------------------------------------------------------
>
>                 Key: SPARK-37821
>                 URL: https://issues.apache.org/jira/browse/SPARK-37821
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: muhong
>            Priority: Major
>
> this problem will happen in long run spark application，such as thrift server；
> as only one SparkContext instance in thrift server driver size，so if the concurrency of sql request is large or the sql is too complicate（this will create a lot of rdd）, the rdd will be generate too fast , the rdd id （SparkContext.scala#nextRddId:[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala] ）will be consume fast, after a few months the nextRddId will overflow。the newRddId will be negative number，but the rdd's block id need to be positive, so this will lead a exception"Failed to parse rdd_-2123452330_2 into block ID"（rdd block id formate“val RDD = "rdd_([0-9]{+})_([0-9]{+})".r”：[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockId.scala]），so can not exchange data during sql execution, and lead sql execute failed
> if rddId overflow , when rdd.MapPartition execute , error will occur, the error is occur on driver side, when driver deserialize block id from "block message" inputstream
> when executor invoke rdd.MapPartition, it will call block manager to report block status,  the the block id is negative，when the message send back to driver , the driver regex will failed match and throw an exception
>  
> how to fix the problem???
> SparkContext.scala
>  
> {code:java}
> ...
> ...
>  private val nextShuffleId = new AtomicInteger(0)
>   private[spark] def newShuffleId(): Int = nextShuffleId.getAndIncrement()
>   private var nextRddId = new AtomicInteger(0) // change happen
>   /** Register a new RDD, returning its RDD ID */  
> // change happen
> private[spark] def newRddId(): Int = {
>   var id = nextRddId.getAndIncrement()
>   if (id > 0) {
>     return id
>   }
>   this.synchronized {
>     id = nextRddId.getAndIncrement()
>    if (id < 0) {       
>       nextRddId = new AtomicInteger(0)
>       id = nextRddId.getAndIncrement()
>     }
>   }
>   id
> }
> ...
> ...{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org