Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/06/08 08:22:00 UTC
[jira] [Commented] (SPARK-31916) StringConcat can overflow `length`, leads to StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-31916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128013#comment-17128013 ]
Apache Spark commented on SPARK-31916:
--------------------------------------
User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/28750
> StringConcat can overflow `length`, leads to StringIndexOutOfBoundsException
> ----------------------------------------------------------------------------
>
> Key: SPARK-31916
> URL: https://issues.apache.org/jira/browse/SPARK-31916
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Jeffrey Stokes
> Priority: Major
>
> We have query plans that through multiple transformations can grow extremely long in length. These would eventually throw OutOfMemory exceptions (https://issues.apache.org/jira/browse/SPARK-26103 & related https://issues.apache.org/jira/browse/SPARK-25380).
>
> We backported the changes from [https://github.com/apache/spark/pull/23169] into our distribution of Spark, based on 2.4.4, and attempted to use the added `spark.sql.maxPlanStringLength`. While this works in some cases, large query plans can still lead to issues stemming from `StringConcat` in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala.
>
> The following unit test exhibits the issue and still fails on the master branch of Spark:
>
> {code:scala}
>   test("StringConcat doesn't overflow on many inputs") {
>     val concat = new StringConcat(maxLength = 100)
>     0.to(Integer.MAX_VALUE).foreach { _ =>
>       concat.append("hello world")
>     }
>     assert(concat.toString.length === 100)
>   }
> {code}
>
> Looking at the append method here: [https://github.com/apache/spark/blob/fc6af9d900ec6f6a1cbe8f987857a69e6ef600d1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala#L118-L128]
>
> It seems that regardless of whether the string to be appended is added fully to the internal buffer, added as a substring to reach `maxLength`, or not added at all, the internal `length` field is incremented by the length of `s`. Eventually this overflows the `Int` and causes L123 to call substring with a negative index.
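To illustrate the shape of a possible fix (this is only a sketch under assumed names, not the actual change in the linked pull request): tracking the accumulated length in a `Long` means repeated appends can no longer wrap an `Int` negative, so the substring index stays valid.

```scala
// Hypothetical sketch of an overflow-safe StringConcat; class and field
// names here are illustrative, not Spark's actual implementation.
class SafeStringConcat(val maxLength: Int) {
  private val strings = new java.lang.StringBuilder
  private var length: Long = 0L // Long accumulator: cannot realistically overflow

  def append(s: String): Unit = {
    if (s != null) {
      val sLen = s.length
      if (length < maxLength) {
        // Copy only as many characters as still fit under maxLength.
        val available = maxLength - length.toInt
        strings.append(s, 0, math.min(sLen, available))
      }
      // Always record the full requested length, as the original does,
      // but in a Long so it never wraps to a negative Int.
      length += sLen
    }
  }

  override def toString: String = strings.toString
}
```

With this sketch, appending strings whose total length far exceeds `Integer.MAX_VALUE` still yields a result truncated at `maxLength`, rather than a StringIndexOutOfBoundsException.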
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org