You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/03/03 12:49:00 UTC
[jira] [Commented] (SPARK-34607) NewInstance.resolved should not
throw malformed class name error
[ https://issues.apache.org/jira/browse/SPARK-34607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294509#comment-17294509 ]
Hyukjin Kwon commented on SPARK-34607:
--------------------------------------
cc [~kiszk], [~maropu] and [~viirya] in case you're interested in this.
> NewInstance.resolved should not throw malformed class name error
> ----------------------------------------------------------------
>
> Key: SPARK-34607
> URL: https://issues.apache.org/jira/browse/SPARK-34607
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.7, 3.0.2, 3.1.1
> Reporter: Kris Mok
> Priority: Major
>
> I'd like to seek for community help on fixing this issue:
> Related to SPARK-34596, one of our users had hit an issue with {{ExpressionEncoder}} when running Spark code in a Scala REPL, where {{NewInstance.resolved}} was throwing {{"Malformed class name"}} error, with the following kind of stack trace:
> {code}
> java.lang.InternalError: Malformed class name
> at java.lang.Class.getSimpleBinaryName(Class.java:1450)
> at java.lang.Class.isMemberClass(Class.java:1433)
> at org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved$lzycompute(objects.scala:447)
> at org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved(objects.scala:441)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionBottomUp(Analyzer.scala:1935)
> ...
> Caused by: sbt.ForkMain$ForkError: java.lang.StringIndexOutOfBoundsException: String index out of range: -83
> at java.lang.String.substring(String.java:1931)
> at java.lang.Class.getSimpleBinaryName(Class.java:1448)
> at java.lang.Class.isMemberClass(Class.java:1433)
> at org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved$lzycompute(objects.scala:447)
> at org.apache.spark.sql.catalyst.expressions.objects.NewInstance.resolved(objects.scala:441)
> ...
> {code}
> The most important point in the stack trace is this:
> {code}
> java.lang.InternalError: Malformed class name
> at java.lang.Class.getSimpleBinaryName(Class.java:1450)
> at java.lang.Class.isMemberClass(Class.java:1433)
> {code}
> The most common way to hit the {{"Malformed class name"}} issue in Spark is via {{java.lang.Class.getSimpleName}}. But as this stack trace demonstrates, it can happen via other code paths from the JDK as well.
> If we want to fix it in a similar fashion as {{Utils.getSimpleName}}, we'd have to emulate {{java.lang.Class.isMemberClass}} in Spark's {{Utils}}, and then use it in the {{NewInstance.resolved}} code path.
> Here's a reproducer test case (in diff form against Spark master's {{ExpressionEncoderSuite}} ), which uses explicit nested classes to emulate the code structure that'd be generated by Scala's REPL:
> (the level of nesting can be further reduced and still reproduce the issue)
> {code}
> diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
> index 2635264..fd1b23d 100644
> --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
> +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
> @@ -217,6 +217,95 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
> "nested Scala class should work")
> }
>
> + object OuterLevelWithVeryVeryVeryLongClassName1 {
> + object OuterLevelWithVeryVeryVeryLongClassName2 {
> + object OuterLevelWithVeryVeryVeryLongClassName3 {
> + object OuterLevelWithVeryVeryVeryLongClassName4 {
> + object OuterLevelWithVeryVeryVeryLongClassName5 {
> + object OuterLevelWithVeryVeryVeryLongClassName6 {
> + object OuterLevelWithVeryVeryVeryLongClassName7 {
> + object OuterLevelWithVeryVeryVeryLongClassName8 {
> + object OuterLevelWithVeryVeryVeryLongClassName9 {
> + object OuterLevelWithVeryVeryVeryLongClassName10 {
> + object OuterLevelWithVeryVeryVeryLongClassName11 {
> + object OuterLevelWithVeryVeryVeryLongClassName12 {
> + object OuterLevelWithVeryVeryVeryLongClassName13 {
> + object OuterLevelWithVeryVeryVeryLongClassName14 {
> + object OuterLevelWithVeryVeryVeryLongClassName15 {
> + object OuterLevelWithVeryVeryVeryLongClassName16 {
> + object OuterLevelWithVeryVeryVeryLongClassName17 {
> + object OuterLevelWithVeryVeryVeryLongClassName18 {
> + object OuterLevelWithVeryVeryVeryLongClassName19 {
> + object OuterLevelWithVeryVeryVeryLongClassName20 {
> + case class MalformedNameExample2(x: Int)
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> + }
> +
> + {
> + OuterScopes.addOuterScope(
> + OuterLevelWithVeryVeryVeryLongClassName1
> + .OuterLevelWithVeryVeryVeryLongClassName2
> + .OuterLevelWithVeryVeryVeryLongClassName3
> + .OuterLevelWithVeryVeryVeryLongClassName4
> + .OuterLevelWithVeryVeryVeryLongClassName5
> + .OuterLevelWithVeryVeryVeryLongClassName6
> + .OuterLevelWithVeryVeryVeryLongClassName7
> + .OuterLevelWithVeryVeryVeryLongClassName8
> + .OuterLevelWithVeryVeryVeryLongClassName9
> + .OuterLevelWithVeryVeryVeryLongClassName10
> + .OuterLevelWithVeryVeryVeryLongClassName11
> + .OuterLevelWithVeryVeryVeryLongClassName12
> + .OuterLevelWithVeryVeryVeryLongClassName13
> + .OuterLevelWithVeryVeryVeryLongClassName14
> + .OuterLevelWithVeryVeryVeryLongClassName15
> + .OuterLevelWithVeryVeryVeryLongClassName16
> + .OuterLevelWithVeryVeryVeryLongClassName17
> + .OuterLevelWithVeryVeryVeryLongClassName18
> + .OuterLevelWithVeryVeryVeryLongClassName19
> + .OuterLevelWithVeryVeryVeryLongClassName20)
> + encodeDecodeTest(
> + OuterLevelWithVeryVeryVeryLongClassName1
> + .OuterLevelWithVeryVeryVeryLongClassName2
> + .OuterLevelWithVeryVeryVeryLongClassName3
> + .OuterLevelWithVeryVeryVeryLongClassName4
> + .OuterLevelWithVeryVeryVeryLongClassName5
> + .OuterLevelWithVeryVeryVeryLongClassName6
> + .OuterLevelWithVeryVeryVeryLongClassName7
> + .OuterLevelWithVeryVeryVeryLongClassName8
> + .OuterLevelWithVeryVeryVeryLongClassName9
> + .OuterLevelWithVeryVeryVeryLongClassName10
> + .OuterLevelWithVeryVeryVeryLongClassName11
> + .OuterLevelWithVeryVeryVeryLongClassName12
> + .OuterLevelWithVeryVeryVeryLongClassName13
> + .OuterLevelWithVeryVeryVeryLongClassName14
> + .OuterLevelWithVeryVeryVeryLongClassName15
> + .OuterLevelWithVeryVeryVeryLongClassName16
> + .OuterLevelWithVeryVeryVeryLongClassName17
> + .OuterLevelWithVeryVeryVeryLongClassName18
> + .OuterLevelWithVeryVeryVeryLongClassName19
> + .OuterLevelWithVeryVeryVeryLongClassName20
> + .MalformedNameExample2(42),
> + "deeply nested Scala class should work")
> + }
> +
> productTest(PrimitiveData(1, 1, 1, 1, 1, 1, true))
>
> productTest(
> {code}
> The reason why the test case above is so convoluted is in the way Scala generates the class name for nested classes. In general, Scala generates a class name for a nested class by inserting the dollar-sign ( {{$}} ) in between each level of class nesting. The problem is that this format can concatenate into a very long string that goes beyond certain limits, so Scala will change the class name format beyond certain length threshold.
> For the example above, we can see that the first two levels of class nesting have class names that look like this:
> {code}
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$OuterLevelWithVeryVeryVeryLongClassName2$
> {code}
> If we leave out the fact that Scala uses a dollar-sign ( {{$}} ) suffix for the class name of the companion object, {{OuterLevelWithVeryVeryVeryLongClassName1}}'s full name is a prefix (substring) of {{OuterLevelWithVeryVeryVeryLongClassName2}}.
> But if we keep going deeper into the levels of nesting, you'll find names that look like:
> {code}
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$OuterLevelWithVeryVeryVeryLongClassName11$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$OuterLevelWithVeryVeryVeryLongClassName14$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$OuterLevelWithVeryVeryVeryLongClassName17$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$OuterLevelWithVeryVeryVeryLongClassName20$
> {code}
> with a hash code in the middle and various levels of nesting omitted.
> The {{java.lang.Class.isMemberClass}} method is implemented in JDK8u as:
> http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/tip/src/share/classes/java/lang/Class.java#l1425
> {code:java}
> /**
> * Returns {@code true} if and only if the underlying class
> * is a member class.
> *
> * @return {@code true} if and only if this class is a member class.
> * @since 1.5
> */
> public boolean isMemberClass() {
> return getSimpleBinaryName() != null && !isLocalOrAnonymousClass();
> }
> /**
> * Returns the "simple binary name" of the underlying class, i.e.,
> * the binary name without the leading enclosing class name.
> * Returns {@code null} if the underlying class is a top level
> * class.
> */
> private String getSimpleBinaryName() {
> Class<?> enclosingClass = getEnclosingClass();
> if (enclosingClass == null) // top level class
> return null;
> // Otherwise, strip the enclosing class' name
> try {
> return getName().substring(enclosingClass.getName().length());
> } catch (IndexOutOfBoundsException ex) {
> throw new InternalError("Malformed class name", ex);
> }
> }
> {code}
> and the problematic code is {{getName().substring(enclosingClass.getName().length())}} -- if a class's enclosing class's full name is *longer* than the nested class's full name, this logic would end up going out of bounds.
> The bug has been fixed in JDK9 by https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8057919 , but still exists in the latest JDK8u release. So from the Spark side we'd need to do something to avoid hitting this problem.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org