Posted to issues@spark.apache.org by "Justin Mays (Jira)" <ji...@apache.org> on 2020/10/09 19:39:00 UTC

[jira] [Created] (SPARK-33103) Custom Schema with Custom RDD reorders columns when more than 4 added

Justin Mays created SPARK-33103:
-----------------------------------

             Summary: Custom Schema with Custom RDD reorders columns when more than 4 added
                 Key: SPARK-33103
                 URL: https://issues.apache.org/jira/browse/SPARK-33103
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
         Environment: Java Application
            Reporter: Justin Mays


I have a custom RDD written in Java that uses a custom schema. Everything appears to work fine with 4 columns, but when I add a 5th column, calling show() fails with:

java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of double

here is the schema definition in java:

StructType schema = new StructType()
    .add("recordId", DataTypes.LongType, false)
    .add("col1", DataTypes.DoubleType, false)
    .add("col2", DataTypes.DoubleType, false)
    .add("col3", DataTypes.IntegerType, false)
    .add("col4", DataTypes.IntegerType, false);


Here is the physical plan, which shows the schema in the declared column order:

== Physical Plan ==
*(1) Scan dw [recordId#0L,col1#1,col2#2,col3#3,col4#4] PushedFilters: [], ReadSchema: struct<recordId:bigint,col1:double,col2:double,col3:int,col4:int>
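The failure mechanism can be seen in the stack trace below: the generated serializer validates each external value against the Java type it expects at that ordinal, and after the reorder it expects col1 (double) at ordinal 0, where the custom Row returns the recordId Long. A minimal Spark-free sketch of that check (the ExternalTypeCheck helper is hypothetical; only the field order and the error text mirror the report):

```java
import java.util.List;

public class ExternalTypeCheck {
    // Expected external Java types, in the order the serializer ends up
    // using (col1 first, per the reordered output in the stack trace).
    static final List<Class<?>> SERIALIZER_ORDER = List.of(
        Double.class,   // col1 (ordinal 0 after the reorder)
        Long.class,     // recordId
        Double.class,   // col2
        Integer.class,  // col3
        Integer.class); // col4

    // Returns null if the value is acceptable at the ordinal, otherwise
    // a message shaped like Spark's RuntimeException text.
    static String validate(int ordinal, Object value) {
        Class<?> expected = SERIALIZER_ORDER.get(ordinal);
        if (expected.isInstance(value)) {
            return null;
        }
        return value.getClass().getName()
            + " is not a valid external type for schema of "
            + expected.getSimpleName().toLowerCase();
    }

    public static void main(String[] args) {
        // Row.get(0) returns 0L (recordId), but the serializer expects
        // col1 (double) at ordinal 0 -- the same failure as the report.
        System.out.println(validate(0, 0L));
    }
}
```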


I hardcoded a return in my Row object with values matching the schema:

@Override
public Object get(int i) {
    switch (i) {
        case 0: return 0L;                    // recordId
        case 1: return 1.1911950001644689D;   // col1
        case 2: return 9.100000949955666E9D;  // col2
        case 3: return 476;                   // col3
        case 4: return 500;                   // col4
    }
    return 0L;
}
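Because the hardcoded switch ties each value to a fixed ordinal, any reordering between the declared schema and the serializer misaligns values and types. One way to make the backing data order-tolerant (NamedRowValues is a hypothetical illustration, not Spark API) is to key values by field name and resolve get(i) through whatever field order is in force:

```java
import java.util.List;
import java.util.Map;

public class NamedRowValues {
    private final List<String> fieldOrder;
    private final Map<String, Object> values;

    public NamedRowValues(List<String> fieldOrder, Map<String, Object> values) {
        this.fieldOrder = fieldOrder;
        this.values = values;
    }

    // Equivalent of Row.get(int): position -> field name -> value,
    // so the value cannot drift out of step with the field order.
    public Object get(int i) {
        return values.get(fieldOrder.get(i));
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of(
            "recordId", 0L,
            "col1", 1.1911950001644689D,
            "col2", 9.100000949955666E9D,
            "col3", 476,
            "col4", 500);
        // Declared order vs. the order the serializer actually used:
        NamedRowValues declared = new NamedRowValues(
            List.of("recordId", "col1", "col2", "col3", "col4"), row);
        NamedRowValues reordered = new NamedRowValues(
            List.of("col1", "recordId", "col2", "col3", "col4"), row);
        System.out.println(declared.get(0));   // recordId value
        System.out.println(reordered.get(0));  // col1 value
    }
}
```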


Here is the output of the show command:

15:30:26.875 ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of double
validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, col1), DoubleType) AS col1#30
validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, recordId), LongType) AS recordId#31L
validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, col2), DoubleType) AS col2#32
validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 3, col3), IntegerType) AS col3#33
validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 4, col4), IntegerType) AS col4#34
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:215) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:197) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1]
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) ~[scala-library-2.12.10.jar:?]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729) ~[spark-sql_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) ~[spark-sql_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.scheduler.Task.run(Task.scala:127) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) ~[spark-core_2.12-3.0.1.jar:3.0.1]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) [spark-core_2.12-3.0.1.jar:3.0.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
Caused by: java.lang.RuntimeException: java.lang.Long is not a valid external type for schema of double
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_0$(Unknown Source) ~[?:?]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) ~[?:?]
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:211) ~[spark-catalyst_2.12-3.0.1.jar:3.0.1]
	... 19 more

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org