Posted to issues@spark.apache.org by "Michael Heuer (JIRA)" <ji...@apache.org> on 2018/10/08 06:17:00 UTC

[jira] [Comment Edited] (SPARK-25588) SchemaParseException: Can't redefine: list when reading from Parquet

    [ https://issues.apache.org/jira/browse/SPARK-25588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638226#comment-16638226 ] 

Michael Heuer edited comment on SPARK-25588 at 10/8/18 6:16 AM:
----------------------------------------------------------------

> Looking at the stack trace, it seems like we are using Spark RDD API to read something via the parquet lib with avro read support.

The unit test exercises two code paths: one writes and reads via the Spark RDD API, and that works fine; the other writes via the Spark SQL API and reads via the Spark RDD API, and that path worked with Spark 2.3.1 but breaks with 2.4.0.
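
For concreteness, the failing shape is roughly the following (a minimal sketch, not the ADAM code itself; it assumes parquet-avro on the classpath and reads into Avro generic records rather than ADAM's generated classes):
{code:scala}
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.avro.AvroReadSupport
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset

// Write with the Spark SQL API; Spark lays out lists in its own Parquet
// schema style (three-level LIST groups named "list").
def writeWithSql[T](ds: Dataset[T], path: String): Unit =
  ds.write.parquet(path)

// Read back with the Spark RDD API through parquet-avro. AvroReadSupport
// rebuilds an Avro schema from the file's Parquet schema, which is where
// "Can't redefine: list" surfaces on files written by the SQL path.
def readWithRdd(sc: SparkContext, path: String): RDD[GenericRecord] = {
  val job = Job.getInstance(sc.hadoopConfiguration)
  ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[GenericRecord]])
  sc.newAPIHadoopFile(
    path,
    classOf[ParquetInputFormat[GenericRecord]],
    classOf[Void],
    classOf[GenericRecord],
    job.getConfiguration
  ).map(_._2)
}
{code}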

> Is it possible to provide some code that other people can also reproduce the bug locally?

Agreed, and I'm still working on it. https://issues.apache.org/jira/browse/SPARK-25587 was an attempt to reproduce this issue that instead uncovered a different one.

> BTW is it possible that ADAM has some problem with avro 1.8.x?

ADAM has depended on Avro 1.8.x for a long time; rather, the trouble was an internal Avro 1.8 vs. 1.7 conflict present in Spark at runtime.
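
For anyone hitting similar conflicts, a quick way to see which Avro copy actually wins on the runtime classpath is to ask the JVM where the class was loaded from (a generic debugging aid, not ADAM code):
{code:scala}
// Print the jar that provides org.apache.avro.Schema on the runtime classpath;
// with a 1.7 vs. 1.8 conflict this immediately shows which copy is loaded.
println(classOf[org.apache.avro.Schema]
  .getProtectionDomain.getCodeSource.getLocation)
{code}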

With Avro 1.8.1 and Parquet 1.8.x dependencies in ADAM, building against Spark 2.4.0 results in a runtime error:
{noformat}
*** RUN ABORTED ***
  java.lang.NoSuchFieldError: BROTLI
  at org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31){noformat}
With Avro 1.8.2 and Parquet 1.10.0 dependencies in ADAM, building against Spark 2.4.0, we run into this issue.

On further investigation, I suspect the difference lies in how (Avro 1.8.2 + Parquet 1.10.0) determine schema compatibility versus how (Avro 1.8.1 + Parquet 1.8.x) did, but I still haven't found a succinct test that demonstrates it.
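
One data point: the Avro-side symptom reproduces without Spark or Parquet at all. Avro refuses to serialize a schema containing two different records that share a name, and converting nested three-level Parquet LIST groups to Avro yields several distinct records all named {{list}}. A minimal sketch against Avro 1.8.2; the record shapes here are hypothetical, not the ADAM schema:
{code:scala}
import org.apache.avro.SchemaBuilder

object CantRedefineList extends App {
  // Two *different* record schemas sharing the name "list", mimicking what
  // parquet-avro generates for nested Parquet LIST groups.
  val stringList = SchemaBuilder.record("list").fields()
    .optionalString("element")
    .endRecord()
  val longList = SchemaBuilder.record("list").fields()
    .optionalLong("element")
    .endRecord()

  // Embedding both under one parent is legal at construction time...
  val parent = SchemaBuilder.record("parent").fields()
    .name("names").`type`().array().items(stringList).noDefault()
    .name("counts").`type`().array().items(longList).noDefault()
    .endRecord()

  // ...but rendering the schema back to JSON, which SchemaCompatibility does
  // internally via Schema.toString (see the stack trace below), throws:
  // org.apache.avro.SchemaParseException: Can't redefine: list
  println(parent.toString)
}
{code}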



> SchemaParseException: Can't redefine: list when reading from Parquet
> --------------------------------------------------------------------
>
>                 Key: SPARK-25588
>                 URL: https://issues.apache.org/jira/browse/SPARK-25588
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Spark version 2.4.0 (RC2).
> {noformat}
> $ spark-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
>       /_/
> Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_181
> Branch
> Compiled by user  on 2018-09-27T14:50:10Z
> Revision
> Url
> Type --help for more information.
> {noformat}
>            Reporter: Michael Heuer
>            Priority: Major
>
> In ADAM, a library downstream of Spark, we use Avro to define a schema, generate Java classes from the Avro schema using the avro-maven-plugin, and generate Scala Products from the Avro schema using our own code generation library.
> In the code path demonstrated by the following unit test, we write out to Parquet and read back in using an RDD of Avro-generated Java classes and then write out to Parquet and read back in using a Dataset of Avro-generated Scala Products.
> {code:scala}
>   sparkTest("transform reads to variant rdd") {
>     val reads = sc.loadAlignments(testFile("small.sam"))
>     def checkSave(variants: VariantRDD) {
>       val tempPath = tmpLocation(".adam")
>       variants.saveAsParquet(tempPath)
>       assert(sc.loadVariants(tempPath).rdd.count === 20)
>     }
>     val variants: VariantRDD = reads.transmute[Variant, VariantProduct, VariantRDD](
>       (rdd: RDD[AlignmentRecord]) => {
>         rdd.map(AlignmentRecordRDDSuite.varFn)
>       })
>     checkSave(variants)
>     val sqlContext = SQLContext.getOrCreate(sc)
>     import sqlContext.implicits._
>     val variantsDs: VariantRDD = reads.transmuteDataset[Variant, VariantProduct, VariantRDD](
>       (ds: Dataset[AlignmentRecordProduct]) => {
>         ds.map(r => {
>           VariantProduct.fromAvro(
>             AlignmentRecordRDDSuite.varFn(r.toAvro))
>         })
>       })
>     checkSave(variantsDs)
>   }
> {code}
> https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/test/scala/org/bdgenomics/adam/rdd/read/AlignmentRecordRDDSuite.scala#L1540
> Note that the Parquet schemas differ: the RDD path (via parquet-avro) writes legacy two-level lists (a repeated field named {{array}}), while the Dataset path writes Spark's standard three-level lists (a repeated group named {{list}} wrapping an optional {{element}}):
> RDD code path
> {noformat}
> $ parquet-tools schema /var/folders/m6/4yqn_4q129lbth_dq3qzj_8h0000gn/T/TempSuite3400691035694870641.adam/part-r-00000.gz.parquet
> message org.bdgenomics.formats.avro.Variant {
>   optional binary contigName (UTF8);
>   optional int64 start;
>   optional int64 end;
>   required group names (LIST) {
>     repeated binary array (UTF8);
>   }
>   optional boolean splitFromMultiAllelic;
>   optional binary referenceAllele (UTF8);
>   optional binary alternateAllele (UTF8);
>   optional double quality;
>   optional boolean filtersApplied;
>   optional boolean filtersPassed;
>   required group filtersFailed (LIST) {
>     repeated binary array (UTF8);
>   }
>   optional group annotation {
>     optional binary ancestralAllele (UTF8);
>     optional int32 alleleCount;
>     optional int32 readDepth;
>     optional int32 forwardReadDepth;
>     optional int32 reverseReadDepth;
>     optional int32 referenceReadDepth;
>     optional int32 referenceForwardReadDepth;
>     optional int32 referenceReverseReadDepth;
>     optional float alleleFrequency;
>     optional binary cigar (UTF8);
>     optional boolean dbSnp;
>     optional boolean hapMap2;
>     optional boolean hapMap3;
>     optional boolean validated;
>     optional boolean thousandGenomes;
>     optional boolean somatic;
>     required group transcriptEffects (LIST) {
>       repeated group array {
>         optional binary alternateAllele (UTF8);
>         required group effects (LIST) {
>           repeated binary array (UTF8);
>         }
>         optional binary geneName (UTF8);
>         optional binary geneId (UTF8);
>         optional binary featureType (UTF8);
>         optional binary featureId (UTF8);
>         optional binary biotype (UTF8);
>         optional int32 rank;
>         optional int32 total;
>         optional binary genomicHgvs (UTF8);
>         optional binary transcriptHgvs (UTF8);
>         optional binary proteinHgvs (UTF8);
>         optional int32 cdnaPosition;
>         optional int32 cdnaLength;
>         optional int32 cdsPosition;
>         optional int32 cdsLength;
>         optional int32 proteinPosition;
>         optional int32 proteinLength;
>         optional int32 distance;
>         required group messages (LIST) {
>           repeated binary array (ENUM);
>         }
>       }
>     }
>     required group attributes (MAP) {
>       repeated group map (MAP_KEY_VALUE) {
>         required binary key (UTF8);
>         required binary value (UTF8);
>       }
>     }
>   }
> }
> {noformat}
> Dataset code path:
> {noformat}
> $ parquet-tools schema /var/folders/m6/4yqn_4q129lbth_dq3qzj_8h0000gn/T/TempSuite2879366708769671307.adam/part-00000-b123eb8b-2648-4648-8096-b3de95343141-c000.snappy.parquet
> message spark_schema {
>   optional binary contigName (UTF8);
>   optional int64 start;
>   optional int64 end;
>   optional group names (LIST) {
>     repeated group list {
>       optional binary element (UTF8);
>     }
>   }
>   optional boolean splitFromMultiAllelic;
>   optional binary referenceAllele (UTF8);
>   optional binary alternateAllele (UTF8);
>   optional double quality;
>   optional boolean filtersApplied;
>   optional boolean filtersPassed;
>   optional group filtersFailed (LIST) {
>     repeated group list {
>       optional binary element (UTF8);
>     }
>   }
>   optional group annotation {
>     optional binary ancestralAllele (UTF8);
>     optional int32 alleleCount;
>     optional int32 readDepth;
>     optional int32 forwardReadDepth;
>     optional int32 reverseReadDepth;
>     optional int32 referenceReadDepth;
>     optional int32 referenceForwardReadDepth;
>     optional int32 referenceReverseReadDepth;
>     optional float alleleFrequency;
>     optional binary cigar (UTF8);
>     optional boolean dbSnp;
>     optional boolean hapMap2;
>     optional boolean hapMap3;
>     optional boolean validated;
>     optional boolean thousandGenomes;
>     optional boolean somatic;
>     optional group transcriptEffects (LIST) {
>       repeated group list {
>         optional group element {
>           optional binary alternateAllele (UTF8);
>           optional group effects (LIST) {
>             repeated group list {
>               optional binary element (UTF8);
>             }
>           }
>           optional binary geneName (UTF8);
>           optional binary geneId (UTF8);
>           optional binary featureType (UTF8);
>           optional binary featureId (UTF8);
>           optional binary biotype (UTF8);
>           optional int32 rank;
>           optional int32 total;
>           optional binary genomicHgvs (UTF8);
>           optional binary transcriptHgvs (UTF8);
>           optional binary proteinHgvs (UTF8);
>           optional int32 cdnaPosition;
>           optional int32 cdnaLength;
>           optional int32 cdsPosition;
>           optional int32 cdsLength;
>           optional int32 proteinPosition;
>           optional int32 proteinLength;
>           optional int32 distance;
>           optional group messages (LIST) {
>             repeated group list {
>               optional binary element (UTF8);
>             }
>           }
>         }
>       }
>     }
>     optional group attributes (MAP) {
>       repeated group key_value {
>         required binary key (UTF8);
>         optional binary value (UTF8);
>       }
>     }
>   }
> }
> {noformat}
> With Spark 2.4.0 (RC2) and Parquet dependency version 1.10.0, the Dataset path now fails:
> {noformat}
> $ mvn test
> ...
> - transform reads to variant rdd *** FAILED ***
>   org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver):
>  org.apache.avro.SchemaParseException: Can't redefine: list
> 	at org.apache.avro.Schema$Names.put(Schema.java:1128)
> 	at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
> 	at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
> 	at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805)
> 	at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
> 	at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
> 	at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
> 	at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
> 	at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
> 	at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
> 	at org.apache.avro.Schema.toString(Schema.java:324)
> 	at org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)
> 	at org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)
> 	at org.apache.parquet.avro.AvroIndexedRecordConverter$AvroArrayConverter.<init>(AvroIndexedRecordConverter.java:333)
> 	at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:172)
> 	at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
> 	at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:168)
> 	at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
> 	at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:66)
> 	at org.apache.parquet.avro.AvroCompatRecordMaterializer.<init>(AvroCompatRecordMaterializer.java:34)
> 	at org.apache.parquet.avro.AvroReadSupport.newCompatMaterializer(AvroReadSupport.java:144)
> 	at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:136)
> 	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204)
> 	at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
> 	at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
> 	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:199)
> 	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:196)
> 	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:151)
> 	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:121)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> 2018-09-29 21:39:47 ERROR TaskSetManager:70 - Task 0 in stage 3.0 failed 1 times; aborting job
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1866)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1866)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
>   at scala.Option.foreach(Option.scala:257)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2100)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2049)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2038)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
>   at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
>   at org.bdgenomics.adam.rdd.read.AlignmentRecordRDDSuite$$anonfun$78.checkSave$6(AlignmentRecordRDDSuite.scala:1551)
>   at org.bdgenomics.adam.rdd.read.AlignmentRecordRDDSuite$$anonfun$78.apply$mcV$sp(AlignmentRecordRDDSuite.scala:1579)
>   at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply$mcV$sp(SparkFunSuite.scala:102)
>   at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply(SparkFunSuite.scala:98)
>   at org.bdgenomics.utils.misc.SparkFunSuite$$anonfun$sparkTest$1.apply(SparkFunSuite.scala:98)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
>   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
>   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.bdgenomics.adam.util.ADAMFunSuite.org$scalatest$BeforeAndAfter$$super$runTest(ADAMFunSuite.scala:24)
>   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
>   at org.bdgenomics.adam.util.ADAMFunSuite.runTest(ADAMFunSuite.scala:24)
>   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at org.bdgenomics.adam.util.ADAMFunSuite.org$scalatest$BeforeAndAfter$$super$run(ADAMFunSuite.scala:24)
>   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
>   at org.bdgenomics.adam.util.ADAMFunSuite.run(ADAMFunSuite.scala:24)
>   at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
>   at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
>   at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
>   at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
>   at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
>   at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
>   at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
>   at org.scalatest.tools.Runner$.main(Runner.scala:860)
>   at org.scalatest.tools.Runner.main(Runner.scala)
>   Cause: org.apache.avro.SchemaParseException: Can't redefine: list
>   at org.apache.avro.Schema$Names.put(Schema.java:1128)
>   at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
>   at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805)
>   at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
>   at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
>   at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
>   at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
>   at org.apache.avro.Schema.toString(Schema.java:324)
>   at org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)
>   at org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter$AvroArrayConverter.<init>(AvroIndexedRecordConverter.java:333)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:172)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:168)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:66)
>   at org.apache.parquet.avro.AvroCompatRecordMaterializer.<init>(AvroCompatRecordMaterializer.java:34)
>   at org.apache.parquet.avro.AvroReadSupport.newCompatMaterializer(AvroReadSupport.java:144)
>   at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:136)
>   at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204)
>   at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
>   at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:199)
>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:196)
>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:151)
>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Regression from Spark version 2.3.1.


