Posted to issues@spark.apache.org by "Michael Heuer (JIRA)" <ji...@apache.org> on 2019/03/02 20:47:00 UTC

[jira] [Commented] (SPARK-26146) CSV wouldn't be ingested in Spark 2.4.0 with Scala 2.12

    [ https://issues.apache.org/jira/browse/SPARK-26146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782519#comment-16782519 ] 

Michael Heuer commented on SPARK-26146:
---------------------------------------

[~srowen] Hey, wait a minute, the full example from the original submitter appears to be on GitHub here:

[https://github.com/jgperrin/net.jgp.books.spark.ch01]

This example has no explicit dependency on paranamer, and neither does Disq, for that matter:

[https://github.com/disq-bio/disq/blob/5760a80b3322c537226a876e0df4f7710188f7b2/pom.xml]

This is not the first instance I've seen (and reported) where Spark itself has dependencies that conflict or are otherwise incompatible with each other. These conflicts do not manifest at test scope in Spark CI; they only surface when an external project has a compile-scope dependency on the Spark jars.
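For downstream projects hit by this particular conflict, one workaround sketch (my wording, not an official fix; verify the version to pin against your Spark build) is to force a newer paranamer in the project's own pom.xml, so Maven's dependency mediation no longer selects the 2.7 that Spark 2.4.0 pulls in transitively:

{code:xml}
<!-- Workaround sketch: pin paranamer 2.8, which can read bytecode emitted
     by newer compilers, ahead of the 2.7 that arrives transitively through
     Spark 2.4.0's dependency graph. Verify 2.8 against your build. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.thoughtworks.paranamer</groupId>
      <artifactId>paranamer</artifactId>
      <version>2.8</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}

With this in place Maven resolves paranamer to 2.8 across the whole tree, which is exactly the kind of pin a consumer should not have to carry for a conflict inside Spark's own dependency graph.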

> CSV wouldn't be ingested in Spark 2.4.0 with Scala 2.12
> ------------------------------------------------------
>
>                 Key: SPARK-26146
>                 URL: https://issues.apache.org/jira/browse/SPARK-26146
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.4.0
>            Reporter: Jean Georges Perrin
>            Priority: Major
>
> Ingestion of a CSV file seems to fail with Spark v2.4.0 and Scala v2.12, whereas it works fine with Scala v2.11.
> When running a simple CSV ingestion like:
> {code:java}
>     // Creates a session on a local master
>     SparkSession spark = SparkSession.builder()
>         .appName("CSV to Dataset")
>         .master("local")
>         .getOrCreate();
>     // Reads a CSV file with header, called books.csv, stores it in a dataframe
>     Dataset<Row> df = spark.read().format("csv")
>         .option("header", "true")
>         .load("data/books.csv");
> {code}
> With Scala 2.12, I get:
> {code:java}
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
> at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
> at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
> at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
> at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
> at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
> at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
> at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
> at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
> ...
> at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.start(CsvToDataframeApp.java:37)
> at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.main(CsvToDataframeApp.java:21)
> {code}
> Whereas it works smoothly if I switch back to 2.11.
> Full example available at [https://github.com/jgperrin/net.jgp.books.sparkWithJava.ch01]. You can modify pom.xml to easily change the Scala version in the properties section:
> {code:xml}
> <properties>
>   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>   <java.version>1.8</java.version>
>   <scala.version>2.11</scala.version>
>   <spark.version>2.4.0</spark.version>
> </properties>
> {code}
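> For context, the {{scala.version}} property feeds the artifact suffixes, so flipping it between 2.11 and 2.12 swaps the entire Scala lineage of the Spark jars (the sketch below assumes the usual suffix wiring; see the repo's pom.xml for the exact dependencies):
> {code:xml}
> <!-- Sketch: Spark artifact ids carry the Scala binary version suffix,
>      so the scala.version property above selects the matching Spark build. -->
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql_${scala.version}</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> {code}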
>  
> (PS: It's my first bug submission, so I hope I didn't mess anything up too badly; please be tolerant if I did.)
>  


