Posted to issues@spark.apache.org by "Jean Georges Perrin (JIRA)" <ji...@apache.org> on 2018/11/22 13:16:00 UTC
[jira] [Updated] (SPARK-26146) CSV wouldn't be ingested in Spark 2.4.0 with Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-26146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean Georges Perrin updated SPARK-26146:
----------------------------------------
Description:
Ingestion of a CSV file seems to fail with Spark v2.4.0 and Scala v2.12, whereas it works fine with Scala v2.11.
When running a simple CSV ingestion like:
{code:java}
// Creates a session on a local master
SparkSession spark = SparkSession.builder()
    .appName("CSV to Dataset")
    .master("local")
    .getOrCreate();

// Reads a CSV file with header, called books.csv, and stores it in a dataframe
Dataset<Row> df = spark.read().format("csv")
    .option("header", "true")
    .load("data/books.csv");
{code}
With Scala 2.12, I get:
{code:java}
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
    at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
    ...
    at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.start(CsvToDataframeApp.java:37)
    at net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.main(CsvToDataframeApp.java:21)
{code}
It works smoothly if I switch back to Scala 2.11.
A full example is available at [https://github.com/jgperrin/net.jgp.books.sparkWithJava.ch01]. You can modify pom.xml to easily change the Scala version in the properties section:
{code:java}
<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <java.version>1.8</java.version>
  <scala.version>2.11</scala.version>
  <spark.version>2.4.0</spark.version>
</properties>{code}
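A workaround commonly reported for this kind of ArrayIndexOutOfBoundsException in BytecodeReadingParanamer is to force a newer paranamer version (2.8), since the older release pulled in transitively cannot parse more recent class-file bytecode. This is a sketch only, not verified against this exact project; declare it before the Spark dependencies so Maven's nearest-wins mediation picks it up:
{code:xml}
<dependencies>
  <!-- Pin paranamer 2.8 ahead of the Spark artifacts so the older
       transitive version is not selected (hypothetical workaround). -->
  <dependency>
    <groupId>com.thoughtworks.paranamer</groupId>
    <artifactId>paranamer</artifactId>
    <version>2.8</version>
  </dependency>
  <!-- ... existing spark-core / spark-sql dependencies ... -->
</dependencies>
{code}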
(P.S. It's my first bug submission, so I hope I didn't mess up too much; please be tolerant if I did.)
> CSV wouldn't be ingested in Spark 2.4.0 with Scala 2.12
> ------------------------------------------------------
>
> Key: SPARK-26146
> URL: https://issues.apache.org/jira/browse/SPARK-26146
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.4.0
> Reporter: Jean Georges Perrin
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org