Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2016/06/03 01:14:59 UTC
[jira] [Resolved] (SPARK-15732) Dataset generated code "generated.java" Fails with Certain Case Classes

     [ https://issues.apache.org/jira/browse/SPARK-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved SPARK-15732.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Resolved by https://github.com/apache/spark/pull/13485

> Dataset generated code "generated.java" Fails with Certain Case Classes
> -----------------------------------------------------------------------
>
>                 Key: SPARK-15732
>                 URL: https://issues.apache.org/jira/browse/SPARK-15732
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Version 2.0 Preview on the Databricks Community Edition
>            Reporter: Sanjay Dasgupta
>            Assignee: Wenchen Fan
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> The Dataset code generation logic fails to handle case class field names that are also Java keywords (e.g. "abstract"). Scala allows Java (and Scala) keywords to be used as identifiers when they are escaped with backquotes, as in the example below:
> case class PatApp(number: Int, title: String, `abstract`: String)
> However, this case class breaks the Dataset code generator. Processing a Dataset that contains instances of such a case class fails with the following error:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 (TID 1304, localhost): java.lang.RuntimeException: Error while encoding: java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "."
> The following code reproduces the problem. It was run on the Databricks Community Edition, in a Scala notebook, as three separate cells:
> // CELL 1:
> //
> // Create a Case Class with "abstract" as a field-name ...
> //
> package keywordissue
> // The field-name abstract is a Java keyword ...
> case class PatApp(number: Int, title: String, `abstract`: String)
> // CELL 2:
> //
> // Create a Dataset using the case class ...
> //
> import keywordissue.PatApp
> val applications = List(PatApp(1001, "1001", "Abstract 1001"), PatApp(1002, "1002", "Abstract 1002"), PatApp(1003, "1003", "Abstract for 1003"), PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004"))
> val appsDataset = sc.parallelize(applications).toDF.as[PatApp]
> // CELL 3:
> //
> // Force Dataset code-generation. This causes the error message to display ...
> //
> val duplicates = appsDataset.groupByKey(_.number).mapGroups((k, i) => (k, i.length)).filter(_._2 > 1)
> duplicates.collect().foreach(println)
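
Judging by the compile error above ("Unexpected selector 'abstract' after "."'), the generated Java refers to the field by its name, which is not legal Java when the name is a reserved word. As a rough illustration, the sketch below flags case class fields whose names are Java keywords. It is not part of Spark and not the fix applied in the pull request; `KeywordFieldCheck` and `javaKeywordFields` are hypothetical names, and the check assumes the standard JDK method `javax.lang.model.SourceVersion.isKeyword`.

```scala
import javax.lang.model.SourceVersion

// Hypothetical helper, not part of Spark: reports fields of a class whose
// names are reserved words in Java and would be invalid in generated Java code.
object KeywordFieldCheck {
  // Same shape as the case class from the report.
  case class PatApp(number: Int, title: String, `abstract`: String)

  // Collect declared (non-synthetic) field names and keep those that
  // javax.lang.model.SourceVersion treats as Java keywords or literals.
  def javaKeywordFields(cls: Class[_]): Seq[String] =
    cls.getDeclaredFields.toSeq
      .filterNot(_.isSynthetic)
      .map(_.getName)
      .filter(name => SourceVersion.isKeyword(name))

  def main(args: Array[String]): Unit = {
    val offending = javaKeywordFields(classOf[PatApp])
    println(s"Fields named after Java keywords: ${offending.mkString(", ")}")
  }
}
```

On versions without the fix, a check like this could be run before building a Dataset; renaming a flagged field (for example to a hypothetical `abstractText`) would be one possible way to sidestep the failure.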


