Posted to issues@spark.apache.org by "Sanjay Dasgupta (JIRA)" <ji...@apache.org> on 2016/06/02 14:59:59 UTC

[jira] [Created] (SPARK-15732) Dataset generated code "generated.java" Fails with Certain Case Classes

Sanjay Dasgupta created SPARK-15732:
---------------------------------------

             Summary: Dataset generated code "generated.java" Fails with Certain Case Classes
                 Key: SPARK-15732
                 URL: https://issues.apache.org/jira/browse/SPARK-15732
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
         Environment: Version 2.0 Preview on the Databricks Community Edition
            Reporter: Sanjay Dasgupta


The Dataset code generation logic fails to handle case class field names that are also Java keywords (e.g. "abstract"). Scala has an escaping mechanism (backquotes) that allows Java and Scala keywords to be used as identifiers, as in the example below:

case class PatApp(number: Int, title: String, `abstract`: String)
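For reference, here is a minimal sketch of how the backquote escaping behaves in plain Scala, with no Spark involved. The `BackquoteDemo` object and its members are illustrative names, not part of the original report.

```scala
// Backquotes let a reserved word serve as an ordinary identifier in Scala.
case class PatApp(number: Int, title: String, `abstract`: String)

// Hypothetical demo object, used only for illustration.
object BackquoteDemo {
  val app = PatApp(1001, "1001", "Abstract 1001")

  // The field is read back with the same backquoted name.
  val text: String = app.`abstract`

  // On the JVM the member is named plain "abstract"; the backquotes
  // exist only in Scala source.
  val jvmFieldNames: Seq[String] =
    classOf[PatApp].getDeclaredFields.map(_.getName).toSeq
}
```

So the class itself is legal Scala and compiles to a JVM class whose member is literally named `abstract`.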

But this case class trips up the Dataset code generator. The following error message is displayed when Datasets containing instances of such case classes are processed.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 (TID 1304, localhost): java.lang.RuntimeException: Error while encoding: java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "."
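The message "Unexpected selector 'abstract' after '.'" suggests that the generated Java source refers to the field by its bare name, which Java's grammar does not allow for a reserved word. As a rough sketch (not Spark code), a case class can be checked for such field names before it is used in a Dataset. The `JavaKeywordCheck` object, the `keywordFields` helper and the `SafeApp` class are hypothetical names introduced here; the word list is the set of Java reserved words and literals.

```scala
// Hypothetical helper, not part of Spark: reports fields of a class
// whose names are Java reserved words.
object JavaKeywordCheck {
  // Java reserved keywords plus the literals true, false and null.
  val javaReservedWords: Set[String] = Set(
    "abstract", "assert", "boolean", "break", "byte", "case", "catch",
    "char", "class", "const", "continue", "default", "do", "double",
    "else", "enum", "extends", "final", "finally", "float", "for",
    "goto", "if", "implements", "import", "instanceof", "int",
    "interface", "long", "native", "new", "package", "private",
    "protected", "public", "return", "short", "static", "strictfp",
    "super", "switch", "synchronized", "this", "throw", "throws",
    "transient", "try", "void", "volatile", "while",
    "true", "false", "null"
  )

  // Uses Java reflection to list declared fields and keeps those whose
  // names collide with a Java reserved word.
  def keywordFields(cls: Class[_]): Seq[String] =
    cls.getDeclaredFields.map(_.getName).filter(javaReservedWords).toSeq
}

// The class from this report, plus a variant with a non-keyword name
// (SafeApp and its "summary" field are assumptions for illustration).
case class PatApp(number: Int, title: String, `abstract`: String)
case class SafeApp(number: Int, title: String, summary: String)
```

Calling `JavaKeywordCheck.keywordFields(classOf[PatApp])` flags the `abstract` field, while the `SafeApp` variant yields no matches.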

The following code can be used to replicate the problem. This code was run on the Databricks CE, in a Scala notebook, in 3 separate cells as shown below:

// CELL 1:
//
// Create a Case Class with "abstract" as a field-name ...
//
package keywordissue
// The field-name abstract is a Java keyword ...
case class PatApp(number: Int, title: String, `abstract`: String)

// CELL 2:
//
// Create a Dataset using the case class ...
//
import keywordissue.PatApp

val applications = List(PatApp(1001, "1001", "Abstract 1001"), PatApp(1002, "1002", "Abstract 1002"), PatApp(1003, "1003", "Abstract for 1003"), PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004"))
val appsDataset = sc.parallelize(applications).toDF.as[PatApp]

// CELL 3:
//
// Force Dataset code-generation. This causes the error message to display ...
//
val duplicates = appsDataset.groupByKey(_.number).mapGroups((k, i) => (k, i.length)).filter(_._2 > 0)
duplicates.collect().foreach(println)


