Posted to issues@spark.apache.org by "Frederick Reiss (JIRA)" <ji...@apache.org> on 2015/05/05 03:34:05 UTC

[jira] [Commented] (SPARK-6649) DataFrame created through SQLContext.jdbc() failed if columns table must be quoted

    [ https://issues.apache.org/jira/browse/SPARK-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527738#comment-14527738 ] 

Frederick Reiss commented on SPARK-6649:
----------------------------------------

I was able to reproduce this problem on Spark 1.3.1. The Spark SQL lexical analyzer treats anything enclosed in double quotes as a string literal, and the parser only accepts strings as literals, never as identifiers.

On the spark-sql command line:
{noformat}
spark-sql> select "hello" as hello, 'world' as world;
hello	world
Time taken: 0.125 seconds, Fetched 1 row(s)

spark-sql> select "hello" as "hello", 'world' as "world";
15/05/04 18:03:05 ERROR SparkSQLDriver: Failed in [select "hello" as "hello", 'world' as "world"]
org.apache.spark.sql.AnalysisException: cannot recognize input near 'as' '"hello"' ',' in selection target; line 1 pos 18
	at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:254)
        [many lines of stack trace]
{noformat}

The same thing happens from the Spark Scala shell:
{noformat}
scala> val df = sqlContext.sql("select \"hello\" as hello, 'world' as world")
df: org.apache.spark.sql.DataFrame = [hello: string, world: string]

scala> val df2 = sqlContext.sql("select \"hello\" as \"hello\", 'world' as \"world\"")
org.apache.spark.sql.AnalysisException: cannot recognize input near 'as' '"hello"' ',' in selection target; line 1 pos 18
	at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:254)
	at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
        [many lines of stack trace]
{noformat}
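
As a side note, the HiveQL-derived parser does seem to accept backtick-quoted identifiers, so for the alias case above something along these lines should parse (a sketch against the same shell session; it does not help the JDBC path, which generates its own SQL against the remote database):
{code}
// Sketch only: backticks are the Hive-style identifier delimiter,
// while double quotes are still parsed as string literals.
val df3 = sqlContext.sql("select \"hello\" as `hello`, 'world' as `world`")
{code}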

Treating double-quoted text as a string literal is not consistent with the SQL standard, though I suppose it is somewhat consistent with MySQL's default behavior.

According to the grammar in the SQL-92 document ([http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt]), string literals are delimited by single quotes:
{noformat}
<character string literal> ::=
              [ <introducer><character set specification> ]
              <quote> [ <character representation>... ] <quote>
                [ { <separator>... <quote> [ <character representation>... ] <quote> }... ]
...
<national character string literal> ::=
              N <quote> [ <character representation>... ] <quote>
                [ { <separator>... <quote> [ <character representation>... ] <quote> }... ]

<bit string literal> ::=
              B <quote> [ <bit>... ] <quote>
                [ { <separator>... <quote> [ <bit>... ] <quote> }... ]

<hex string literal> ::=
              X <quote> [ <hexit>... ] <quote>
                [ { <separator>... <quote> [ <hexit>... ] <quote> }... ]
...
<quote> ::= '
{noformat}
and identifiers *may* be delimited with double quotes:
{noformat}
<delimited identifier> ::=
              <double quote> <delimited identifier body> <double quote>
...
<double quote> ::= "
{noformat}
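
Regarding the suggestion in the original description (having JDBCRDD quote each column name when it builds its column list), that kind of quoting might look roughly like the sketch below; the helper names are illustrative only, not the actual JDBCRDD internals:
{code}
// Illustrative sketch: wrap each requested column in SQL-92 delimited-identifier
// quotes (doubling any embedded double quote) when building the SELECT list.
def quoteIdentifier(name: String): String =
  "\"" + name.replace("\"", "\"\"") + "\""

def buildSelect(table: String, columns: Seq[String]): String = {
  // If no columns are requested, select a constant placeholder.
  val columnList =
    if (columns.isEmpty) "1" else columns.map(quoteIdentifier).mkString(", ")
  s"SELECT $columnList FROM $table"
}

// buildSelect("TEST_TABLE", Seq("COMMENT"))  =>  SELECT "COMMENT" FROM TEST_TABLE
{code}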

Thoughts? Are there any pull requests in flight that fix this problem already?

> DataFrame created through SQLContext.jdbc() failed if columns table must be quoted
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-6649
>                 URL: https://issues.apache.org/jira/browse/SPARK-6649
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Frédéric Blanc
>            Priority: Minor
>
> If I want to import the contents of a table from Oracle that contains a column named COMMENT (a reserved keyword), I cannot use a DataFrame that maps all the columns of this table.
> {code:title=ddl.sql|borderStyle=solid}
> CREATE TABLE TEST_TABLE (
>     "COMMENT" VARCHAR2(10)
> );
> {code}
> {code:title=test.java|borderStyle=solid}
> SQLContext sqlContext = ...
> DataFrame df = sqlContext.jdbc(databaseURL, "TEST_TABLE");
> df.rdd();   // => failed if the table contains a column with a reserved keyword
> {code}
> The same problem can be encountered if reserved keywords are used in table names.
> The JDBCRDD Scala class could be improved if the columnList initializer appended double quotes around each column name (line 225).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org