Posted to issues@spark.apache.org by "Jianshi Huang (JIRA)" <ji...@apache.org> on 2014/08/07 02:19:11 UTC

[jira] [Created] (SPARK-2890) Spark SQL should allow SELECT with duplicated columns

Jianshi Huang created SPARK-2890:
------------------------------------

             Summary: Spark SQL should allow SELECT with duplicated columns
                 Key: SPARK-2890
                 URL: https://issues.apache.org/jira/browse/SPARK-2890
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.1.0
            Reporter: Jianshi Huang


Spark SQL reports a java.lang.IllegalArgumentException with the following message:

java.lang.IllegalArgumentException: requirement failed: Found fields with the same name.
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.sql.catalyst.types.StructType.<init>(dataTypes.scala:317)
        at org.apache.spark.sql.catalyst.types.StructType$.fromAttributes(dataTypes.scala:310)
        at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:306)
        at org.apache.spark.sql.parquet.ParquetTableScan.execute(ParquetTableOperations.scala:83)
        at org.apache.spark.sql.execution.Filter.execute(basicOperators.scala:57)
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:85)
        at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:433)

After some trial and error, it appears to be caused by duplicated columns in my SELECT clause.

The duplication was intentional, so that my code parses correctly. I think users should be allowed to select the same column more than once in the result.
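For illustration, the failing requirement can be sketched as follows. This is a minimal stand-in, not Spark's actual Scala implementation: a query such as SELECT a, a FROM t yields two output attributes named "a", and the schema constructor rejects any duplicate field names (the function name below is hypothetical).

```python
# Hypothetical sketch of the uniqueness check that StructType enforces.
# make_struct_type is an illustrative name, not a Spark API.
def make_struct_type(field_names):
    if len(set(field_names)) != len(field_names):
        # Mirrors the Scala require(...) failure seen in the stack trace above.
        raise ValueError("requirement failed: Found fields with the same name.")
    return field_names

# SELECT a, a FROM t produces two attributes named "a", so the check fails:
try:
    make_struct_type(["a", "a"])
except ValueError as e:
    print(e)
```

Relaxing this check (or disambiguating the duplicated output names) would let such queries succeed.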

Jianshi



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org