Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2014/12/29 20:21:13 UTC
[jira] [Commented] (SPARK-4968) [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used
[ https://issues.apache.org/jira/browse/SPARK-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260343#comment-14260343 ]
Apache Spark commented on SPARK-4968:
-------------------------------------
User 'saucam' has created a pull request for this issue:
https://github.com/apache/spark/pull/3830
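The fix presumably needs to guard the internal reduce against empty input. As a minimal plain-Scala sketch of that defensive pattern (not the actual code from the PR above), `reduceOption` yields `None` on an empty collection instead of throwing:

```scala
// Sketch only (not the PR's code): reduceOption is the defensive
// counterpart of reduce -- on an empty collection it returns None
// rather than throwing UnsupportedOperationException.
object SafeReduceSketch {
  def main(args: Array[String]): Unit = {
    val empty = List.empty[Int]
    println(empty.reduceOption(_ max _))         // prints "None"
    println(List(3, 1, 2).reduceOption(_ max _)) // prints "Some(3)"
  }
}
```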
> [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used
> --------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-4968
> URL: https://issues.apache.org/jira/browse/SPARK-4968
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.1
> Environment: Spark 1.1.1
> scala - 2.10.2
> hive metastore db - pgsql
> OS- Linux
> Reporter: Shekhar Bansal
> Fix For: 1.1.1, 1.1.2, 1.2.1
>
>
> Create a table with partitions, then run a query with ORDER BY and LIMIT against a partition that doesn't exist.
> I am running the queries in hiveContext.
> 1. Create hive table
> create table if not exists testTable (ID1 BIGINT, ID2 BIGINT,Start_Time STRING, End_Time STRING) PARTITIONED BY (Region STRING,Market STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
> 2. Create data
> 1,2,"2014-11-01","2014-11-02"
> 2,3,"2014-11-01","2014-11-02"
> 3,4,"2014-11-01","2014-11-02"
> 3. Load data in hive
> LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE testTable PARTITION (Region="North", market='market1');
> 4. run query
> SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
> Error trace
> java.lang.UnsupportedOperationException: empty collection
> at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
> at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
> at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
> at org.apache.spark.sql.execution.TakeOrdered.executeCollect(basicOperators.scala:171)
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
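The trace bottoms out in RDD.reduce, which follows the Scala collections contract: reducing an empty collection is undefined and throws. Because the non-existent partition contributes zero rows, takeOrdered's internal reduce has nothing to combine. The same contract can be illustrated in plain Scala, no Spark required:

```scala
// Illustrates the contract behind the stack trace: reduce on an empty
// collection throws UnsupportedOperationException. RDD.reduce follows
// the same contract, so takeOrdered over zero rows fails this way.
object EmptyReduceDemo {
  def main(args: Array[String]): Unit = {
    val threw =
      try { List.empty[String].reduce(_ + _); false }
      catch { case _: UnsupportedOperationException => true }
    println(threw) // prints "true"
  }
}
```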
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)