Posted to issues@spark.apache.org by "Shekhar Bansal (JIRA)" <ji...@apache.org> on 2014/12/25 17:21:13 UTC

[jira] [Created] (SPARK-4968) [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used

Shekhar Bansal created SPARK-4968:
-------------------------------------

             Summary: [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used
                 Key: SPARK-4968
                 URL: https://issues.apache.org/jira/browse/SPARK-4968
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.1.1
         Environment: Spark 1.1.1
hive metastore db - pgsql
OS- Linux
            Reporter: Shekhar Bansal
             Fix For: 1.1.2, 1.2.1, 1.1.1


Steps to reproduce: create a table with partitions, then run a query that uses ORDER BY and LIMIT against a partition that doesn't exist.

I am running the queries in HiveContext.

1. Create the Hive table
create table if not exists testTable (ID1 BIGINT, ID2 BIGINT,Start_Time STRING, End_Time STRING) PARTITIONED BY (Region STRING,Market STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;


2. Create the data file (/tmp/input.txt)
1,2,"2014-11-01","2014-11-02"
2,3,"2014-11-01","2014-11-02"
3,4,"2014-11-01","2014-11-02"

3. Load the data into Hive
LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE testTable PARTITION (Region="North", market='market1');

4. Run the query
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
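The WHERE clause matches no partition, so the scan produces an empty RDD. `RDD.takeOrdered` (reached via the ORDER BY ... LIMIT plan) merges the per-partition top-k results with a `reduce`, and `reduce` throws on an empty collection. A minimal pure-Scala illustration of that behavior, with a plain collection standing in for the per-partition results (names here are illustrative, not Spark internals):

```scala
// No partitions matched the predicate, so there are no per-partition results.
val perPartitionResults = Seq.empty[Seq[Int]]

// reduce on an empty collection throws, just like RDD.reduce does in the
// stack trace below.
val thrown =
  try {
    perPartitionResults.reduce(_ ++ _) // analogous to the merge step in takeOrdered
    false
  } catch {
    case _: UnsupportedOperationException => true
  }
```

Here `thrown` ends up `true`: Scala collections and Spark RDDs alike refuse to `reduce` an empty input.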


Error trace:
java.lang.UnsupportedOperationException: empty collection
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
	at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
	at org.apache.spark.sql.execution.TakeOrdered.executeCollect(basicOperators.scala:171)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
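The defensive pattern that avoids this class of failure is to guard the `reduce` with an explicit empty case, so an empty result set yields an empty answer instead of throwing. A sketch of that guard in plain Scala (this is an illustration of the pattern, not the actual Spark patch; `safeTopK` is a hypothetical helper):

```scala
// Hypothetical guarded top-k merge: return an empty result when there are
// no per-partition results, instead of calling reduce on an empty collection.
def safeTopK[T](perPartition: Seq[Seq[T]], k: Int)(implicit ord: Ordering[T]): Seq[T] =
  if (perPartition.isEmpty) Seq.empty
  else perPartition.reduce(_ ++ _).sorted(ord).take(k)

val empty = safeTopK(Seq.empty[Seq[Int]], 100) // no matching partitions -> Seq()
val top2  = safeTopK(Seq(Seq(3, 1), Seq(2)), 2) // -> Seq(1, 2)
```

With this guard, the failing query would return zero rows, which is the expected result for a partition that doesn't exist.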



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
