Posted to issues@spark.apache.org by "Umesh Kacha (JIRA)" <ji...@apache.org> on 2016/01/07 19:54:39 UTC

[jira] [Created] (SPARK-12698) How to load specific Hive partition in DataFrame Spark 1.6?

Umesh Kacha created SPARK-12698:
-----------------------------------

             Summary: How to load specific Hive partition in DataFrame Spark 1.6?
                 Key: SPARK-12698
                 URL: https://issues.apache.org/jira/browse/SPARK-12698
             Project: Spark
          Issue Type: Question
          Components: Java API, SQL
    Affects Versions: 1.6.0
         Environment: YARN, Hive, Hadoop 2.6
            Reporter: Umesh Kacha
            Priority: Blocker


From Spark 1.6 onwards, as per the official docs, we can't load specific Hive partitions into a DataFrame.

In Spark 1.5 the following used to work, and the resulting DataFrame would contain the entity column:

DataFrame df = hiveContext.read().format("orc").load("path/to/table/entity=xyz");
But in Spark 1.6 the above does not work; I have to give the base path instead, as below, and the resulting DataFrame does not contain the entity column that I want:

DataFrame df = hiveContext.read().format("orc").load("path/to/table/");
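Filtering that full-table DataFrame down to a single partition would look roughly like the following sketch (same paths as above; untested):

DataFrame full = hiveContext.read().format("orc").load("path/to/table/");
// Filter on the inferred partition column; Spark still discovers
// every partition under the base path before the filter applies.
DataFrame xyz = full.filter(full.col("entity").equalTo("xyz"));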

How do I load a specific Hive partition into a DataFrame? What was the rationale behind removing this feature? It was efficient, I believe; with Spark 1.6 the code above loads all partitions, and filtering down to a specific one as sketched above is not efficient: it hits memory limits and throws GC errors, because thousands of partitions get loaded into memory instead of just the one I need. Please guide. Thanks in advance.
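For reference, the Spark 1.6 partition discovery documentation describes a basePath data source option that may cover this case: load only the partition directory, but tell Spark where the table root is, so that the partition column (entity here) is still inferred. A minimal sketch, assuming the same paths as above (untested):

DataFrame df = hiveContext.read()
        .format("orc")
        .option("basePath", "path/to/table/") // table root, so entity is inferred as a partition column
        .load("path/to/table/entity=xyz");    // only this partition directory is read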



