Posted to issues@spark.apache.org by "Umesh Kacha (JIRA)" <ji...@apache.org> on 2016/01/07 19:54:39 UTC
[jira] [Created] (SPARK-12698) How to load specific Hive partition in DataFrame Spark 1.6?
Umesh Kacha created SPARK-12698:
-----------------------------------
Summary: How to load specific Hive partition in DataFrame Spark 1.6?
Key: SPARK-12698
URL: https://issues.apache.org/jira/browse/SPARK-12698
Project: Spark
Issue Type: Question
Components: Java API, SQL
Affects Versions: 1.6.0
Environment: YARN, Hive, Hadoop 2.6
Reporter: Umesh Kacha
Priority: Blocker
From Spark 1.6 onwards, per the official documentation, we can't load a specific Hive partition into a DataFrame.
In Spark 1.5 the following used to work, and the resulting DataFrame contained the entity column:
DataFrame df = hiveContext.read().format("orc").load("path/to/table/entity=xyz")
But in Spark 1.6 the above no longer works; I have to give the base path as below, and then the DataFrame does not contain the entity column that I want:
DataFrame df = hiveContext.read().format("orc").load("path/to/table/")
How do I load a specific Hive partition into a DataFrame? What was the reason for removing this feature? It was efficient, I believe; with the Spark 1.6 code above, all partitions are loaded, and filtering for a specific partition afterwards is not efficient: it hits memory and throws GC errors because thousands of partitions get loaded into memory instead of just the one I need. Please guide. Thanks in advance.
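In case it helps others hitting this: a possible workaround in Spark 1.6 is the "basePath" option, which tells partition discovery where the partitioned layout starts so the partition column is retained even when only one partition directory is loaded. This is a sketch based on the partition-discovery behavior described in the Spark 1.6 docs, assuming the same table layout as above, not a verified fix for this ticket:

```java
// Sketch (untested here): supply the table's base path via "basePath" so
// partition discovery keeps the "entity" column, while only the files under
// the entity=xyz directory are actually read.
DataFrame df = hiveContext.read()
    .format("orc")
    .option("basePath", "path/to/table/")   // where the partitioning scheme starts
    .load("path/to/table/entity=xyz");
```

With this, df should contain the entity column (with value "xyz") without scanning the other partitions.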
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org