
Posted to issues@spark.apache.org by "zhichao-li (JIRA)" <ji...@apache.org> on 2016/03/02 03:30:18 UTC

[jira] [Comment Edited] (SPARK-13141) Dataframe created from Hive partitioned tables using HiveContext returns wrong results

    [ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174895#comment-15174895 ] 

zhichao-li edited comment on SPARK-13141 at 3/2/16 2:29 AM:
------------------------------------------------------------

I just tried this, but I could not reproduce it on the master branch with the following:

create table mn.logs (field1 string, field2 string, field3 string)
partitioned by (year string, month string , day string, host string)
row format delimited fields terminated by ',';

insert into logs partition (year="2013", month="07", day="28", host="host1") values ("foo","foo","foo")

hc.table("logs").show()


As you mentioned, I am not sure whether this is specific to CDH 5.5.1.


was (Author: zhichao-li):
Just try, but this cannot be reproduced from the master version by the sql: `create table mn.logs (field1 string, field2 string, field3 string)
partitioned by (year string, month string , day string, host string)
row format delimited fields terminated by ',';` as you mentioned, not sure if it's specific to the version of CDH 5.5.1

> Dataframe created from Hive partitioned tables using HiveContext returns wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13141
>                 URL: https://issues.apache.org/jira/browse/SPARK-13141
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.1
>            Reporter: Simone
>            Priority: Critical
>
> I get wrong dataframe results using HiveContext with Spark 1.5.0 on CDH 5.5.1 in yarn-client mode.
> The problem occurs with partitioned tables on delimited text data in HDFS, with both Scala and Python.
> This is example code:
> import org.apache.spark.sql.hive.HiveContext
> val hc = new HiveContext(sc)
> hc.table("my_db.partition_table").show()
> The result is that all values in all rows are NULL, except for the first column (which contains the whole line of data) and the partitioning columns, which appear to be correct.
> With Hive and Impala I get correct results.
> With Spark on the same data in a non-partitioned table I also get correct results.
> I think a similar problem also occurs with Avro data:
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836
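
The symptom in the report (the first column holds the whole line, the remaining data columns are NULL) is the pattern a delimited-text reader produces when it splits on a different delimiter than the one the files use. The sketch below only illustrates that pattern in plain Python; it is not Spark or Hive code, and the report does not confirm that a delimiter mismatch is the cause here. The function name `parse_row` and its parameters are hypothetical. The one external fact assumed is that Hive's default field delimiter for text tables is Ctrl-A (`\x01`).

```python
# Hypothetical illustration only: mimics how a delimited-text reader maps one
# line of a text file onto a fixed number of columns. Not Spark or Hive code.
from typing import List, Optional


def parse_row(line: str, delimiter: str, num_columns: int) -> List[Optional[str]]:
    """Split a line into exactly num_columns fields, padding with None (NULL)."""
    fields: List[Optional[str]] = list(line.split(delimiter))
    # Extra fields are dropped; missing trailing fields become NULL.
    fields = fields[:num_columns]
    return fields + [None] * (num_columns - len(fields))


line = "foo,foo,foo"

# Reader uses the delimiter the table was declared with (','):
# each value lands in its own column.
print(parse_row(line, ",", 3))

# Reader falls back to Ctrl-A (\x01) instead: nothing is split, so the
# whole line lands in the first column and the other columns are NULL.
print(parse_row(line, "\x01", 3))
```

If this is what happens, comparing the delimiter stored for the table with the one stored for each partition in the metastore might be a useful first check, but that is a guess rather than a diagnosis.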



