Posted to issues@spark.apache.org by "Zhan Zhang (JIRA)" <ji...@apache.org> on 2016/06/09 19:47:21 UTC

[jira] [Commented] (SPARK-15848) Spark unable to read partitioned table in avro format and column name in upper case

    [ https://issues.apache.org/jira/browse/SPARK-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323195#comment-15323195 ] 

Zhan Zhang commented on SPARK-15848:
------------------------------------

cat > file1.csv<<EOF
0,38,91
0,65,28
0,78,16
1,34,96
1,78,14
1,11,43
EOF

cat > file2.csv<<EOF
5,300,100
7,650,20
8,780,160
1,340,963
9,780,142
2,110,430
EOF




CREATE TABLE csv_table (
STUDENT_ID INT,
SUBJECT_ID INT,
marks INT)
PARTITIONED BY (Year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "file1.csv" OVERWRITE INTO TABLE csv_table PARTITION (Year = '2002');

LOAD DATA LOCAL INPATH "file2.csv" OVERWRITE INTO TABLE csv_table PARTITION (Year = '2000');




CREATE TABLE avro_table_uppercase
    PARTITIONED BY (YEAR INT)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES (
       'avro.schema.literal'='{
          "namespace": "com.example.avro",
           "name": "student_marks",
           "type": "record",
          "fields": [ { "name":"STUDENT_ID","type":"int"}, { "name":"SUBJECT_ID","type":"int"}, { "name":"marks","type":"int"}]
        }');


INSERT OVERWRITE TABLE avro_table_uppercase partition(Year) SELECT STUDENT_ID, SUBJECT_ID, marks,Year FROM csv_table ;
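
Every partition value in the INSERT above is dynamic, so Hive may reject the statement when dynamic partitioning runs in strict mode. The following is a minimal sketch, assuming these settings are not already enabled in your Hive configuration; the file name insert_avro.hql is a hypothetical choice made for illustration.

```shell
# Write the settings and the INSERT from above into one script file.
# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat > insert_avro.hql <<'EOF'
-- Assumption: dynamic partitioning is not already enabled for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE avro_table_uppercase PARTITION (Year)
SELECT STUDENT_ID, SUBJECT_ID, marks, Year FROM csv_table;
EOF

# Then run it with the Hive CLI, for example:
#   hive -f insert_avro.hql
```

If the INSERT already succeeds in your environment, these SET lines are unnecessary.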


From Hive, this query runs successfully: select * from avro_table_uppercase;


But from spark-shell, STUDENT_ID and SUBJECT_ID come back as null:

scala> val tbl = sqlContext.table("default.avro_table_uppercase");

scala> tbl.show

+----------+----------+-----+----+
|student_id|subject_id|marks|year|
+----------+----------+-----+----+
|      null|      null|  100|2000|
|      null|      null|   20|2000|
|      null|      null|  160|2000|
|      null|      null|  963|2000|
|      null|      null|  142|2000|
|      null|      null|  430|2000|
|      null|      null|   91|2002|
|      null|      null|   28|2002|
|      null|      null|   16|2002|
|      null|      null|   96|2002|
|      null|      null|   14|2002|
|      null|      null|   43|2002|
+----------+----------+-----+----+
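
In the output above, only marks, the one field declared in lower case in the Avro schema, is populated. That pattern is consistent with the lower-cased column names being compared case-sensitively against the Avro field names somewhere on the read path, although this comment does not confirm the cause. The sketch below only illustrates the pattern and is not Spark's actual code; match_kind and AVRO_FIELDS are hypothetical names introduced here.

```shell
#!/bin/sh
# Field names exactly as declared in avro.schema.literal above.
AVRO_FIELDS="STUDENT_ID SUBJECT_ID marks"

# match_kind COLUMN: report how COLUMN matches the Avro field names.
# Prints "exact", "case-insensitive only", or "none".
match_kind() {
  col=$1
  # First pass: exact, case-sensitive comparison.
  for field in $AVRO_FIELDS; do
    if [ "$field" = "$col" ]; then
      echo "exact"
      return 0
    fi
  done
  # Second pass: compare after lower-casing both sides.
  lower_col=$(printf '%s' "$col" | tr '[:upper:]' '[:lower:]')
  for field in $AVRO_FIELDS; do
    lower_field=$(printf '%s' "$field" | tr '[:upper:]' '[:lower:]')
    if [ "$lower_field" = "$lower_col" ]; then
      echo "case-insensitive only"
      return 0
    fi
  done
  echo "none"
}

# Column names as they appear in the tbl.show header.
for col in student_id subject_id marks; do
  printf '%s: %s\n' "$col" "$(match_kind "$col")"
done
```

The two columns that match only when case is ignored are the same two that show null in spark-shell.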

> Spark unable to read partitioned table in avro format and column name in upper case
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-15848
>                 URL: https://issues.apache.org/jira/browse/SPARK-15848
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Zhan Zhang
>
> For external partitioned Hive tables created in Avro format,
> Spark returns "null" values if column names are in upper case in the Avro schema.
> The same tables return proper data when queried in the Hive client.


