Posted to issues@spark.apache.org by "Zhan Zhang (JIRA)" <ji...@apache.org> on 2016/06/09 19:47:21 UTC
[jira] [Commented] (SPARK-15848) Spark unable to read partitioned table in avro format and column name in upper case
[ https://issues.apache.org/jira/browse/SPARK-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323195#comment-15323195 ]
Zhan Zhang commented on SPARK-15848:
------------------------------------
cat > file1.csv<<EOF
0,38,91
0,65,28
0,78,16
1,34,96
1,78,14
1,11,43
EOF
cat > file2.csv<<EOF
5,300,100
7,650,20
8,780,160
1,340,963
9,780,142
2,110,430
EOF
CREATE TABLE csv_table (
STUDENT_ID INT,
SUBJECT_ID INT,
marks INT)
PARTITIONED BY (Year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH "file1.csv" OVERWRITE INTO TABLE csv_table PARTITION (Year = '2002');
LOAD DATA LOCAL INPATH "file2.csv" OVERWRITE INTO TABLE csv_table PARTITION (Year = '2000');
CREATE TABLE avro_table_uppercase
PARTITIONED BY (YEAR INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "com.example.avro",
"name": "student_marks",
"type": "record",
"fields": [ { "name":"STUDENT_ID","type":"int"}, { "name":"SUBJECT_ID","type":"int"}, { "name":"marks","type":"int"}]
}');
INSERT OVERWRITE TABLE avro_table_uppercase partition(Year) SELECT STUDENT_ID, SUBJECT_ID, marks,Year FROM csv_table ;
From Hive, the following query runs successfully: select * from avro_table_uppercase;
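For comparison, the rows each partition should contain can be derived from the CSV inputs alone. The sketch below is only an illustration: `expected_rows` is a hypothetical helper (not part of Hive or Spark), and it assumes the partition value is simply appended as a fourth column, as in the INSERT above.

```shell
# Hypothetical helper: print the rows a partition should contain.
# Reads CSV from stdin and appends the partition value ($1) as a fourth,
# tab-separated column.
expected_rows() {
  awk -F',' -v year="$1" '{ printf "%s\t%s\t%s\t%s\n", $1, $2, $3, year }'
}

# First two rows of file2.csv, which was loaded into Year=2000.
expected=$(printf '5,300,100\n7,650,20\n' | expected_rows 2000)
printf '%s\n' "$expected"
```

With the files from the steps above in the current directory, `expected_rows 2002 < file1.csv` and `expected_rows 2000 < file2.csv` list all twelve rows, with non-null values in the first two columns.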
But from spark-shell, the columns with uppercase names come back as null:
scala> val tbl = sqlContext.table("default.avro_table_uppercase");
scala> tbl.show
+----------+----------+-----+----+
|student_id|subject_id|marks|year|
+----------+----------+-----+----+
|      null|      null|  100|2000|
|      null|      null|   20|2000|
|      null|      null|  160|2000|
|      null|      null|  963|2000|
|      null|      null|  142|2000|
|      null|      null|  430|2000|
|      null|      null|   91|2002|
|      null|      null|   28|2002|
|      null|      null|   16|2002|
|      null|      null|   96|2002|
|      null|      null|   14|2002|
|      null|      null|   43|2002|
+----------+----------+-----+----+
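The two null columns are exactly the ones whose Avro field names contain uppercase characters, while the all-lowercase marks field reads correctly. A minimal sketch for spotting such fields in a schema literal follows; `mixed_case_fields` is a hypothetical helper, and the `schema` string is copied from the fields line of the table definition above. It only flags the names and does not establish why Spark fails to resolve them.

```shell
# Field list copied from the avro.schema.literal in the table definition.
schema='"fields": [ { "name":"STUDENT_ID","type":"int"}, { "name":"SUBJECT_ID","type":"int"}, { "name":"marks","type":"int"}]'

# Hypothetical helper: print every field name that contains an uppercase
# character. Assumes names are written as "name":"..." with no spaces.
mixed_case_fields() {
  printf '%s\n' "$1" \
    | grep -o '"name":"[^"]*"' \
    | sed 's/^"name":"//; s/"$//' \
    | grep '[[:upper:]]'
}

mixed_case_fields "$schema"
```

For the schema above this lists STUDENT_ID and SUBJECT_ID, matching the columns that show null in the spark-shell output.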
> Spark unable to read partitioned table in avro format and column name in upper case
> -----------------------------------------------------------------------------------
>
> Key: SPARK-15848
> URL: https://issues.apache.org/jira/browse/SPARK-15848
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Zhan Zhang
>
> When external partitioned Hive tables are created in Avro format,
> Spark returns "null" values if the column names are in uppercase in the Avro schema.
> The same tables return proper data when queried in the Hive client.