You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Alexey Zotov (JIRA)" <ji...@apache.org> on 2013/08/06 10:10:48 UTC
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES
'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730501#comment-13730501 ]
Alexey Zotov commented on HIVE-3442:
------------------------------------
I found a better approach. We need to specify the following _avro.schema.url_ _hdfs:///some/path/schema.json_ to both _TBLPROPERTIES and _SERDEPROPERTIES_. It works well with NameNodes HA mode.
> AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-3442
> URL: https://issues.apache.org/jira/browse/HIVE-3442
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
> Fix For: 0.10.0
>
>
> After creating a table and load data into it, I could check that the table is created successfully, and data is inside:
> DROP TABLE IF EXISTS ml_items;
> CREATE TABLE ml_items(id INT,
> title STRING,
> release_date STRING,
> video_release_date STRING,
> imdb_url STRING,
> unknown_genre TINYINT,
> action TINYINT,
> adventure TINYINT,
> animation TINYINT,
> children TINYINT,
> comedy TINYINT,
> crime TINYINT,
> documentary TINYINT,
> drama TINYINT,
> fantasy TINYINT,
> film_noir TINYINT,
> horror TINYINT,
> musical TINYINT,
> mystery TINYINT,
> romance TINYINT,
> sci_fi TINYINT,
> thriller TINYINT,
> war TINYINT,
> western TINYINT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
> select * from ml_items ORDER BY id ASC;
> While, the following create external table with AvroSerDe is not working:
> DROP TABLE IF EXISTS ml_items_as_avro;
> CREATE EXTERNAL TABLE ml_items_as_avro
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> WITH SERDEPROPERTIES (
> 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
> STORED as INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
> describe ml_items_as_avro;
> INSERT OVERWRITE TABLE ml_items_as_avro
> SELECT id, title,
> imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime,
> documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
> sci_fi, thriller, war, western
> FROM ml_items;
> ml_items_as_avro is not created with expected schema, as shown in the "describe ml_items_as_avro" output. The output is below:
> PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> PREHOOK: type: DROPTABLE
> POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> POSTHOOK: type: DROPTABLE
> PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
> STORED as INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> PREHOOK: type: CREATETABLE
> POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
> STORED as INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> POSTHOOK: type: CREATETABLE
> POSTHOOK: Output: default@ml_items_as_avro
> PREHOOK: query: describe ml_items_as_avro
> PREHOOK: type: DESCTABLE
> POSTHOOK: query: describe ml_items_as_avro
> POSTHOOK: type: DESCTABLE
> error_error_error_error_error_error_error string from deserializer
> cannot_determine_schema string from deserializer
> check string from deserializer
> schema string from deserializer
> url string from deserializer
> and string from deserializer
> literal string from deserializer
> FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira