You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Alexey Zotov (JIRA)" <ji...@apache.org> on 2013/08/06 10:10:48 UTC

[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

    [ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730501#comment-13730501 ] 

Alexey Zotov commented on HIVE-3442:
------------------------------------

I found a better approach. We need to specify the following _avro.schema.url_ _hdfs:///some/path/schema.json_ to both _TBLPROPERTIES and _SERDEPROPERTIES_. It works well with NameNodes HA mode.
                
> AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-3442
>                 URL: https://issues.apache.org/jira/browse/HIVE-3442
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: Zhenxiao Luo
>            Assignee: Zhenxiao Luo
>             Fix For: 0.10.0
>
>
> After creating a table and load data into it, I could check that the table is created successfully, and data is inside:
> DROP TABLE IF EXISTS ml_items;
> CREATE TABLE ml_items(id INT,
>                       title STRING,
>                       release_date STRING,
>                       video_release_date STRING,
>                       imdb_url STRING,
>                       unknown_genre TINYINT,
>                       action TINYINT,
>                       adventure TINYINT,
>                       animation TINYINT,
>                       children TINYINT,
>                       comedy TINYINT,
>                       crime TINYINT,
>                       documentary TINYINT,
>                       drama TINYINT,
>                       fantasy TINYINT,
>                       film_noir TINYINT,
>                       horror TINYINT,
>                       musical TINYINT,
>                       mystery TINYINT,
>                       romance TINYINT,
>                       sci_fi TINYINT,
>                       thriller TINYINT,
>                       war TINYINT,
>                       western TINYINT)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
>   STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
> select * from ml_items ORDER BY id ASC;
> While, the following create external table with AvroSerDe is not working:
> DROP TABLE IF EXISTS ml_items_as_avro;
> CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
>     'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
> describe ml_items_as_avro;
> INSERT OVERWRITE TABLE ml_items_as_avro
>   SELECT id, title,
>     imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime,
>     documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
>     sci_fi, thriller, war, western
>   FROM ml_items;
> ml_items_as_avro is not created with expected schema, as shown in the "describe ml_items_as_avro" output. The output is below:
> PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> PREHOOK: type: DROPTABLE
> POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> POSTHOOK: type: DROPTABLE
> PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
>     'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> PREHOOK: type: CREATETABLE
> POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
>     'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> POSTHOOK: type: CREATETABLE
> POSTHOOK: Output: default@ml_items_as_avro
> PREHOOK: query: describe ml_items_as_avro
> PREHOOK: type: DESCTABLE
> POSTHOOK: query: describe ml_items_as_avro
> POSTHOOK: type: DESCTABLE
> error_error_error_error_error_error_error   string  from deserializer
> cannot_determine_schema string  from deserializer
> check   string  from deserializer
> schema  string  from deserializer
> url string  from deserializer
> and string  from deserializer
> literal string  from deserializer
> FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira