You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Adrian Hains (JIRA)" <ji...@apache.org> on 2013/11/08 23:32:18 UTC

[jira] [Commented] (HIVE-4022) Structs and struct fields cannot be NULL in INSERT statements

    [ https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817769#comment-13817769 ] 

Adrian Hains commented on HIVE-4022:
------------------------------------

I found a workaround to get me past this restriction. I had a need to add some struct columns to a table t1 by way of copying the data to a new table t2 with the correct updated schema. Trying to insert directly to t2 by selecting from t1 with null literals failed for me as described in this jira ticket. To work around this I created an additional table t2copy that has the same schema as t2. Then I did an insert to t1 selecting from t2 left outer join t2copy, and referencing the t2copy.newStructColumn instance to have a table-sourced null value pass to t1. This worked. It may be that t2copy having the same struct definition is unnecessary, and a simple empty table with a bogus struct column definition would have worked just as well.

> Structs and struct fields cannot be NULL in INSERT statements
> -------------------------------------------------------------
>
>                 Key: HIVE-4022
>                 URL: https://issues.apache.org/jira/browse/HIVE-4022
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Michael Malak
>
> Originally thought to be Avro-specific, and first noted with respect to HIVE-3528 "Avro SerDe doesn't handle serializing Nullable types that require access to a Schema", it turns out even native Hive tables cannot store NULL in a STRUCT field or for the entire STRUCT itself, at least when the NULL is specified directly in the INSERT statement.
> Again, this affects both Avro-backed tables and native Hive tables.
> ***For native Hive tables:
> The following:
> echo 1,2 >twovalues.csv
> hive
> CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
> CREATE TABLE oc (z STRUCT<a: int, b: int>);
> INSERT INTO TABLE oc SELECT null FROM tc;
> produces the error
> FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from void to struct<a:int,b:int>.
> The following:
> INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;
> produces the error:
> FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from struct<a:void,b:void> to struct<a:int,b:int>.
> ***For Avro:
> In HIVE-3528, there is in fact a null-struct test case in line 14 of
> https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt
> The test script at
> https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q
> does indeed work.  But in that test, the query gets all of its data from a test table verbatim:
> INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
> If instead we stick in a hard-coded null for the struct directly into the query, it fails:
> INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;
> with the following error:
> FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.
> Note, though, that substituting a hard-coded null for string1 (and restoring struct1 into the query) does work:
> INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;



--
This message was sent by Atlassian JIRA
(v6.1#6144)