Posted to dev@hive.apache.org by "Michael Malak (JIRA)" <ji...@apache.org> on 2013/02/19 17:43:13 UTC

[jira] [Updated] (HIVE-4022) Structs and struct fields cannot be NULL in INSERT statements

     [ https://issues.apache.org/jira/browse/HIVE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Malak updated HIVE-4022:
--------------------------------

    Description: 
This was originally thought to be Avro-specific, and was first noted with respect to HIVE-3528 ("Avro SerDe doesn't handle serializing Nullable types that require access to a Schema"). It turns out, however, that even native Hive tables cannot store NULL in a STRUCT field, or NULL for the entire STRUCT itself, at least when the NULL is specified directly in the INSERT statement.

Again, this affects both Avro-backed tables and native Hive tables.

***For native Hive tables:

The following:

echo 1,2 >twovalues.csv
hive
CREATE TABLE tc (x INT, y INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH 'twovalues.csv' INTO TABLE tc;
CREATE TABLE oc (z STRUCT<a: int, b: int>);
INSERT INTO TABLE oc SELECT null FROM tc;

produces the error

FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from void to struct<a:int,b:int>.
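The "void" here is the type Hive assigns to a bare NULL literal, and the planner will not coerce void to a struct. A possible workaround, not tested here and resting on the assumption that IF() resolves a NULL branch to the type of the other branch even for complex types, would be to route the NULL through a conditional that supplies the struct type:

-- untested sketch: the false branch is never taken and only supplies the struct<a:int,b:int> type
INSERT INTO TABLE oc SELECT IF(false, named_struct('a', 1, 'b', 2), NULL) FROM tc;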

The following:

INSERT INTO TABLE oc SELECT named_struct('a', null, 'b', null) FROM tc;

produces the error:

FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'oc': Cannot convert column 0 from struct<a:void,b:void> to struct<a:int,b:int>.
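Here named_struct infers each field's type from its arguments, and bare NULLs come through as void, hence struct<a:void,b:void>. For the narrower case of a non-NULL struct whose fields are all NULL, casting each NULL to the declared field type should make the inferred type match the target; this is a sketch that has not been verified here:

-- untested sketch: CAST(NULL AS INT) yields an int-typed null, so named_struct should infer struct<a:int,b:int>
INSERT INTO TABLE oc SELECT named_struct('a', CAST(NULL AS INT), 'b', CAST(NULL AS INT)) FROM tc;

Even if that works, it does not cover storing NULL for the struct column as a whole.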

***For Avro:

In HIVE-3528, there is in fact a null-struct test case on line 14 of
https://github.com/apache/hive/blob/15cc604bf10f4c2502cb88fb8bb3dcd45647cf2c/data/files/csv.txt

The test script at
https://github.com/apache/hive/blob/12d6f3e7d21f94e8b8490b7c6d291c9f4cac8a4f/ql/src/test/queries/clientpositive/avro_nullable_fields.q

does indeed work. In that test, however, the query gets all of its data verbatim from a test table:

INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;

If, instead, we substitute a hard-coded null for the struct directly in the query, it fails:

INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, null, enum1, nullableint, bytes1, fixed1 FROM test_serializer;

with the following error:

FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 10 from void to struct<sint:int,sboolean:boolean,sstring:string>.

Note, though, that substituting a hard-coded null for string1 (and restoring struct1 into the query) does work:

INSERT OVERWRITE TABLE as_avro SELECT null, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, struct1, enum1, nullableint, bytes1, fixed1 FROM test_serializer;
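The same typed-null idea might carry over to the Avro case: rather than a bare null in the column-10 position, an expression that evaluates to NULL but carries the struct type could be used. This is an untested sketch and assumes IF() resolves a NULL branch to struct1's type:

-- untested sketch: IF(false, struct1, NULL) always yields NULL but should carry struct1's struct<sint:int,sboolean:boolean,sstring:string> type
INSERT OVERWRITE TABLE as_avro SELECT string1, int1, tinyint1, smallint1, bigint1, boolean1, float1, double1, list1, map1, IF(false, struct1, NULL), enum1, nullableint, bytes1, fixed1 FROM test_serializer;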


        Summary: Structs and struct fields cannot be NULL in INSERT statements  (was: Avro SerDe queries don't handle hard-coded nulls for optional/nullable structs)
    
> Structs and struct fields cannot be NULL in INSERT statements
> -------------------------------------------------------------
>
>                 Key: HIVE-4022
>                 URL: https://issues.apache.org/jira/browse/HIVE-4022
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Michael Malak
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira