You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Valluri, Sathish" <sa...@emc.com> on 2014/05/29 12:41:32 UTC

Hive Avro union data access

Hi,



I have an Hive table created with 3 different union data types for alias_host column name as shown. (array<string>,string, null).



CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type": [{

      "type" : "array",

      "items" : "string"

    },"string","null"]}]}

') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/arrayTests';



How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but when once I set the datatype as union the following queries are failing.



Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 1000;

Error: Error while processing statement: FAILED: SemanticException [Error 10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of EQUAL  is expected to a primitive type, but union is found (state=42000,code=10016)



Can anyone suggest how to access and query the contents of union data types.



Regards

Sathish Valluri












Re: Hive Avro union data access

Posted by Fastupload <fa...@outlook.com>.
Sathish,

Curent version Hive ONLY support union type has two primitive/struct types, i.e.. {“name”: “alias_host”, “type”: [ “string”, “null” ]} , or {“name”: “alias_host”, “type”: [{“type”: “array”, “items”: “string”]}, “null”}.
You can consider JSON string as a column type when the column has smart data structure. and then use Hive built-in JSON function to query data.


Regards,
Link


On May 29, 2014, at 9:21 PM, Valluri, Sathish <sa...@emc.com> wrote:

>  
> Hi,
>  
> I have an Hive table created with 3 different union data types for alias_host column name as shown. (array<string>,string, null).
>  
> CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type": [{
>       "type" : "array",
>       "items" : "string"
>     },"string","null"]}]}
> ') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/arrayTests';
>  
> How to access and query the contents of this table in where clause.
> The queries below like these can be possible if the datatype is not union but when once I set the datatype as union the following queries are failing.
>  
> Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 1000;
> Error: Error while processing statement: FAILED: SemanticException [Error 10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of EQUAL  is expected to a primitive type, but union is found (state=42000,code=10016)
>  
> Can anyone suggest how to access and query the contents of union data types.
>  
> Regards
> Sathish Valluri


RE: Hive Avro union data access

Posted by java8964 <ja...@hotmail.com>.
Your "alias_host" column is an array, from your Avro specification, right?
If so, just use [] to access the specified element in the array
select alias_host[0] from array_tests where aliat_host[0] like '%test%'
If you want to query all the elements in the array, google "explode lateral view" of hive.
Yong

From: sathish.valluri@emc.com
To: user@hive.apache.org
Subject: Hive Avro union data access
Date: Fri, 30 May 2014 06:21:19 +0000

Hi, I have an Hive table created with 3 different union data types for alias_host column name as shown. (array<string>,string, null). CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type": [{      "type" : "array",      "items" : "string"    },"string","null"]}]}') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/arrayTests'; How to access and query the contents of this table in where clause.The queries below like these can be possible if the datatype is not union but when once I set the datatype as union the following queries are failing. Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 1000;Error: Error while processing statement: FAILED: SemanticException [Error 10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of EQUAL  is expected to a primitive type, but union is found (state=42000,code=10016)  Can anyone suggest how to access and query the contents of union data types. RegardsSathish Valluri      		 	   		  

Hive Avro union data access

Posted by "Valluri, Sathish" <sa...@emc.com>.
Hi,

 

I have an Hive table created with 3 different union data types for alias_host column name as shown. (array<string>,string, null).

 

CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type": [{

      "type" : "array",

      "items" : "string"

    },"string","null"]}]}

') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/arrayTests';

 

How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but when once I set the datatype as union the following queries are failing.

 

Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 1000;

Error: Error while processing statement: FAILED: SemanticException [Error 10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of EQUAL  is expected to a primitive type, but union is found (state=42000,code=10016) 

 

Can anyone suggest how to access and query the contents of union data types.

 

Regards

Sathish Valluri

 

 

 

 

 


Hive Avro union data access

Posted by "Valluri, Sathish" <sa...@emc.com>.
 

Hi,

 

I have an Hive table created with 3 different union data types for alias_host column name as shown. (array<string>,string, null).

 

CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type": [{

      "type" : "array",

      "items" : "string"

    },"string","null"]}]}

') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/arrayTests';

 

How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but when once I set the datatype as union the following queries are failing.

 

Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 1000;

Error: Error while processing statement: FAILED: SemanticException [Error 10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of EQUAL  is expected to a primitive type, but union is found (state=42000,code=10016) 

 

Can anyone suggest how to access and query the contents of union data types.

 

Regards

Sathish Valluri

 

 

 

 

 


Hive Avro union data access

Posted by "Valluri, Sathish" <sa...@emc.com>.
Hi,

 

I have an Hive table created with 3 different union data types for alias_host column name as shown. (array<string>,string, null).

 

CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type": [{

      "type" : "array",

      "items" : "string"

    },"string","null"]}]}

') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/arrayTests';

 

How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but when once I set the datatype as union the following queries are failing.

 

Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 1000;

Error: Error while processing statement: FAILED: SemanticException [Error 10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of EQUAL  is expected to a primitive type, but union is found (state=42000,code=10016) 

 

Can anyone suggest how to access and query the contents of union data types.

 

Regards

Sathish Valluri