Posted to hdfs-user@hadoop.apache.org by Baron Tsai <ts...@gmail.com> on 2013/12/06 06:57:41 UTC

How can Hive handle complex data types through SerDe and UDF/GenericUDF?

------------------------------------Table Definition------------------------------------
CREATE TABLE kvpair (
  id STRING,
  arrstr ARRAY<STRING>,
  arrmap ARRAY<MAP<STRING, STRING>>
 )
ROW FORMAT SERDE "com.cloudera.hive.serde.JSONSerDe";

## com.cloudera.hive.serde.JSONSerDe is a SerDe that can handle complex JSON data.

------------------------------------Sample Data------------------------------------
{
    "id": "I001",
    "arrstr": [
        "stringA",
        "stringB",
        "stringC"
    ],
    "arrmap": [
        {
            "t0000": "android",
            "t0001": "ca"
        },
        {
            "t0000": "ios",
            "t0001": "us"
        }
    ]
}
------------------------------------CLI------------------------------------
Signature of ArrayIterateUDF's evaluate method:
public String evaluate(List<Map<String,String>> jsonStr, String key, String value);

create temporary function kv as 'com.demo.udf.ArrayIterateUDF';
SELECT kv(tb.arrmap,"t0000","android") from kvpair tb;
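
To make the intent concrete, here is a minimal plain-Java sketch of the
logic evaluate is meant to perform. It leaves out the Hive UDF base class
so that it runs on its own. The class name ArrayIterateSketch, the
"true"/"false" return convention, and the null handling are assumptions
for illustration only, not the actual com.demo.udf.ArrayIterateUDF code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for com.demo.udf.ArrayIterateUDF (no Hive base class).
class ArrayIterateSketch {

    // Returns "true" if any map in the array maps `key` to `value`, else "false".
    // Assumption: the post only gives the signature, not the return convention.
    public String evaluate(List<Map<String, String>> arrmap, String key, String value) {
        if (arrmap == null || key == null || value == null) {
            return null; // assumption: null input yields null output
        }
        for (Map<String, String> entry : arrmap) {
            if (entry != null && value.equals(entry.get(key))) {
                return "true";
            }
        }
        return "false";
    }

    public static void main(String[] args) {
        // Same shape as the "arrmap" field in the sample data.
        List<Map<String, String>> arrmap = List.of(
            Map.of("t0000", "android", "t0001", "ca"),
            Map.of("t0000", "ios", "t0001", "us"));
        System.out.println(new ArrayIterateSketch().evaluate(arrmap, "t0000", "android"));
    }
}
```

With the sample data above, this sketch would return "true" for
kv(tb.arrmap,"t0000","android"), provided the argument really arrives as a
List<Map<String,String>>.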

------------------------------------Problem------------------------------------
I assume the data passed into the UDF's evaluate method has already been
processed by the JSONSerDe, so in this demo the argument should be an
object deserialized by JSONSerDe with the type List<Map<String,String>>.
However, the query failed.
I don't know how my UDF can receive the input data in evaluate and parse
it (for example, into a JSON object). What is the relationship between
the SerDe and the UDF, and how do they work together?
Thank you all.