Posted to user@hive.apache.org by Evan Chiu <ch...@gmail.com> on 2009/05/21 19:05:04 UTC

Can Hive's map type store multiple key-value pairs?

Hi,
I have a data format like this: F1 \t F2 \t F3 \t K1=V1 \t K2=V2 \t K3=V3\n

I've tried to create a table using:

CREATE TABLE test ( f1 STRING, f2 STRING, f3 STRING, kv MAP<STRING, STRING> )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' MAP KEYS TERMINATED BY '=';

After I load the data into test, f1, f2 and f3 look good.  However, the kv
column contains only {K1:V1}; I was hoping it would contain {K1:V1, K2:V2,
K3:V3}.

I think that's because Hive sees a '\t' character between K1=V1 and K2=V2
(and likewise between K2=V2 and K3=V3) and treats it as the start of a new
field, so kv stores only {K1:V1}.  Since the test table has no more columns
after kv, K2=V2 and K3=V3 are ignored.
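
The behaviour described above can be reproduced outside Hive with a rough
Python sketch.  It assumes a delimited row is split on the field delimiter
first and that fields beyond the declared columns are discarded; parse_row
is a hypothetical helper written for illustration, not Hive's actual SerDe
code.

```python
# Illustrative only: mimics the splitting described above, not Hive's real SerDe.
COLUMNS = ["f1", "f2", "f3", "kv"]  # columns declared in the CREATE TABLE


def parse_row(line, field_delim="\t", map_key_delim="="):
    """Split on the field delimiter, then parse the 4th field as a map."""
    fields = line.rstrip("\n").split(field_delim)
    # Assumption: only as many fields as there are columns are kept.
    f1, f2, f3, kv_raw = fields[: len(COLUMNS)]
    # No collection-item delimiter is defined, so the field holds one pair.
    key, _, value = kv_raw.partition(map_key_delim)
    return {"f1": f1, "f2": f2, "f3": f3, "kv": {key: value}}


row = parse_row("F1\tF2\tF3\tK1=V1\tK2=V2\tK3=V3\n")
print(row["kv"])  # {'K1': 'V1'}
```

Under these assumptions the fifth and sixth tab-separated fields never reach
the kv column, which matches the {K1:V1} result seen in the table.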

Is there any way, other than writing a custom SerDe, to have the kv column
store more than one key-value pair?  (I could change the file format.)

thanks,

C.H.

Re: Can Hive's map type store multiple key-value pairs?

Posted by Zheng Shao <zs...@gmail.com>.
Hi Evan,

We need a separator other than tab between the entries in the map.

Can you log the data in this way?
F1 \t F2 \t F3 \t K1=V1 : K2=V2 : K3=V3\n

Then we can create the table like this:
CREATE TABLE test ( f1 STRING, f2 STRING, f3 STRING, kv MAP<STRING, STRING> )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ':'
MAP KEYS TERMINATED BY '=';
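
As a rough illustration of how the three delimiters in this table definition
would carve up one line, here is a small Python sketch.  The helper name
parse_delimited_row is hypothetical, and the splitting order (fields, then
collection items, then map keys) is an assumption about the delimited
format, not Hive's actual code.  The sample line has no spaces around the
delimiters.

```python
# Illustrative only: a three-level split, not Hive's real SerDe.
def parse_delimited_row(line, field_delim="\t", item_delim=":", key_delim="="):
    """Split into fields, then map entries, then key and value."""
    f1, f2, f3, kv_raw = line.rstrip("\n").split(field_delim)
    kv = {}
    for item in kv_raw.split(item_delim):
        # Each item is one "key=value" entry of the map.
        key, _, value = item.partition(key_delim)
        kv[key] = value
    return {"f1": f1, "f2": f2, "f3": f3, "kv": kv}


parsed = parse_delimited_row("F1\tF2\tF3\tK1=V1:K2=V2:K3=V3\n")
print(parsed["kv"])  # {'K1': 'V1', 'K2': 'V2', 'K3': 'V3'}
```

In this sketch a key or value that itself contains ':' or '=' would be split
in the wrong place, so the delimiters should be characters that never occur
in the data.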

We need a different separator; otherwise it is not possible to tell whether
K2=V2 belongs to the map or to the next column (in case there is another
column after kv in the CREATE TABLE statement).
Zheng

-- 
Yours,
Zheng