Posted to user@hive.apache.org by "Sunderlin, Mark" <ma...@teamaol.com> on 2012/06/14 21:26:33 UTC

An array and a map in the same Hive table: Can Separator for Map KV pairs be different than Separator for Array elements?

If my data has three columns and a typical row looks like:

5754^E ContentQuality5,Knowledge,Knowledge/Nature,UnFlagged,EarthReport^EdisplayHeight=293&displayWidth=570&imid=09177970492035608320&sid=577&skey=63&videoid=506875580

I have an integer, an array, and a map.
The column separator is Control-E (^E).
Array elements are separated by a comma (,).
Map key/value pairs are separated by an ampersand (&), and keys are separated from values by an equals sign (=).

I'm pretty sure I want this CREATE statement:

Create table mark_test (
    Row_num  int,
    Tags     array<string>,
    Keys     map<string, string>
)
row format delimited
    fields terminated by '\005'  -- Control E 
    collection items terminated by '\&'  
    map keys terminated by '=' ;

Question:
Does 'collection items terminated by' apply only to the map, or does it also set the item terminator for my array?
If it applies only to the map:
	Great! Life is good for me!
If it applies to both:
	Ugh.  Is there some way to give the array and the map separate item terminators, or do I need to manipulate the data before loading so that both use the same one?


---
Mark E. Sunderlin
Solutions Architect   |AOL Core Data Technologies
P: 703-265-6935       |C: 540-327-6222 | AIM: MESunderlin
22000 AOL Way,  Dulles, VA  20166



Re: An array and a map in the same Hive table: Can Separator for Map KV pairs be different than Separator for Array elements?

Posted by Aniket Mokashi <an...@gmail.com>.
Hi Mark,

'Collection items terminated by' applies to both maps and arrays. In your
case, you could try Hive's nested complex data structures (which would let
you introduce another separator) to deserialize your data, but that would
require some experimentation and digging into the code. It would be
non-trivial.

The simplest way would be to specify
collection items terminated by '\&'
   map keys terminated by '=' ;
in the table creation, declare the array field as a string, and parse it
with the split UDF in Hive.
(This would even work if the array field does not have '&' in it). But all
the users of this table need to know about this.

In other words,
Create table mark_test (
Row_num int,
Tags            string,
Keys            map<string, string>
)

select split(Tags, ',') from mark_test ...
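
Spelled out in full, the suggestion above might look like the sketch below.
It is untested and rests on a few assumptions: the view name mark_test_v is
made up for illustration, the delimiter is written as a plain '&' rather
than the '\&' used above (on the assumption that the ampersand needs no
escaping), and the column name Keys may need backticks on Hive versions
that treat it as a reserved word.

```sql
-- Store Tags as a plain string so that only the map uses the
-- collection-item delimiter ('&').
CREATE TABLE mark_test (
    Row_num  INT,
    Tags     STRING,
    Keys     MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\005'            -- Control-E
    COLLECTION ITEMS TERMINATED BY '&'     -- separates map key/value pairs
    MAP KEYS TERMINATED BY '=';

-- Hypothetical view that hides the split() call, so that readers of the
-- table see Tags as an array<string>.
CREATE VIEW mark_test_v AS
SELECT Row_num,
       split(Tags, ',') AS Tags,
       Keys
FROM mark_test;

-- Example query against the view: first tag and one map entry.
SELECT Row_num, Tags[0], Keys['sid']
FROM mark_test_v;
```

A view like this is one way around the "all users need to know" problem,
since only the view definition has to carry the comma separator. Note that
the second argument of split() is a regular expression, which matters if
the separator is ever changed to a character such as '|'.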

Hope it helps.

~Aniket



-- 
"...:::Aniket:::... Quetzalco@tl"