Posted to user@pig.apache.org by ANKUR GOEL <an...@corp.aol.com> on 2009/01/15 15:34:48 UTC
Data types for Map key value pairs
Hi All,
I have a custom loader that returns a set of fields after
reading a log line. One of the fields returned is of type DataType.Map.
My question is: how can I set the data types for this map's (key, value)
pairs? In my script I try to generate a record from the keys and values of
this map and get the following error:
java.io.IOException: Unknown type Unknown
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:178)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
Here is my script:
raw = LOAD 'myfile' USING myUdf.MyCustomLoader();
filtered = FILTER raw BY (ARG_MAP#'key' is not null);
-- MySplit splits the "|"-separated value stored under 'key' and puts the
-- pieces into another map with String (key, value) pairs, which it returns.
-- Each piece is keyed by 'field0', 'field1', ..., 'fieldn', where n is the
-- number of splits.
entry = FOREACH filtered GENERATE A, B, myUdf.MySplit(ARG_MAP#'key', '|') as FIELDS;
-- Here another tuple is generated.
result = FOREACH entry GENERATE A, B, FIELDS#'field0' as CLIENT_ID,
    FIELDS#'field1' as CHANNEL_ID, FIELDS#'field2' as OTHER_ID;
store result into 'location' using PigStorage();
Any help here is appreciated.
TIA
-Ankur
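
For readers following along, the splitting behaviour described for MySplit
can be sketched in plain Java. This is only an illustration under stated
assumptions: the class name MySplitSketch and the method split are
hypothetical, the real UDF's source is not shown in this thread, and the
sketch deliberately avoids the Pig UDF API so that it runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in for the splitting logic inside MySplit.
// It does not use the Pig UDF API and is not the poster's actual code.
final class MySplitSketch {

    // Splits value on the literal delimiter and keys each piece as
    // "field0", "field1", ... in the order the pieces appear.
    static Map<String, String> split(String value, String delimiter) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (value == null) {
            return fields; // nothing to split
        }
        // Pattern.quote treats "|" as a literal rather than regex alternation;
        // the -1 limit keeps trailing empty pieces.
        String[] pieces = value.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < pieces.length; i++) {
            fields.put("field" + i, pieces[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = split("client42|channel7|other9", "|");
        System.out.println(fields.get("field0")); // prints client42
        System.out.println(fields); // prints {field0=client42, field1=channel7, field2=other9}
    }
}
```

Quoting the delimiter matters here because String.split takes a regular
expression, and an unquoted "|" would split between every character.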
RE: Data types for Map key value pairs
Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.
You can specify map types as colname: map[] in your load statement.
E.g.: a = load 'mydata' as (x: int, m: map[]);
Note that Pig does not enforce types for the keys and values in the map.
The only constraint is that keys should be basic types (int, long,
float, double, string). Internally, values are treated as bytearray,
so any Pig type (int, long, float, double, string, tuple, bag, map)
can be used as the value type.
I hope that answers your question regarding types inside a map in Pig.
Let me know if you have further questions/clarifications.
Thanks,
Santhosh
-----Original Message-----
From: ANKUR GOEL [mailto:ankur.goel@corp.aol.com]
Sent: Thursday, January 15, 2009 6:35 AM
To: pig-dev@hadoop.apache.org
Cc: pig-user@hadoop.apache.org
Subject: Data types for Map key value pairs
Hi All,
I have a custom loader that returns a set of fields after
reading a log line. One of the fields returned is of type DataType.Map.
My question is how can I set the data types for this map's (key, value)
pair. In my script I try to generate a record from k,v of this map and
get the error
java.io.IOException: Unknown type Unknown
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.
map(PigMapBase.java:178)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$
Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
here is my script
raw = LOAD 'myfile' USING myUdf.MyCustomLoader() ;
filtered = FILTER raw BY (ARG_MAP#'key' is not null);
entry = FOREACH filtered GENERATE A, B, myUdf.MySplit(ARG_MAP#'key',
'|') as FIELDS; // This returns a map with String (key, value) pairs
// The MySplit UDF line splits the value in the map which is "|"
separated and puts the splits it into another Map and returns it. Each
split is keyed by 'field0', 'field1'...'fieldn' where n is the number of
splits.
result = FOREACH entry GENERATE A, B, FIELDS#'field0' as CLIENT_ID,
FIELDS#'field1' as CHANNEL_ID, FIELDS#'field2' as OTHER_ID;
// Here another tuple is generated
store results into 'location' using PigStorage();
Any help here is appreciated.
TIA
-Ankur