Posted to user@pig.apache.org by Kris Coward <kr...@melon.org> on 2010/12/08 22:53:30 UTC
IOException appearing during dump but not illustrate
Hi,
I've recently gotten stumped by a problem where my attempts to dump the
relations produced by a GROUP command give the following error (though
illustrating the same relation works fine):
java.io.IOException: Type mismatch in key from map: expected
org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
...
For a little background: the relation that's failing is called y5, and
it's produced by the following sequence of commands (in grunt):
y2 = foreach y1 generate $0 as timestamp, myudfs.httpArgParse($1) as argMap;
y3 = foreach y2 generate argMap#'s' as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;
and to get an idea what sort of data is involved, ILLUSTRATE y4 yields:
-----------------------------------------------------------------------------------------------------
| y1 | timestamp: int | args: bag({tuple_of_tokens: (token: chararray)}) |
-----------------------------------------------------------------------------------------------------
| | 1265950806 | {(s=1381688313), (u=F68FFA1F655FDF494ABA520D95E1D99E), (ts=1265950805)} |
-----------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
| y2 | timestamp: int | argMap: map |
-----------------------------------------------------------------------------------------------
| | 1265950806 | {u=F68FFA1F655FDF494ABA520D95E1D99E, ts=1265950805, s=1381688313} |
-----------------------------------------------------------------------------------------------
--------------------------------------------
| y3 | uid: bytearray | timestamp: int |
--------------------------------------------
| | 1381688313 | 1265950806 |
--------------------------------------------
--------------------------------------------
| y4 | uid: bytearray | timestamp: int |
--------------------------------------------
| | 1381688313 | 1265950806 |
--------------------------------------------
The same problem was also produced when the FILTER command was omitted,
and the relevant chunk of code in myudfs.httpArgParse is:
StringTokenizer tok = new StringTokenizer((String) pair, "=", false);
if (tok.hasMoreTokens()) {
    String oKey = tok.nextToken();
    if (tok.hasMoreTokens()) {
        Object oValue = tok.nextToken();
        output.put(oKey, oValue);
    } else {
        output.put(oKey, null);
    }
}
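For anyone following along, that fragment can be reduced to a self-contained class. The class name ArgPairDemo and the parsePair wrapper are invented here for illustration (only the body comes from the post); the point it demonstrates is that StringTokenizer.nextToken() returns a java.lang.String, so every non-null value stored in the map is a String:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical standalone reduction of the quoted myudfs.httpArgParse
// fragment; class and method names are made up for this sketch.
public class ArgPairDemo {
    public static Map<String, Object> parsePair(String pair, Map<String, Object> output) {
        StringTokenizer tok = new StringTokenizer(pair, "=", false);
        if (tok.hasMoreTokens()) {
            String oKey = tok.nextToken();
            if (tok.hasMoreTokens()) {
                // nextToken() returns String, so the map value is always a
                // java.lang.String, which Pig serializes as NullableText.
                output.put(oKey, tok.nextToken());
            } else {
                // Key present but no "=value" part: store an explicit null.
                output.put(oKey, null);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Object> m = parsePair("s=1381688313", new HashMap<>());
        System.out.println(m.get("s").getClass().getName()); // java.lang.String
    }
}
```

So at runtime the GROUP key is a String, even though the declared schema for a map lookup is bytearray, which would presumably explain the NullableBytesWritable vs. NullableText mismatch in the stack trace above.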
If anyone has any insight into how I could get this to work, that'd
really help me out.
Thanks,
Kris
P.S. For those who remember my earlier post about getting httpArgParse
to compile, I took the advice to ditch the InternalMap in favour of a
HashMap<String,Object>.
--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3
Re: IOException appearing during dump but not illustrate
Posted by Kris Coward <kr...@melon.org>.
That looks to have worked. Thanks.
On Wed, Dec 08, 2010 at 02:04:07PM -0800, Dmitriy Ryaboy wrote:
> Try explicitly casting argMap#'s' to a chararray?
--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3
Re: IOException appearing during dump but not illustrate
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Try explicitly casting argMap#'s' to a chararray?
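Applied to the script quoted above, the suggested cast would go where uid is generated. A sketch using the relation names from the original post (the extra parentheses around the map lookup are only for clarity):

```pig
-- Cast the map value to chararray so the key's declared type matches
-- the String the UDF actually stores (a sketch, not from the thread):
y3 = foreach y2 generate (chararray)(argMap#'s') as uid, timestamp as timestamp;
y4 = FILTER y3 BY (uid is not null);
y5 = GROUP y4 BY uid;  -- group key is now consistently chararray
```

Without the cast, uid's declared type is bytearray (map values are untyped in the schema) while the object the UDF emits is a Java String, so the map stage serializes the GROUP key as NullableText where the plan expects NullableBytesWritable.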