Posted to user@pig.apache.org by Eli Finkelshteyn <ie...@gmail.com> on 2011/09/10 00:49:56 UTC

Loading LZOs With Some JSON

Hi,
I'm currently working on trying to load lzos that contain some JSON 
elements. This is of the form:

item1    item2    {'thing1':'1','thing2':'2'}
item3    item4    {'thing3':'1','thing27':'2'}
item5    item6    {'thing5':'1','thing19':'2'}

I was thinking I could use LzoJsonLoader for this, but it keeps throwing 
me errors like:
ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo 
without native-hadoop

This is despite the fact that I can load normal lzos just fine using 
LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What 
should I do to go about loading these files? Does anyone have any ideas?

Cheers,
Eli
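
[Editor's note, not part of the original email: the sample rows above are tab-separated, with a JSON-like map in the last column. Note the map uses single quotes, so it is not strict JSON. A minimal Python sketch of parsing one such row, purely to illustrate the format:]

```python
import ast

# Sample row from the post above. The map uses single quotes, so a strict
# JSON parser would reject it; ast.literal_eval accepts Python-style dict
# literals, which happens to match this shape.
line = "item1\titem2\t{'thing1':'1','thing2':'2'}"
col1, col2, json_data = line.split("\t")
parsed = ast.literal_eval(json_data)
print(parsed["thing1"])  # -> 1
```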

Re: Loading LZOs With Some JSON

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It's a straightforward fix by the way. Feel free to open a github issue for
elephant-bird, or better yet, toss me a pull request :).

D

On Tue, Sep 13, 2011 at 9:31 AM, Eli Finkelshteyn <ie...@gmail.com> wrote:

> Sweet! Just got this working! For anyone with the same problem in the
> future: apparently JsonStringToMap() *does not* like bytearrays. If you
> simply cast your json as a chararray when you're loading, the error
> disappears!
>
> Eli
>
>
> On 9/13/11 11:51 AM, Eli Finkelshteyn wrote:
>
>> Correction: I forgot to run the JsonStringToMap function when writing my
>> last email, when I run that, I get the same error as before
>> (*org.apache.pig.data.DataByteArray cannot be cast to
>> java.lang.String*).
>>
>> My full workflow is as follows:
>>
>> initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>> AS (col1, col2, col3, json_data);
>> map = FOREACH initial GENERATE
>> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
>> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
>> type;
>> dump extracted;
>>
>> Any ideas?
>>
>> Eli
>>
>> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>>
>>> Well, it's not throwing me errors anymore. Now it's just discarding the
>>> field. When I run it on two records where I've verified a field exists in
>>> the json, I get:
>>>
>>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>>
>>> More specifically, my json is of the following form:
>>>
>>> {"foo":0,"bar":"hi"}
>>>
>>> On that, I'm running:
>>>
>>> initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>> AS (col1, col2, col3, json_data);
>>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS
>>> type;
>>> dump extracted;
>>>
>>> Which gives me the above warning along with:
>>>
>>> ()
>>> ()
>>>
>>> I also tried it without the cast to chararray, but received the same
>>> results. Should I be casting json_data as some other data type when I load
>>> it initially? Seems by default it's cast to a bytearray when I describe
>>> initial. Would that be a problem?
>>>
>>> Thanks for all the help so far!
>>>
>>> Eli
>>>
>>>
>>>
>>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>>
>>>> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
>>>> theoretically).
>>>> The values are bytearrays. You are probably trying to treat them as
>>>> strings.
>>>>  You have to do stuff like this:
>>>>
>>>> x = foreach myrelation generate
>>>>   (chararray) mymap#'foo' as foo,
>>>>   (chararray) mymap#'bar' as bar;
>>>>
>>>>
>>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn<el...@tumblr.com> wrote:
>>>>
>>>>> Hmmm, now it gets past my mention of the function, but when I run a
>>>>> dump on generated information, I get:
>>>>>
>>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>>> java.lang.ClassCastException: *org.apache.pig.data.DataByteArray cannot
>>>>> be cast to java.lang.String*
>>>>>
>>>>> Thanks for all the help so far!
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>>
>>>>>> You also want json-simple-1.1.jar
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn<ie...@gmail.com> wrote:
>>>>>>
>>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar,
>>>>>>> and piggybank.jar, and then trying to use that UDF, but getting the
>>>>>>> following error:
>>>>>>>
>>>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>>>
>>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>>        at java.lang.Class.forName0(Native Method)
>>>>>>>        at java.lang.Class.forName(Class.java:247)
>>>>>>>        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>>>        at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>>>>>        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>>>>>        etc...
>>>>>>>
>>>>>>> Any ideas? I've verified that it recognizes the function itself, and
>>>>>>> that
>>>>>>> the data it's running on is valid json. Not sure what else I can
>>>>>>> check.
>>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
>>>>>>>
>>>>>>>> They derive from the same classes as far as lzo handling goes, so I
>>>>>>>> suspect something's up with your environment or inputs if you get
>>>>>>>> LzoTokenizedLoader to work, but LzoJsonStorage does not.
>>>>>>>>
>>>>>>>> Note that LzoTokenizedLoader is deprecated -- just use
>>>>>>>> LzoPigStorage.
>>>>>>>>
>>>>>>>> JsonLoader wouldn't work for you because it expects the complete
>>>>>>>> input
>>>>>>>> line
>>>>>>>> to be json, not part of it. You want to load with LzoPigStorage, and
>>>>>>>> then
>>>>>>>> apply the JsonStringToMap udf to the third field.
>>>>>>>>
>>>>>>>> -D
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn<ie...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm currently working on trying to load lzos that contain some JSON
>>>>>>>>> elements. This is of the form:
>>>>>>>>>
>>>>>>>>> item1    item2    {'thing1':'1','thing2':'2'}
>>>>>>>>> item3    item4    {'thing3':'1','thing27':'2'}
>>>>>>>>> item5    item6    {'thing5':'1','thing19':'2'}
>>>>>>>>>
>>>>>>>>> I was thinking I could use LzoJsonLoader for this, but it keeps
>>>>>>>>> throwing me errors like:
>>>>>>>>> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>>>>>>>>> without native-hadoop
>>>>>>>>>
>>>>>>>>> This is despite the fact that I can load normal lzos just fine using
>>>>>>>>> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What
>>>>>>>>> should I do to go about loading these files? Does anyone have any ideas?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Eli
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>
>>
>

Re: Loading LZOs With Some JSON

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Sweet! Just got this working! For anyone with the same problem in the 
future: apparently JsonStringToMap() *does not* like bytearrays. If you 
simply cast your json as a chararray when you're loading, the error 
disappears!

Eli
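
[Editor's note, not part of the original email: an illustrative Python analogue of the fix, not the actual elephant-bird code. The UDF expects string input, while an uncast loader field behaves like raw bytes; the chararray cast is the decode step:]

```python
import json

def json_string_to_map(value):
    # Sketch of a JsonStringToMap-style UDF that only accepts strings,
    # analogous to the "DataByteArray cannot be cast to java.lang.String"
    # failure in the thread. Hypothetical helper, for illustration only.
    if not isinstance(value, str):
        raise TypeError("cannot be cast to str")
    return json.loads(value)

raw = b'{"foo": 0, "bar": "hi"}'   # what an uncast bytearray field looks like
text = raw.decode("utf-8")         # the "(chararray)" cast equivalent
print(json_string_to_map(text)["bar"])  # -> hi
```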

On 9/13/11 11:51 AM, Eli Finkelshteyn wrote:
> Correction: I forgot to run the JsonStringToMap function when writing 
> my last email, when I run that, I get the same error as before 
> (*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).
>
> My full workflow is as follows:
>
> initial = LOAD 'some_file.lzo' USING 
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, 
> col2, col3, json_data);
> map = FOREACH initial GENERATE 
> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS 
> mapped_json_data;
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' 
> AS type;
> dump extracted;
>
> Any ideas?
>
> Eli


Re: Loading LZOs With Some JSON

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Haha, yeah; that. I literally just got it to work when you emailed. 
Thanks for all the help, Dmitriy!

Eli

On 9/13/11 12:30 PM, Dmitriy Ryaboy wrote:
> initial = LOAD 'some_file.lzo' USING
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
> AS (col1, col2, col3, json_data:chararray);
>
> or
>
> map = FOREACH initial GENERATE
> com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data) AS mapped_json_data;
>
>
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
> type;


Re: Loading LZOs With Some JSON

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
initial = LOAD 'some_file.lzo' USING
com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
AS (col1, col2, col3, json_data:chararray);

or

map = FOREACH initial GENERATE
com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data) AS mapped_json_data;


extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
type;
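
[Editor's note, not part of the original email: a hypothetical Python rendering of the two options above, just to make the distinction concrete. Either decode the field once at load time (like declaring json_data:chararray in the LOAD schema) or keep raw bytes and cast at the point of use (like (chararray) inside the FOREACH):]

```python
import json

raw_row = (b"item1", b"item2", b'{"type": "post"}')

# Option 1: declare the type up front -- decode every field as it is loaded.
col1, col2, json_data = (f.decode("utf-8") for f in raw_row)
mapped = json.loads(json_data)

# Option 2: keep raw bytes and convert only where needed.
_, _, raw_json = raw_row
mapped2 = json.loads(raw_json.decode("utf-8"))

print(mapped["type"], mapped2["type"])  # -> post post
```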


>>>>>>>> throwing
>>>>>>>> me
>>>>>>>> errors like:
>>>>>>>> ERROR com.hadoop.compression.lzo.********LzoCodec - Cannot load
>>>>>>>> native-lzo
>>>>>>>> without native-hadoop
>>>>>>>>
>>>>>>>> This is despite the fact that I can load normal lzos just fine using
>>>>>>>> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill.
>>>>>>>> What
>>>>>>>> should
>>>>>>>> I do to go about loading these files? Does anyone have any ideas?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Eli
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>

Re: Loading LZOs With Some JSON

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Correction: I forgot to run the JsonStringToMap function when writing my 
last email. When I run it, I get the same error as before 
(*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).

My full workflow is as follows:

initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, col2, col3, json_data);
map = FOREACH initial GENERATE com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
dump extracted;

Any ideas?

Eli
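For reference, the fix that eventually resolved this (see the follow-up at the top of the thread) is to declare the JSON column as chararray in the LOAD schema, so that JsonStringToMap receives a String rather than a DataByteArray. A sketch of the corrected workflow, reusing the aliases above (the `map` alias is renamed here, since `map` is a reserved word in Pig Latin):

```pig
-- assumes the hadoop-lzo, elephant-bird, piggybank, and json-simple jars
-- are already REGISTERed
initial = LOAD 'some_file.lzo'
          USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
          AS (col1, col2, col3, json_data: chararray);  -- chararray, not the default bytearray
json_map = FOREACH initial GENERATE
           com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
extracted = FOREACH json_map GENERATE (chararray) mapped_json_data#'type' AS type;
dump extracted;
```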



Re: Loading LZOs With Some JSON

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Well, it's not throwing me errors anymore. Now it's just discarding the 
field. When I run it on two records where I've verified a field exists 
in the json, I get:

Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).

More specifically, my json is of the following form:

{"foo":0,"bar":"hi"}

On that, I'm running:

initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, col2, col3, json_data);
extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
dump extracted;

Which gives me the above warning along with:

()
()

I also tried it without the cast to chararray, but received the same 
results. Should I be casting json_data as some other data type when I 
load it initially? Seems by default it's cast to a bytearray when I 
describe initial. Would that be a problem?

Thanks for all the help so far!

Eli





Re: Loading LZOs With Some JSON

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
theoretically).
The values are bytearrays. You are probably trying to treat them as strings.
You have to do stuff like this:

x = foreach myrelation generate
  (chararray) mymap#'foo' as foo,
  (chararray) mymap#'bar' as bar;



Re: Loading LZOs With Some JSON

Posted by Eli Finkelshteyn <el...@tumblr.com>.
Hmmm, now it gets past my mention of the function, but when I run a dump 
on generated information, I get:

2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.ClassCastException: *org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*

Thanks for all the help so far!

Eli



Re: Loading LZOs With Some JSON

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
You also want json-simple-1.1.jar
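That is, it needs to be registered alongside the other jars before the UDF is invoked; a sketch, with placeholder paths (adjust to wherever the jars actually live):

```pig
-- paths are illustrative placeholders
REGISTER /path/to/hadoop-lzo.jar;
REGISTER /path/to/elephant-bird.jar;
REGISTER /path/to/guava.jar;
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/json-simple-1.1.jar;  -- the missing dependency
```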



Re: Loading LZOs With Some JSON

Posted by Eli Finkelshteyn <ie...@gmail.com>.
Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar, 
and piggybank.jar, and then trying to use that UDF, but getting the 
following error:

ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException

java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
         at java.lang.Class.forName0(Native Method)
         at java.lang.Class.forName(Class.java:247)
         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
         etc...

Any ideas? I've verified that it recognizes the function itself, and 
that the data it's running on is valid json. Not sure what else I can check.
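
One thing worth checking: org/json/simple/parser/ParseException comes from the
json-simple library, so a likely cause is that its jar isn't registered with
Pig. A minimal sketch (the jar file names and versions here are assumptions --
use whatever versions you actually have on disk):

```pig
-- json-simple provides org.json.simple.parser.ParseException,
-- which JsonStringToMap needs at parse time
REGISTER 'json-simple-1.1.jar';
REGISTER 'elephant-bird-2.0.jar';
```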

Eli


On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
> They derive from the same classes as far as lzo handling goes, so I suspect
> something's up with your environment or inputs if you get LzoTokenizedLoader
> to work, but LzoJsonLoader does not.
>
> Note that LzoTokenizedLoader is deprecated -- just use LzoPigStorage.
>
> JsonLoader wouldn't work for you because it expects the complete input line
> to be json, not part of it. You want to load with LzoPigStorage, and then
> apply the JsonStringToMap udf to the third field.
>
> -D
>
>
> On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn<ie...@gmail.com>  wrote:
>
>> Hi,
>> I'm currently working on trying to load lzos that contain some JSON
>> elements. This is of the form:
>>
>> item1    item2    {'thing1':'1','thing2':'2'}
>> item3    item4    {'thing3':'1','thing27':'2'}
>> item5    item6    {'thing5':'1','thing19':'2'}
>>
>> I was thinking I could use LzoJsonLoader for this, but it keeps throwing me
>> errors like:
>> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>> without native-hadoop
>>
>> This is despite the fact that I can load normal lzos just fine using
>> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What should
>> I do to go about loading these files? Does anyone have any ideas?
>>
>> Cheers,
>> Eli
>>


Re: Loading LZOs With Some JSON

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
They derive from the same classes as far as lzo handling goes, so I suspect
something's up with your environment or inputs if you get LzoTokenizedLoader
to work, but LzoJsonLoader does not.

Note that LzoTokenizedLoader is deprecated -- just use LzoPigStorage.

JsonLoader wouldn't work for you because it expects the complete input line
to be json, not part of it. You want to load with LzoPigStorage, and then
apply the JsonStringToMap udf to the third field.
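
A minimal sketch of that flow (the field names are assumptions, and the cast
to chararray matters because JsonStringToMap doesn't accept a bytearray):

```pig
-- load the tab-separated lzo lines as-is, then parse only the json column
raw    = LOAD 'some_file.lzo'
         USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
         AS (item1, item2, json_data);
mapped = FOREACH raw
         GENERATE item1, item2,
                  com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data)
                  AS json_map;
```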

-D


On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn <ie...@gmail.com> wrote:

> Hi,
> I'm currently working on trying to load lzos that contain some JSON
> elements. This is of the form:
>
> item1    item2    {'thing1':'1','thing2':'2'}
> item3    item4    {'thing3':'1','thing27':'2'}
> item5    item6    {'thing5':'1','thing19':'2'}
>
> I was thinking I could use LzoJsonLoader for this, but it keeps throwing me
> errors like:
> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
> without native-hadoop
>
> This is despite the fact that I can load normal lzos just fine using
> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What should
> I do to go about loading these files? Does anyone have any ideas?
>
> Cheers,
> Eli
>