Posted to user@pig.apache.org by Dexin Wang <wa...@gmail.com> on 2011/05/18 20:12:17 UTC

elephantbird JsonLoader doesn't like gz?

Hi,

Is anyone using Twitter's elephantbird library? I was using its JsonLoader and
got this error:

WARN  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string
Unexpected character () at position 0.
    at org.json.simple.parser.Yylex.yylex(Unknown Source)
    at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)

But if I manually gunzip the file to a plain-text JSON file, JsonLoader
works fine.

Again, this fails:

raw_json = LOAD 'cc.json.gz' USING com.twitter.elephantbird.pig.load.JsonLoader();

This works:

$ gunzip cc.json.gz
raw_json = LOAD 'cc.json' USING com.twitter.elephantbird.pig.load.JsonLoader();

Any suggestions for this? Or is there any other JSON loader library out
there? I can write my own but would rather use one if it already exists.

Thanks,

Dexin
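
A likely explanation for the empty character at position 0, consistent with the replies in this thread: if local mode hands the loader the raw compressed bytes, the first byte is the gzip magic number 0x1f, a non-printable control character that no JSON parser accepts. The Python sketch below is only an illustration using the standard library. It does not use Pig or elephantbird, and the sample record and the helper try_parse are made up for the example.

```python
import gzip
import json

# Hypothetical one-line JSON record, standing in for a line of cc.json.
record = '{"user": "dexin", "count": 3}\n'
compressed = gzip.compress(record.encode("utf-8"))

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
magic = compressed[:2]


def try_parse(raw: bytes):
    """Try to parse raw bytes as one JSON line; return (ok, error_position)."""
    # latin-1 maps each byte to one character, like a reader that does not
    # know the input is compressed.
    text = raw.decode("latin-1")
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as err:
        return False, err.pos


compressed_result = try_parse(compressed)                # raw gzip bytes
plain_result = try_parse(gzip.decompress(compressed))    # after gunzip

print(magic.hex(), compressed_result, plain_result)
```

The compressed input fails at the very first character, while the decompressed input parses, which matches the behaviour described above.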

Re: elephantbird JsonLoader doesn't like gz?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Without getting into the details -- local mode in Pig was
fundamentally flawed when it comes to reading anything but the
simplest of formats, which is why the whole thing was changed in 0.7.

Upgrade :).

D

Re: elephantbird JsonLoader doesn't like gz?

Posted by Eric Lubow <er...@gmail.com>.
If you are trying to read gzip files on EMR, you CAN'T use local mode. Once
you switch to normal (mapreduce) mode, everything will start to work. On EMR,
Pig 0.6 (their stock version) will not read gzip or bzip files in local mode.

-e

Eric Lubow e: eric.lubow@gmail.com w: eric.lubow.org
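
Since manually gunzipping the file is reported earlier in the thread to work, one possible workaround for local-mode testing on Pig 0.6 is to decompress the input before the LOAD. The following is a minimal Python sketch under that assumption, not part of Pig or elephantbird. The helper name gunzip_for_local_mode and the sample data are hypothetical, and unlike the gunzip command it leaves the original .gz file in place.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_for_local_mode(src: Path) -> Path:
    """Decompress src (e.g. cc.json.gz) next to itself and return the new path."""
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dest = src.with_suffix("")  # cc.json.gz -> cc.json
    # Stream the data so large files are not held in memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "cc.json.gz"
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write('{"id": 1}\n')
    plain_path = gunzip_for_local_mode(gz_path)
    plain_name = plain_path.name
    plain_text = plain_path.read_text(encoding="utf-8")

print(plain_name)
```

The resulting cc.json can then be loaded with the same LOAD statement that works in the original post.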

Re: elephantbird JsonLoader doesn't like gz?

Posted by Dexin Wang <wa...@gmail.com>.
Turns out it's only a problem if I run it in local mode; running it on the
cluster doesn't have this problem. I'm using EB 1.2.5.

I wonder how you fixed the problem, since it seems it's not an EB problem. Or
are you gunzipping it in the EB load function?

Re: elephantbird JsonLoader doesn't like gz?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Which version of EB are you using? I recently fixed this for someone;
I believe the fix has been in every version since 1.2.3.

D

Re: elephantbird JsonLoader doesn't like gz?

Posted by Dexin Wang <wa...@gmail.com>.
Or is it because I'm using Pig 0.6, where the gz format is not supported? I'll
run this on AWS EMR, where only Pig 0.6 is supported. Do I have to use a later
version of Pig?
