Posted to user@pig.apache.org by Jerry Lam <ch...@gmail.com> on 2013/04/17 03:28:53 UTC

Unable to load data using PigStorage that was previously stored using PigStorage

Hi pig users,

I tried to load data using PigStorage that was previously stored using
PigStorage but it failed.

Each line looks like this in the data file that is generated by PigStorage:
[a#hello,b#{([c#11,d#22]),([c#33,d#44])}]

I did the following:
A = load 'data.txt' as document:[];
B = foreach A generate document#'b' as b;
C = foreach B generate flatten(b);
dump C;

I expect to see the following output:
([c#11,d#22])
([c#33,d#44])

Instead, I got:
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
cast to org.apache.pig.data.DataBag

Has anyone encountered this problem before? How can I read the data back?

Thanks,

Jerry

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Ruslan:

No worries. It is all good. :) I still have a lot to learn about pig.
The JIRAs you pointed to did clarify my misunderstandings. Thank you for
your help!

Best Regards,

Jerry






On Fri, Apr 19, 2013 at 4:56 PM, Ruslan Al-Fakikh <me...@gmail.com> wrote:

> Hi Jerry,
> Sorry I misled you in my suggestions a bit:)
> As for your last question: it was interesting for me to investigate the
> issue. Here is what I found:
> https://issues.apache.org/jira/browse/PIG-2216
> https://issues.apache.org/jira/browse/PIG-2315
> So here
> B = foreach A generate document#'b' as b:bag{};
> due to the misleading Pig syntax/behaviour you are not casting, just
> renaming:(
>
> Ruslan
>
>
>
> On Fri, Apr 19, 2013 at 2:57 AM, Jerry Lam <ch...@gmail.com> wrote:
>
> > Hi Prashant:
> >
> > Just trying to understand my mistake...
> > I thought "B = foreach A generate document#'b' as b:bag{};" would cast the
> > bytearray to a bag because of b:bag{}. If I understand correctly, that is
> > not what happens. Am I correct?
> >
> > Best Regards,
> >
> > Jerry
> >
> >
> >
> >
> > On Thu, Apr 18, 2013 at 5:41 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
> >
> > > Hi Jerry,
> > >
> > > Like I mentioned in my earlier email, "Map values by default are bytearrays.
> > > If you need them to be any other type, you would need to define it
> > > explicitly."
> > >
> > > The difference between the two statements is that one casts the value to
> > > "bag" and the other leaves it as a bytearray (the default).
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Apr 18, 2013 at 2:14 PM, Jerry Lam <ch...@gmail.com> wrote:
> > >
> > > > Hi Prashant:
> > > >
> > > > IT WORKS! THANKS!
> > > > What is the difference between:
> > > > B = foreach A generate (bag{})document#'b' as b;
> > > > and
> > > > B = foreach A generate document#'b' as b:bag{};
> > > > ?
> > > >
> > > > The latter gives an error: java.lang.ClassCastException:
> > > > org.apache.pig.data.DataByteArray cannot be cast to
> > > > org.apache.pig.data.DataBag
> > > >
> > > > Best Regards,
> > > >
> > > > Jerry
> > > >
> > > >
> > > > On Thu, Apr 18, 2013 at 12:34 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
> > > >
> > > > > Well, let me rephrase: the values all have to be the same type if you
> > > > > choose to read all columns in a similar way. If you know in advance that
> > > > > it's always the value associated with key 'b' that's a bag, why don't you
> > > > > cast that single value?
> > > > >
> > > > > B = foreach A generate (bag{})document#'b' as b;
> > > > >
> > > > >
> > > > > On Thu, Apr 18, 2013 at 7:43 AM, Jerry Lam <ch...@gmail.com> wrote:
> > > > >
> > > > > > Hi Prashant:
> > > > > >
> > > > > > I read about the map data type in the book "Programming Pig", it says:
> > > > > > "... By default there is no requirement that all values in a map must be
> > > > > > of the same type. It is legitimate to have a map with two keys name and
> > > > > > age, where the value for name is a chararray and the value for age is an
> > > > > > int. Beginning in Pig 0.9, a map can declare its values to all be of the
> > > > > > same type... "
> > > > > >
> > > > > > I agree that all values in the map can be of the same type but this is
> > > > > > not required in pig.
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Jerry
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Ruslan:
> > > > > > >
> > > > > > > I used PigStorage to store data that originally used Pig data types.
> > > > > > > It is strange (or a bug in Pig) that I cannot read the data using
> > > > > > > PigStorage when it was stored using PigStorage, isn't it?
> > > > > > >
> > > > > > > Best Regards,
> > > > > > >
> > > > > > > Jerry
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > > >
> > > > > > >> The output:
> > > > > > >> ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > > >> ()
> > > > > > >> looks weird.
> > > > > > >>
> > > > > > >> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
> > > > > > >>
> > > > > > >> A load function that parses a line of input into fields using a
> > > > > > >> character delimiter
> > > > > > >>
> > > > > > >> So I guess this is just for simple csv lines.
> > > > > > >> But you are trying to load a complicated Map structure as it was
> > > > > > >> formatted by the previous store.
> > > > > > >> Probably you'll need to write your own Loader for this. Another hint:
> > > > > > >> using the -schema parameter to PigStorage, but I am not sure it can
> > > > > > >> help:(
> > > > > > >>
> > > > > > >> Ruslan
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Hi Ruslan:
> > > > > > >> >
> > > > > > >> > I did a describe B followed by a dump B, the output is:
> > > > > > >> > B: {b: {()}}
> > > > > > >> >
> > > > > > >> > ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > > >> > ()
> > > > > > >> >
> > > > > > >> > but when I executed
> > > > > > >> >
> > > > > > >> > C = foreach B generate flatten(b);
> > > > > > >> >
> > > > > > >> > dump C;
> > > > > > >> >
> > > > > > >> > I got the exception again...
> > > > > > >> >
> > > > > > >> > 2013-04-17 15:47:39,933 [Thread-26] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > > >> > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > >> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > > >> > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > > >> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > > >> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > >> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > > >> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > > >> > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > > >> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > > >> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > > >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > > >> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > > >> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > > >> > at java.lang.Thread.run(Thread.java:680)
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Best Regards,
> > > > > > >> >
> > > > > > >> > Jerry
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > > >> >
> > > > > > >> > > I think that before doing the FLATTEN, you should be 100% sure that
> > > > > > >> > > your cast worked properly. Can you first DESCRIBE B and then DUMP B
> > > > > > >> > > right away?
> > > > > > >> > > Or probably it just can't be cast in this way. Honestly I don't know
> > > > > > >> > > exactly how it works, but here:
> > > > > > >> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
> > > > > > >> > > I see that casting from a map to a bag should produce an error.
> > > > > > >> > > Hope that helps.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi Ruslan:
> > > > > > >> > > >
> > > > > > >> > > > Thanks for your help. I really appreciate it. It really puzzled me.
> > > > > > >> > > >
> > > > > > >> > > > I did a "describe B", the output is "B: {b: bytearray}".
> > > > > > >> > > >
> > > > > > >> > > > I then tried to cast it as suggested, I got:
> > > > > > >> > > > B = foreach A generate document#'b' as b:{};
> > > > > > >> > > > describe B;
> > > > > > >> > > > B: {b: {()}}
> > > > > > >> > > >
> > > > > > >> > > > Then I proceed with:
> > > > > > >> > > > C = foreach B generate flatten(b);
> > > > > > >> > > >
> > > > > > >> > > > I got:
> > > > > > >> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > > >> > > > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > >> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > > >> > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > > >> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > > >> > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > > >> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > > >> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > > >> > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > > >> > > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > > >> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > > >> > > > at java.lang.Thread.run(Thread.java:680)
> > > > > > >> > > >
> > > > > > >> > > > Best Regards,
> > > > > > >> > > >
> > > > > > >> > > > Jerry
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hey, and as for converting a map of tuples, probably I got you wrong.
> > > > > > >> > > > > If you can get to every value manually within FOREACH then I see no
> > > > > > >> > > > > problem in doing so.
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > I am not sure whether you can convert a map to a tuple.
> > > > > > >> > > > > > But I am curious about one thing:
> > > > > > >> > > > > > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it
> > > > > > >> > > > > > to be a Bag I guess:
> > > > > > >> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > > > > > >> > > > > > But it seems that Pig thinks that b is a byte array:
> > > > > > >> > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray
> > > > > > >> > > > > > cannot be cast to org.apache.pig.data.DataBag
> > > > > > >> > > > > > Can you do this?:
> > > > > > >> > > > > > DESCRIBE B
> > > > > > >> > > > > >
> > > > > > >> > > > > > I suppose it can look like a Bag in the output of DUMP, but I think
> > > > > > >> > > > > > Pig doesn't know it is a Bag, maybe you'll need some kind of
> > > > > > >> > > > > > explicit cast?
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > >> Hi Ruslan,
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> I tried to debug each step already but no luck.
> > > > > > >> > > > > >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> > > > > > >> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
> > > > > > >> > > > > >> but it fails when I did C = foreach B generate flatten(b);
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> I don't have control over the input. It is passed as Map of Maps. I
> > > > > > >> > > > > >> guess it makes lookup easier using a map with keys.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Can I convert map to tuple?
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Best Regards,
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Jerry
> > > > > > >> > > > > >>
> > > > > > >> > > > > >>
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> > Hi Jerry,
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > I would recommend debugging the issue step by step. Start just after
> > > > > > >> > > > > >> > this line:
> > > > > > >> > > > > >> > A = load 'data.txt' as document:[];
> > > > > > >> > > > > >> > and then right after that:
> > > > > > >> > > > > >> > DESCRIBE A;
> > > > > > >> > > > > >> > DUMP A;
> > > > > > >> > > > > >> > and so on...
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > To be honest I haven't used maps that much. Just curious, why did
> > > > > > >> > > > > >> > you choose to use them? You can also use regular tuples for storing
> > > > > > >> > > > > >> > the relations. Also you can store the tuples with a schema file.
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > Ruslan
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > > Hi pig users,
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > I tried to load data using PigStorage that was previously stored
> > > > > > >> > > > > >> > > using PigStorage but it failed.
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Each line looks like this in the data file that is generated by
> > > > > > >> > > > > >> > > PigStorage:
> > > > > > >> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > I did the following:
> > > > > > >> > > > > >> > > A = load 'data.txt' as document:[];
> > > > > > >> > > > > >> > > B = foreach A generate document#'b' as b;
> > > > > > >> > > > > >> > > C = foreach B generate flatten(b);
> > > > > > >> > > > > >> > > dump C;
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > I expect to see the following output:
> > > > > > >> > > > > >> > > ([c#11,d#22])
> > > > > > >> > > > > >> > > ([c#33,d#44])
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Instead, I got:
> > > > > > >> > > > > >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray
> > > > > > >> > > > > >> > > cannot be cast to org.apache.pig.data.DataBag
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Has anyone encountered this problem before? How can I read the data
> > > > > > >> > > > > >> > > back?
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Thanks,
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> > > Jerry
> > > > > > >> > > > > >> > >
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >>
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Hi Jerry,
Sorry I misled you in my suggestions a bit:)
As for your last question: it was interesting for me to investigate the
issue. Here is what I found:
https://issues.apache.org/jira/browse/PIG-2216
https://issues.apache.org/jira/browse/PIG-2315
So here
B = foreach A generate document#'b' as b:bag{};
due to the misleading Pig syntax/behaviour you are not casting, just
renaming:(
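
To make the difference concrete, here is a minimal sketch pieced together from the statements already posted in this thread. The file name data.txt and the aliases A, B and C come from Jerry's script; the comments reflect the behaviour discussed here and in the two JIRAs, and are not verified against any specific Pig release.

```pig
-- Each input line is a single map, for example:
-- [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
A = load 'data.txt' as document:[];

-- AS-only form: the schema of b is declared as a bag, but per this thread
-- the value is not converted, so FLATTEN later fails with a ClassCastException.
-- B = foreach A generate document#'b' as b:bag{};

-- Explicit cast form (the one Jerry reported as working): the bytearray
-- map value is converted to a bag before FLATTEN sees it.
B = foreach A generate (bag{})document#'b' as b;
C = foreach B generate flatten(b);
dump C;
```

Note that DESCRIBE B already reported a bag schema for the AS-only form earlier in this thread, so DESCRIBE alone does not reveal the missing cast.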

Ruslan



On Fri, Apr 19, 2013 at 2:57 AM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Prashant:
>
> Just trying to understand my mistake...
> I thought "B = foreach A generate document#'b' as b:bag{};" would cast the
> bytearray to a bag because of b:bag{}. If I understand correctly, that is
> not what happens. Am I correct?
>
> Best Regards,
>
> Jerry
>
>
>
>
> On Thu, Apr 18, 2013 at 5:41 PM, Prashant Kommireddi <prash1784@gmail.com> wrote:
>
> > Hi Jerry,
> >
> > Like I mentioned in my earlier email, "Map values by default are bytearrays.
> > If you need them to be any other type, you would need to define it
> > explicitly."
> >
> > The difference between the two statements is that one casts the value to
> > "bag" and the other leaves it as a bytearray (the default).
> >
> >
> >
> >
> >
> >
> > On Thu, Apr 18, 2013 at 2:14 PM, Jerry Lam <ch...@gmail.com> wrote:
> >
> > > Hi Prashant:
> > >
> > > IT WORKS! THANKS!
> > > What is the difference between:
> > > B = foreach A generate (bag{})document#'b' as b;
> > > and
> > > B = foreach A generate document#'b' as b:bag{};
> > > ?
> > >
> > > The latter gives an error: java.lang.ClassCastException:
> > > org.apache.pig.data.DataByteArray cannot be cast to
> > > org.apache.pig.data.DataBag
> > >
> > > Best Regards,
> > >
> > > Jerry
> > >
> > >
> > > On Thu, Apr 18, 2013 at 12:34 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
> > >
> > > > Well, let me rephrase: the values all have to be the same type if you
> > > > choose to read all columns in a similar way. If you know in advance that
> > > > it's always the value associated with key 'b' that's a bag, why don't you
> > > > cast that single value?
> > > >
> > > > B = foreach A generate (bag{})document#'b' as b;
> > > >
> > > >
> > > > On Thu, Apr 18, 2013 at 7:43 AM, Jerry Lam <ch...@gmail.com> wrote:
> > > >
> > > > > Hi Prashant:
> > > > >
> > > > > I read about the map data type in the book "Programming Pig", it says:
> > > > > "... By default there is no requirement that all values in a map must be
> > > > > of the same type. It is legitimate to have a map with two keys name and
> > > > > age, where the value for name is a chararray and the value for age is an
> > > > > int. Beginning in Pig 0.9, a map can declare its values to all be of the
> > > > > same type... "
> > > > >
> > > > > I agree that all values in the map can be of the same type but this is
> > > > > not required in pig.
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > Jerry
> > > > >
> > > > >
> > > > > On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <ch...@gmail.com> wrote:
> > > > >
> > > > > > Hi Ruslan:
> > > > > >
> > > > > > I used PigStorage to store data that originally used Pig data types.
> > > > > > It is strange (or a bug in Pig) that I cannot read the data using
> > > > > > PigStorage when it was stored using PigStorage, isn't it?
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Jerry
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > >
> > > > > >> The output:
> > > > > >> ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > >> ()
> > > > > >> looks weird.
> > > > > >>
> > > > > >> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
> > > > > >>
> > > > > >> A load function that parses a line of input into fields using a
> > > > > >> character delimiter
> > > > > >>
> > > > > >> So I guess this is just for simple csv lines.
> > > > > >> But you are trying to load a complicated Map structure as it was
> > > > > >> formatted by the previous store.
> > > > > >> Probably you'll need to write your own Loader for this. Another hint:
> > > > > >> using the -schema parameter to PigStorage, but I am not sure it can
> > > > > >> help:(
> > > > > >>
> > > > > >> Ruslan
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > >>
> > > > > >> > Hi Ruslan:
> > > > > >> >
> > > > > >> > I did a describe B followed by a dump B, the output is:
> > > > > >> > B: {b: {()}}
> > > > > >> >
> > > > > >> > ({ ([c#11,d#22]),([c#33,d#44]) })
> > > > > >> > ()
> > > > > >> >
> > > > > >> > but when I executed
> > > > > >> >
> > > > > >> > C = foreach B generate flatten(b);
> > > > > >> >
> > > > > >> > dump C;
> > > > > >> >
> > > > > >> > I got the exception again...
> > > > > >> >
> > > > > >> > 2013-04-17 15:47:39,933 [Thread-26] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > >> > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > >> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > >> > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > >> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > >> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > >> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > >> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > >> > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > >> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > >> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > >> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > >> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > >> > at java.lang.Thread.run(Thread.java:680)
> > > > > >> >
> > > > > >> >
> > > > > >> > Best Regards,
> > > > > >> >
> > > > > >> > Jerry
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > I think that before doing the FLATTEN, you should be 100% sure that
> > > > > >> > > your cast worked properly. Can you first DESCRIBE B and then DUMP B
> > > > > >> > > right away?
> > > > > >> > > Or probably it just can't be cast in this way. Honestly I don't know
> > > > > >> > > exactly how it works, but here:
> > > > > >> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
> > > > > >> > > I see that casting from a map to a bag should produce an error.
> > > > > >> > > Hope that helps.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > > Hi Ruslan:
> > > > > >> > > >
> > > > > >> > > > Thanks for your help. I really appreciate it. It really puzzled me.
> > > > > >> > > >
> > > > > >> > > > I did a "describe B", the output is "B: {b: bytearray}".
> > > > > >> > > >
> > > > > >> > > > I then tried to cast it as suggested, I got:
> > > > > >> > > > B = foreach A generate document#'b' as b:{};
> > > > > >> > > > describe B;
> > > > > >> > > > B: {b: {()}}
> > > > > >> > > >
> > > > > >> > > > Then I proceed with:
> > > > > >> > > > C = foreach B generate flatten(b);
> > > > > >> > > >
> > > > > >> > > > I got:
> > > > > >> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > > >> > > > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > >> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > > >> > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > > >> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > > >> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > >> > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > > >> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > > >> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > > >> > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > > >> > > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > > >> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > > >> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > > >> > > > at java.lang.Thread.run(Thread.java:680)
> > > > > >> > > >
> > > > > >> > > > Best Regards,
> > > > > >> > > >
> > > > > >> > > > Jerry
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > Hey, and as for converting a map of tuples, probably I got you wrong. If
> > > > > >> > > > > you can get to every value manually within FOREACH then I see no problem
> > > > > >> > > > > in doing so.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > I am not sure whether you can convert a map to a tuple.
> > > > > >> > > > > > But I am curious about one thing:
> > > > > >> > > > > > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it to be
> > > > > >> > > > > > a Bag I guess:
> > > > > >> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > > > > >> > > > > > But it seems that Pig thinks that b is a byte array:
> > > > > >> > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> > > > > >> > > > > > cast to org.apache.pig.data.DataBag
> > > > > >> > > > > > Can you do this?:
> > > > > >> > > > > > DESCRIBE B
> > > > > >> > > > > >
> > > > > >> > > > > > I suppose it can look like a Bag in the output of DUMP, but I think Pig
> > > > > >> > > > > > doesn't know it is a Bag, maybe you'll need some kind of explicit cast?
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > >> Hi Ruslan,
> > > > > >> > > > > >>
> > > > > >> > > > > >> I tried to debug each step already but no luck.
> > > > > >> > > > > >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> > > > > >> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
> > > > > >> > > > > >> but it fails when I did C = foreach B generate flatten(b);
> > > > > >> > > > > >>
> > > > > >> > > > > >> I don't have control over the input. It is passed as a Map of Maps. I guess
> > > > > >> > > > > >> it makes lookup easier using a map with keys.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Can I convert map to tuple?
> > > > > >> > > > > >>
> > > > > >> > > > > >> Best Regards,
> > > > > >> > > > > >>
> > > > > >> > > > > >> Jerry
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > > >> > > > > >>
> > > > > >> > > > > >> > Hi Jerry,
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > I would recommend debugging the issue step by step. Start right after
> > > > > >> > > > > >> > this line:
> > > > > >> > > > > >> > A = load 'data.txt' as document:[];
> > > > > >> > > > > >> > and run:
> > > > > >> > > > > >> > DESCRIBE A;
> > > > > >> > > > > >> > DUMP A;
> > > > > >> > > > > >> > and so on...
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > To be honest I haven't used maps that much. Just curious, why did you
> > > > > >> > > > > >> > choose to use them? You can also use regular tuples for storing the
> > > > > >> > > > > >> > relations. Also you can store the tuples with a schema file.
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > Ruslan
> > > > > >> > > > > >> >
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > > Hi pig users,
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > I tried to load data using PigStorage that was previously stored using
> > > > > >> > > > > >> > > PigStorage but it failed.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Each line looks like this in the data file that is generated by PigStorage:
> > > > > >> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > I did the following:
> > > > > >> > > > > >> > > A = load 'data.txt' as document:[];
> > > > > >> > > > > >> > > B = foreach A generate document#'b' as b;
> > > > > >> > > > > >> > > C = foreach B generate flatten(b);
> > > > > >> > > > > >> > > dump C;
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > I expect to see the following output:
> > > > > >> > > > > >> > > ([c#11,d#22])
> > > > > >> > > > > >> > > ([c#33,d#44])
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Instead, I got:
> > > > > >> > > > > >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> > > > > >> > > > > >> > > cast to org.apache.pig.data.DataBag
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Anyone encounters this problem before? How can I read the data back?
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Thanks,
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Jerry
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> >
> > > > > >> > > > > >>
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Prashant:

Just trying to understand my mistake...
I thought "B = foreach A generate document#'b' as b:bag{};" would cast the
bytearray to a bag because of b:bag{}. If I understand correctly, that is not
what happens. Am I correct?

Best Regards,

Jerry





Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi Jerry,

As I mentioned in my earlier email, "Map values by default are bytearrays.
If you need them to be any other type, you would need to define it
explicitly."

The difference between the two statements is that the first casts the value
to a bag, while the second leaves it as a bytearray (the default).
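
To make the difference concrete, here is a minimal sketch that only reuses the statements already quoted in this thread (the file name 'data.txt' and the key 'b' come from Jerry's original post). The comment on the AS clause reflects the behaviour discussed in PIG-2216 and PIG-2315 as far as I understand it, so treat it as an assumption rather than a statement about every Pig version.

```pig
-- Load each line as one untyped map; its values default to bytearray.
A = load 'data.txt' as document:[];

-- Explicit cast: asks Pig to convert the bytearray stored under key 'b'
-- into a bag before it is handed to the next operator.
B = foreach A generate (bag{})document#'b' as b;

-- FLATTEN now receives a bag and can emit one record per tuple in it.
C = foreach B generate flatten(b);
dump C;

-- For comparison (left commented out): the AS clause below only declares a
-- schema for the alias and does not convert the value, so FLATTEN still sees
-- a bytearray and fails with the ClassCastException reported in this thread.
-- B = foreach A generate document#'b' as b:bag{};
```

Running DESCRIBE B after each variant is a quick way to see what schema Pig has recorded, but keep in mind that the declared schema and the runtime type of the value can disagree, which is exactly what happened here.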






On Thu, Apr 18, 2013 at 2:14 PM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Prashant:
>
> IT WORKS! THANKS!
> What is the difference between :
> "B = foreach A generate (bag{})document#'b' as b;
> and
> B = foreach A generate document#'b' as b:bag{};"
> ?
>
> The latter gives error: java.lang.ClassCastException:
> org.apache.pig.data.DataByteArray cannot be cast to
> org.apache.pig.data.DataBag
>
> Best Regards,
>
> Jerry
>
>
> On Thu, Apr 18, 2013 at 12:34 PM, Prashant Kommireddi
> <pr...@gmail.com>wrote:
>
> > Well, let me rephrase - the values all have to be the same type if you
> > choose to read all columns in a similar way. If you know in advance it's
> > always the value associated with key 'b' that's a bag, why don't you cast
> > that single value?
> >
> > B = foreach A generate (bag{})document#'b' as b;
> >
> >
> > On Thu, Apr 18, 2013 at 7:43 AM, Jerry Lam <ch...@gmail.com> wrote:
> >
> > > Hi Prashant:
> > >
> > > I read about the map data type in the book "Programming Pig", it says:
> > > "... By default there is no requirement that all values in a map must be of
> > > the same type. It is legitimate to have a map with two keys name and age,
> > > where the value for name is a chararray and the value for age is an int.
> > > Beginning in Pig 0.9, a map can declare its values to all be of the same
> > > type... "
> > >
> > > I agree that all values in the map can be of the same type but this is not
> > > required in pig.
> > >
> > > Best Regards,
> > >
> > > Jerry
> > >
> > >
> > > On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <ch...@gmail.com> wrote:
> > >
> > > > Hi Ruslan:
> > > >
> > > > I used PigStorage to store data that originally used Pig data types.
> > > > It is strange (or a bug in Pig) that I cannot use PigStorage to read back
> > > > data that was stored using PigStorage, isn't it?
> > > >
> > > > Best Regards,
> > > >
> > > > Jerry
> > > >
> > > >
> > > >
> > > > On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > >
> > > >> The output:
> > > >> ({ ([c#11,d#22]),([c#33,d#44]) })
> > > >> ()
> > > >> looks weird.
> > > >>
> > > > >> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
> > > > >>
> > > > >> A load function that parses a line of input into fields using a character
> > > > >> delimiter
> > > > >>
> > > > >> So I guess this is just for simple csv lines.
> > > > >> But you are trying to load a complicated Map structure as it was formatted
> > > > >> by previous storing.
> > > > >> Probably you'll need to write your own Loader for this. Another hint: try
> > > > >> the -schema parameter to PigStorage, but I am not sure it can help:(
> > > >>
> > > >> Ruslan
> > > >>
> > > >>
> > > >> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <ch...@gmail.com>
> > > wrote:
> > > >>
> > > >> > Hi Rusian:
> > > >> >
> > > >> > I did a describe B followed by a dump B, the output is:
> > > >> > B: {b: {()}}
> > > >> >
> > > >> > ({ ([c#11,d#22]),([c#33,d#44]) })
> > > >> > ()
> > > >> >
> > > >> > but when I executed
> > > >> >
> > > >> > C = foreach B generate flatten(b);
> > > >> >
> > > >> > dump C;
> > > >> >
> > > >> > I got the exception again...
> > > >> >
> > > >> > 2013-04-17 15:47:39,933 [Thread-26] WARN
> > > >> >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > >> > java.lang.Exception: java.lang.ClassCastException:
> > > >> > org.apache.pig.data.DataByteArray cannot be cast to
> > > >> > org.apache.pig.data.DataBag
> > > >> > at
> > > >>
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > >> > Caused by: java.lang.ClassCastException:
> > > >> org.apache.pig.data.DataByteArray
> > > >> > cannot be cast to org.apache.pig.data.DataBag
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > >> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > >> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > >> > at
> > > >>
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > >> > at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > >> > at
> > > >> >
> > > >> >
> > > >>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > >> > at java.lang.Thread.run(Thread.java:680)
> > > >> >
> > > >> >
> > > >> > Best Regards,
> > > >> >
> > > >> > Jerry
> > > >> >
> > > >> >
> > > >> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <
> > > metaruslan@gmail.com
> > > >> > >wrote:
> > > >> >
> > > >> > > I think that before doing the FLATTEN, you should be 100% sure
> > that
> > > >> your
> > > >> > > cast worked properly. Can you first DESCRIBE B and then DUMP B
> > right
> > > >> > away?
> > > >> > > Or probably it just can't be cast in this way. Honestly I don't
> > know
> > > >> > > exactly how it works, but here:
> > > >> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
> > > >> > > I see that casting from a map to a bag should produce an error.
> > > >> > > Hope that helps.
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <
> chilinglam@gmail.com>
> > > >> wrote:
> > > >> > >
> > > >> > > > Hi Rusian:
> > > >> > > >
> > > >> > > > Thanks for your help. I really appreciate it. It really
> puzzled
> > > me.
> > > >> > > >
> > > >> > > > I did a "describe B", the output is "B: {b: bytearray}".
> > > >> > > >
> > > >> > > > I then tried to cast it as suggested, I got:
> > > >> > > > B = foreach A generate document#'b' as b:{};
> > > >> > > > describe B;
> > > >> > > > B: {b: {()}}
> > > >> > > >
> > > >> > > > Then I proceed with:
> > > >> > > > C = foreach B generate flatten(b);
> > > >> > > >
> > > >> > > > I got:
> > > >> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN
> > > >> > > >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > >> > > > java.lang.Exception: java.lang.ClassCastException:
> > > >> > > > org.apache.pig.data.DataByteArray cannot be cast to
> > > >> > > > org.apache.pig.data.DataBag
> > > >> > > > at
> > > >> > >
> > > >>
> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > >> > > > Caused by: java.lang.ClassCastException:
> > > >> > > org.apache.pig.data.DataByteArray
> > > >> > > > cannot be cast to org.apache.pig.data.DataBag
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > >> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >> > > > at
> > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > >> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > >> > > > at
> > > >> > >
> > > >>
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > >> > > > at
> > > >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > >> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > >> > > > at
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > >> > > > at java.lang.Thread.run(Thread.java:680)
> > > >> > > >
> > > >> > > > Best Regards,
> > > >> > > >
> > > >> > > > Jerry
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <
> > > >> > metaruslan@gmail.com
> > > >> > > > >wrote:
> > > >> > > >
> > > >> > > > > Hey, and as for converting a map of tuples, probably i got
> you
> > > >> wrong.
> > > >> > > If
> > > >> > > > > you can get to every value manually withing FOREACH then I
> see
> > > no
> > > >> > > problem
> > > >> > > > > in doing so.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <
> > > >> > > metaruslan@gmail.com
> > > >> > > > > >wrote:
> > > >> > > > >
> > > >> > > > > > I am not sure whether you can convert a map to a tuple.
> > > >> > > > > > But I am curious about one thing:
> > > >> > > > > > your are trying to use 'b' as a Bag, right? Because
> FLATTEN
> > > >> needs
> > > >> > it
> > > >> > > to
> > > >> > > > > be
> > > >> > > > > > a Bag I guess:
> > > >> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > > >> > > > > > But it seems that Pig thinks that b is a byte array:
> > > >> > > > > > java.lang.ClassCastException:
> > > org.apache.pig.data.DataByteArray
> > > >> > > cannot
> > > >> > > > be
> > > >> > > > > > cast to org.apache.pig.data.DataBag
> > > >> > > > > > Can you do this?:
> > > >> > > > > > DESCRIBE B
> > > >> > > > > >
> > > >> > > > > > I suppose it can look like a Bag in the output of DUMP,
> but
> > I
> > > >> think
> > > >> > > Pig
> > > >> > > > > > doesn't know it is a Bag, maybe you'll need some kind of
> > > >> explicit
> > > >> > > cast?
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <
> > > >> chilinglam@gmail.com>
> > > >> > > > wrote:
> > > >> > > > > >
> > > >> > > > > >> Hi Rusian,
> > > >> > > > > >>
> > > >> > > > > >> I tried to debug each step already but no luck.
> > > >> > > > > >> I did a dump (dump B;) after B = foreach A generate
> > > >> document#'b'
> > > >> > as
> > > >> > > b;
> > > >> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
> > > >> > > > > >> but it fails when I did C = foreach B generate
> flatten(b);
> > > >> > > > > >>
> > > >> > > > > >> I don't have controls over the input. It is passed as Map
> > of
> > > >> > Maps. I
> > > >> > > > > guess
> > > >> > > > > >> it makes lookup easier using a map with keys.
> > > >> > > > > >>
> > > >> > > > > >> Can I convert map to tuple?
> > > >> > > > > >>
> > > >> > > > > >> Best Regards,
> > > >> > > > > >>
> > > >> > > > > >> Jerry
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <
> > > >> > > > > metaruslan@gmail.com
> > > >> > > > > >> >wrote:
> > > >> > > > > >>
> > > >> > > > > >> > Hi Jerry,
> > > >> > > > > >> >
> > > >> > > > > >> > I would recommend to debug the issue step by step. Just
> > > after
> > > >> > this
> > > >> > > > > line:
> > > >> > > > > >> > A = load 'data.txt' as document:[];
> > > >> > > > > >> > and then right after that:
> > > >> > > > > >> > DESCRIBE A;
> > > >> > > > > >> > DUMP A;
> > > >> > > > > >> > and so on...
> > > >> > > > > >> >
> > > >> > > > > >> > To be honest I haven't used maps that much. Just
> curious,
> > > why
> > > >> > did
> > > >> > > > you
> > > >> > > > > >> > choose to use them? You can also use regular tuples for
> > > >> storing
> > > >> > > the
> > > >> > > > > >> > relations. Also you can store the tuples with a schema
> > > file.
> > > >> > > > > >> >
> > > >> > > > > >> > Ruslan
> > > >> > > > > >> >
> > > >> > > > > >> >
> > > >> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <
> > > >> > chilinglam@gmail.com>
> > > >> > > > > >> wrote:
> > > >> > > > > >> >
> > > >> > > > > >> > > Hi pig users,
> > > >> > > > > >> > >
> > > >> > > > > >> > > I tried to load data using PigStorage that was
> > previously
> > > >> > stored
> > > >> > > > > using
> > > >> > > > > >> > > PigStorage but it failed.
> > > >> > > > > >> > >
> > > >> > > > > >> > > Each line looks like this in the data file that is
> > > >> generated
> > > >> > by
> > > >> > > > > >> > PigStorage:
> > > >> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> > > >> > > > > >> > >
> > > >> > > > > >> > > I did the following:
> > > >> > > > > >> > > A = load 'data.txt' as document:[];
> > > >> > > > > >> > > B = foreach A generate document#'b' as b;
> > > >> > > > > >> > > C = foreach B generate flatten(b);
> > > >> > > > > >> > > dump C;
> > > >> > > > > >> > >
> > > >> > > > > >> > > I expect to see the following output:
> > > >> > > > > >> > > ([c#11,d#22])
> > > >> > > > > >> > > ([c#33,d#44])
> > > >> > > > > >> > >
> > > >> > > > > >> > > Instead, I got:
> > > >> > > > > >> > > java.lang.ClassCastException:
> > > >> > org.apache.pig.data.DataByteArray
> > > >> > > > > >> cannot be
> > > >> > > > > >> > > cast to org.apache.pig.data.DataBag
> > > >> > > > > >> > >
> > > >> > > > > >> > > Anyone encounters this problem before? How can I read
> > the
> > > >> data
> > > >> > > > back?
> > > >> > > > > >> > >
> > > >> > > > > >> > > Thanks,
> > > >> > > > > >> > >
> > > >> > > > > >> > > Jerry
> > > >> > > > > >> > >
> > > >> > > > > >> >
> > > >> > > > > >>
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Prashant:

IT WORKS! THANKS!
What is the difference between
B = foreach A generate (bag{})document#'b' as b;
and
B = foreach A generate document#'b' as b:bag{};
?

The latter gives this error: java.lang.ClassCastException:
org.apache.pig.data.DataByteArray cannot be cast to
org.apache.pig.data.DataBag

Best Regards,

Jerry
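
For reference, the two forms can be compared side by side. This is only a sketch assembled from statements already posted in this thread; the relation names B1 and B2 are made up for illustration. As a later reply in this thread points out (citing PIG-2216 and PIG-2315), on the Pig version used here the AS clause only relabels the field in the schema and does not convert the value, so the second form still hands a bytearray to FLATTEN. Other Pig versions may behave differently.

```pig
-- Load each line as one untyped map; its values default to bytearray.
A = load 'data.txt' as document:[];

-- Form 1: explicit cast operator in front of the expression.
-- The bytearray stored under key 'b' is converted to a bag.
B1 = foreach A generate (bag{})document#'b' as b;
describe B1;

-- Form 2: type given only in the AS clause.
-- Per this thread, the schema reports a bag but the value is not converted,
-- so a later flatten(b) fails with the ClassCastException quoted above.
B2 = foreach A generate document#'b' as b:bag{};
describe B2;
```

Running DESCRIBE on both relations is a quick way to see that the declared schema alone does not reveal whether the underlying value was actually converted.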


On Thu, Apr 18, 2013 at 12:34 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Well, let me rephrase - the values all have to be the same type if you
> choose to read all columns in a similar way. If you know in advance it's
> always the value associated with key 'b' that's a bag, why don't you cast
> that single value?
>
> B = foreach A generate (bag{})document#'b' as b;

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Prashant Kommireddi <pr...@gmail.com>.
Well, let me rephrase - the values all have to be the same type if you
choose to read all columns in a similar way. If you know in advance it's
always the value associated with key 'b' that's a bag, why don't you cast
that single value?

B = foreach A generate (bag{})document#'b' as b;
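
Put together with the statements from the original question, the suggestion above might look like the following. This is a minimal sketch, assuming the same data.txt and the same one-map-per-line layout described in the first message; nothing here is new apart from the comments.

```pig
-- Load each line as a single untyped map (values default to bytearray).
A = load 'data.txt' as document:[];

-- Explicitly cast only the value stored under key 'b' to a bag.
B = foreach A generate (bag{})document#'b' as b;

-- With a real bag in hand, FLATTEN can emit one record per inner tuple.
C = foreach B generate flatten(b);
dump C;
```

Casting the single value keeps the rest of the map untyped, which fits data where different keys hold different value types.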


On Thu, Apr 18, 2013 at 7:43 AM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Prashant:
>
> I read about the map data type in the book "Programming Pig", it says:
> "... By default there is no requirement that all values in a map must be of
> the same type. It is legitimate to have a map with two keys name and age,
> where the value for name is a chararray and the value for age is an int.
> Beginning in Pig 0.9, a map can declare its values to all be of the same
> type... "
>
> I agree that all values in a map can be of the same type, but this is not
> required in Pig.
>
> Best Regards,
>
> Jerry
>
>
> On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <ch...@gmail.com> wrote:
>
> > Hi Ruslan:
> >
> > I used PigStorage to store data that originally used Pig data types.
> > It is strange (or a bug in Pig) that I cannot use PigStorage to read
> > data that was stored using PigStorage, isn't it?
> >
> > Best Regards,
> >
> > Jerry
> >
> >
> >
> > On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> >
> >> The output:
> >> ({ ([c#11,d#22]),([c#33,d#44]) })
> >> ()
> >> looks weird.
> >>
> >> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
> >>
> >> A load function that parses a line of input into fields using a character
> >> delimiter
> >>
> >> So I guess this is just for simple CSV lines.
> >> But you are trying to load a complicated Map structure as it was formatted
> >> by the previous store.
> >> You will probably need to write your own Loader for this. Another hint:
> >> try the -schema parameter to PigStorage, but I am not sure it will help :(
> >>
> >> Ruslan
> >>
> >>
> >> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <ch...@gmail.com> wrote:
> >>
> >> > Hi Ruslan:
> >> >
> >> > I ran describe B followed by dump B; the output is:
> >> > B: {b: {()}}
> >> >
> >> > ({ ([c#11,d#22]),([c#33,d#44]) })
> >> > ()
> >> >
> >> > but when I executed
> >> >
> >> > C = foreach B generate flatten(b);
> >> >
> >> > dump C;
> >> >
> >> > I got the exception again...
> >> >
> >> > 2013-04-17 15:47:39,933 [Thread-26] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> >> > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> >> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> >> > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> >> >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> >> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> >> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> >> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> >> >     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> >> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> >> >     at java.lang.Thread.run(Thread.java:680)
> >> >
> >> >
> >> > Best Regards,
> >> >
> >> > Jerry
> >> >
> >> >
> >> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <
> metaruslan@gmail.com
> >> > >wrote:
> >> >
> >> > > I think that before doing the FLATTEN, you should be 100% sure that
> >> your
> >> > > cast worked properly. Can you first DESCRIBE B and then DUMP B right
> >> > away?
> >> > > Or probably it just can't be cast in this way. Honestly I don't know
> >> > > exactly how it works, but here:
> >> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
> >> > > I see that casting from a map to a bag should produce an error.
> >> > > Hope that helps.
> >> > >
> >> > >
> >> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <ch...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Hi Rusian:
> >> > > >
> >> > > > Thanks for your help. I really appreciate it. It really puzzled
> me.
> >> > > >
> >> > > > I did a "describe B", the output is "B: {b: bytearray}".
> >> > > >
> >> > > > I then tried to cast it as suggested, I got:
> >> > > > B = foreach A generate document#'b' as b:{};
> >> > > > describe B;
> >> > > > B: {b: {()}}
> >> > > >
> >> > > > Then I proceed with:
> >> > > > C = foreach B generate flatten(b);
> >> > > >
> >> > > > I got:
> >> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN
> >> > > >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> >> > > > java.lang.Exception: java.lang.ClassCastException:
> >> > > > org.apache.pig.data.DataByteArray cannot be cast to
> >> > > > org.apache.pig.data.DataBag
> >> > > > at
> >> > >
> >> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> >> > > > Caused by: java.lang.ClassCastException:
> >> > > org.apache.pig.data.DataByteArray
> >> > > > cannot be cast to org.apache.pig.data.DataBag
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> >> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >> > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> >> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> >> > > > at
> >> > >
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >> > > > at
> >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> >> > > > at java.lang.Thread.run(Thread.java:680)
> >> > > >
> >> > > > Best Regards,
> >> > > >
> >> > > > Jerry
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <
> >> > metaruslan@gmail.com
> >> > > > >wrote:
> >> > > >
> >> > > > > Hey, and as for converting a map of tuples, probably i got you
> >> wrong.
> >> > > If
> >> > > > > you can get to every value manually withing FOREACH then I see
> no
> >> > > problem
> >> > > > > in doing so.
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <
> >> > > metaruslan@gmail.com
> >> > > > > >wrote:
> >> > > > >
> >> > > > > > I am not sure whether you can convert a map to a tuple.
> >> > > > > > But I am curious about one thing:
> >> > > > > > your are trying to use 'b' as a Bag, right? Because FLATTEN
> >> needs
> >> > it
> >> > > to
> >> > > > > be
> >> > > > > > a Bag I guess:
> >> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> >> > > > > > But it seems that Pig thinks that b is a byte array:
> >> > > > > > java.lang.ClassCastException:
> org.apache.pig.data.DataByteArray
> >> > > cannot
> >> > > > be
> >> > > > > > cast to org.apache.pig.data.DataBag
> >> > > > > > Can you do this?:
> >> > > > > > DESCRIBE B
> >> > > > > >
> >> > > > > > I suppose it can look like a Bag in the output of DUMP, but I
> >> think
> >> > > Pig
> >> > > > > > doesn't know it is a Bag, maybe you'll need some kind of
> >> explicit
> >> > > cast?
> >> > > > > >
> >> > > > > >
> >> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <
> >> chilinglam@gmail.com>
> >> > > > wrote:
> >> > > > > >
> >> > > > > >> Hi Rusian,
> >> > > > > >>
> >> > > > > >> I tried to debug each step already but no luck.
> >> > > > > >> I did a dump (dump B;) after B = foreach A generate
> >> document#'b'
> >> > as
> >> > > b;
> >> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
> >> > > > > >> but it fails when I did C = foreach B generate flatten(b);
> >> > > > > >>
> >> > > > > >> I don't have controls over the input. It is passed as Map of
> >> > Maps. I
> >> > > > > guess
> >> > > > > >> it makes lookup easier using a map with keys.
> >> > > > > >>
> >> > > > > >> Can I convert map to tuple?
> >> > > > > >>
> >> > > > > >> Best Regards,
> >> > > > > >>
> >> > > > > >> Jerry
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <
> >> > > > > metaruslan@gmail.com
> >> > > > > >> >wrote:
> >> > > > > >>
> >> > > > > >> > Hi Jerry,
> >> > > > > >> >
> >> > > > > >> > I would recommend to debug the issue step by step. Just
> after
> >> > this
> >> > > > > line:
> >> > > > > >> > A = load 'data.txt' as document:[];
> >> > > > > >> > and then right after that:
> >> > > > > >> > DESCRIBE A;
> >> > > > > >> > DUMP A;
> >> > > > > >> > and so on...
> >> > > > > >> >
> >> > > > > >> > To be honest I haven't used maps that much. Just curious,
> why
> >> > did
> >> > > > you
> >> > > > > >> > choose to use them? You can also use regular tuples for
> >> storing
> >> > > the
> >> > > > > >> > relations. Also you can store the tuples with a schema
> file.
> >> > > > > >> >
> >> > > > > >> > Ruslan
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <
> >> > chilinglam@gmail.com>
> >> > > > > >> wrote:
> >> > > > > >> >
> >> > > > > >> > > Hi pig users,
> >> > > > > >> > >
> >> > > > > >> > > I tried to load data using PigStorage that was previously
> >> > stored
> >> > > > > using
> >> > > > > >> > > PigStorage but it failed.
> >> > > > > >> > >
> >> > > > > >> > > Each line looks like this in the data file that is
> >> generated
> >> > by
> >> > > > > >> > PigStorage:
> >> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> >> > > > > >> > >
> >> > > > > >> > > I did the following:
> >> > > > > >> > > A = load 'data.txt' as document:[];
> >> > > > > >> > > B = foreach A generate document#'b' as b;
> >> > > > > >> > > C = foreach B generate flatten(b);
> >> > > > > >> > > dump C;
> >> > > > > >> > >
> >> > > > > >> > > I expect to see the following output:
> >> > > > > >> > > ([c#11,d#22])
> >> > > > > >> > > ([c#33,d#44])
> >> > > > > >> > >
> >> > > > > >> > > Instead, I got:
> >> > > > > >> > > java.lang.ClassCastException:
> >> > org.apache.pig.data.DataByteArray
> >> > > > > >> cannot be
> >> > > > > >> > > cast to org.apache.pig.data.DataBag
> >> > > > > >> > >
> >> > > > > >> > > Anyone encounters this problem before? How can I read the
> >> data
> >> > > > back?
> >> > > > > >> > >
> >> > > > > >> > > Thanks,
> >> > > > > >> > >
> >> > > > > >> > > Jerry
> >> > > > > >> > >
> >> > > > > >> >
> >> > > > > >>
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Prashant:

I read about the map data type in the book "Programming Pig". It says:
"... By default there is no requirement that all values in a map must be of
the same type. It is legitimate to have a map with two keys name and age,
where the value for name is a chararray and the value for age is an int.
Beginning in Pig 0.9, a map can declare its values to all be of the same
type... "

I agree that all values in a map can be of the same type, but Pig does not
require this.
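
As a rough illustration of that difference, the two kinds of declaration could
look like the sketch below. It is not taken from the book: 'typed.txt' and the
relation names A and T are made up for this example, and I have not run it.

```pig
-- Untyped map: values may be of different types and stay bytearray until cast.
A = LOAD 'data.txt' AS (document:map[]);
DESCRIBE A;

-- Typed map (Pig 0.9 and later): every value is declared to be a chararray.
-- 'typed.txt' is a hypothetical input file used only for illustration.
T = LOAD 'typed.txt' AS (m:map[chararray]);
DESCRIBE T;
```

With the untyped form, a lookup such as document#'b' is a bytearray as far as
the schema is concerned, which matches the "B: {b: bytearray}" I saw earlier.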

Best Regards,

Jerry



Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Ruslan:

I used PigStorage to store data that was originally in Pig data types.
Isn't it strange (or a bug in Pig) that I cannot read back with PigStorage
data that was stored with PigStorage?
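
For reference, here is a minimal sketch of what I am trying to do, written with
an explicit cast placed in front of the map lookup. The cast syntax is the one
from the cast section of the Pig Latin basics page; whether the stored text
actually parses back into a bag of maps this way is an assumption on my part,
and it may depend on the Pig version.

```pig
-- Load each line as a single untyped map, as in my original script.
A = LOAD 'data.txt' AS (document:map[]);

-- Cast the looked-up value explicitly; "AS b" here only names the field.
B = FOREACH A GENERATE (bag{tuple(map[])})document#'b' AS b;
DESCRIBE B;

-- FLATTEN needs a real bag, so it should only succeed if the cast above did.
C = FOREACH B GENERATE FLATTEN(b);
DUMP C;
```

If the cast works, DUMP C should give one tuple per inner map, which is the
output I expected in my first mail.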

Best Regards,

Jerry



On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <me...@gmail.com>wrote:

> The output:
> ({ ([c#11,d#22]),([c#33,d#44]) })
> ()
> looks weird.
>
> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
>
> A load function that parses a line of input into fields using a character
> delimiter
>
> So I guess this is just for simple csv lines.
> But you are trying to load a complicated Map structure as it was formatted
> by previous storing.
> Probably you'll need to write your own Loader for this. Another hint: using
> the -schema parameter to PigStorage, but I am not sure it can help:(
>
> Ruslan
>
>
> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <ch...@gmail.com> wrote:
>
> > Hi Rusian:
> >
> > I did a describe B followed by a dump B, the output is:
> > B: {b: {()}}
> >
> > ({ ([c#11,d#22]),([c#33,d#44]) })
> > ()
> >
> > but when I executed
> >
> > C = foreach B generate flatten(b);
> >
> > dump C;
> >
> > I got the exception again...
> >
> > 2013-04-17 15:47:39,933 [Thread-26] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > at java.lang.Thread.run(Thread.java:680)
> >
> >
> > Best Regards,
> >
> > Jerry
> >
> >
> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> >
> > > I think that before doing the FLATTEN, you should be 100% sure that your
> > > cast worked properly. Can you first DESCRIBE B and then DUMP B right away?
> > > Or probably it just can't be cast in this way. Honestly I don't know
> > > exactly how it works, but here:
> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
> > > I see that casting from a map to a bag should produce an error.
> > > Hope that helps.
> > >
> > >
> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <ch...@gmail.com> wrote:
> > >
> > > > Hi Rusian:
> > > >
> > > > Thanks for your help. I really appreciate it. It really puzzled me.
> > > >
> > > > I did a "describe B", the output is "B: {b: bytearray}".
> > > >
> > > > I then tried to cast it as suggested, I got:
> > > > B = foreach A generate document#'b' as b:{};
> > > > describe B;
> > > > B: {b: {()}}
> > > >
> > > > Then I proceed with:
> > > > C = foreach B generate flatten(b);
> > > >
> > > > I got:
> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> > > > java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
> > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > > > at java.lang.Thread.run(Thread.java:680)
> > > >
> > > > Best Regards,
> > > >
> > > > Jerry
> > > >
> > > >
> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > >
> > > > > Hey, and as for converting a map of tuples, probably I got you wrong. If
> > > > > you can get to every value manually within FOREACH then I see no problem
> > > > > in doing so.
> > > > >
> > > > >
> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:
> > > > >
> > > > > > I am not sure whether you can convert a map to a tuple.
> > > > > > But I am curious about one thing:
> > > > > > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it to be
> > > > > > a Bag I guess:
> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > > > > > But it seems that Pig thinks that b is a byte array:
> > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> > > > > > cast to org.apache.pig.data.DataBag
> > > > > > Can you do this?:
> > > > > > DESCRIBE B
> > > > > >
> > > > > > I suppose it can look like a Bag in the output of DUMP, but I think Pig
> > > > > > doesn't know it is a Bag, maybe you'll need some kind of explicit cast?
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <chilinglam@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Rusian,
> > > > > >>
> > > > > >> I tried to debug each step already but no luck.
> > > > > >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
> > > > > >> but it fails when I did C = foreach B generate flatten(b);
> > > > > >>
> > > > > >> I don't have controls over the input. It is passed as Map of Maps. I guess
> > > > > >> it makes lookup easier using a map with keys.
> > > > > >>
> > > > > >> Can I convert map to tuple?
> > > > > >>
> > > > > >> Best Regards,
> > > > > >>
> > > > > >> Jerry
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <
> > > > > metaruslan@gmail.com
> > > > > >> >wrote:
> > > > > >>
> > > > > >> > Hi Jerry,
> > > > > >> >
> > > > > >> > I would recommend to debug the issue step by step. Just after
> > this
> > > > > line:
> > > > > >> > A = load 'data.txt' as document:[];
> > > > > >> > and then right after that:
> > > > > >> > DESCRIBE A;
> > > > > >> > DUMP A;
> > > > > >> > and so on...
> > > > > >> >
> > > > > >> > To be honest I haven't used maps that much. Just curious, why
> > did
> > > > you
> > > > > >> > choose to use them? You can also use regular tuples for
> storing
> > > the
> > > > > >> > relations. Also you can store the tuples with a schema file.
> > > > > >> >
> > > > > >> > Ruslan
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <
> > chilinglam@gmail.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hi pig users,
> > > > > >> > >
> > > > > >> > > I tried to load data using PigStorage that was previously
> > stored
> > > > > using
> > > > > >> > > PigStorage but it failed.
> > > > > >> > >
> > > > > >> > > Each line looks like this in the data file that is generated
> > by
> > > > > >> > PigStorage:
> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> > > > > >> > >
> > > > > >> > > I did the following:
> > > > > >> > > A = load 'data.txt' as document:[];
> > > > > >> > > B = foreach A generate document#'b' as b;
> > > > > >> > > C = foreach B generate flatten(b);
> > > > > >> > > dump C;
> > > > > >> > >
> > > > > >> > > I expect to see the following output:
> > > > > >> > > ([c#11,d#22])
> > > > > >> > > ([c#33,d#44])
> > > > > >> > >
> > > > > >> > > Instead, I got:
> > > > > >> > > java.lang.ClassCastException:
> > org.apache.pig.data.DataByteArray
> > > > > >> cannot be
> > > > > >> > > cast to org.apache.pig.data.DataBag
> > > > > >> > >
> > > > > >> > > Anyone encounters this problem before? How can I read the
> data
> > > > back?
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jerry
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
The output:
({ ([c#11,d#22]),([c#33,d#44]) })
()
looks weird.

Jerry, maybe the problem is in using PigStorage. As its javadoc says:

A load function that parses a line of input into fields using a character
delimiter

So I guess it is meant for simple delimited (CSV-style) lines.
But you are trying to load a complex map structure in the format that a
previous STORE wrote out.
You will probably need to write your own loader for this. Another hint: try
the -schema parameter of PigStorage, but I am not sure it will help :(

Ruslan
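
For reference, the -schema hint above could be tried roughly as follows. This is a minimal sketch, assuming Pig 0.10 or later; the alias names (docs, reloaded) and the output path 'data_with_schema' are made up for illustration. Storing with '-schema' asks PigStorage to write a hidden .pig_schema file next to the data, which a later load can use instead of an AS clause. The stored schema only records declared types, so values inside an untyped map would most likely still come back as bytearray.

```pig
-- Hypothetical aliases and paths, for illustration only.
docs = LOAD 'data.txt' AS (document:map[]);

-- '-schema' makes PigStorage write a hidden .pig_schema file with the data.
STORE docs INTO 'data_with_schema' USING PigStorage('\t', '-schema');

-- On reload, PigStorage can take field names and types from that file.
reloaded = LOAD 'data_with_schema' USING PigStorage('\t', '-schema');
DESCRIBE reloaded;
```

DESCRIBE is the quickest way to check what the reload actually recovered before going any further.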


On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Ruslan:
>
> I did a describe B followed by a dump B, the output is:
> B: {b: {()}}
>
> ({ ([c#11,d#22]),([c#33,d#44]) })
> ()
>
> but when I executed
>
> C = foreach B generate flatten(b);
>
> dump C;
>
> I got the exception again...
>
> 2013-04-17 15:47:39,933 [Thread-26] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:680)
>
>
> Best Regards,
>
> Jerry

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Ruslan:

I did a describe B followed by a dump B, the output is:
B: {b: {()}}

({ ([c#11,d#22]),([c#33,d#44]) })
()

but when I executed

C = foreach B generate flatten(b);

dump C;

I got the exception again...

2013-04-17 15:47:39,933 [Thread-26] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)


Best Regards,

Jerry


On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <me...@gmail.com> wrote:

> I think that before doing the FLATTEN, you should be 100% sure that your
> cast worked properly. Can you first DESCRIBE B and then DUMP B right away?
> Or probably it just can't be cast in this way. Honestly I don't know
> exactly how it works, but here:
> http://pig.apache.org/docs/r0.10.0/basic.html#cast
> I see that casting from a map to a bag should produce an error.
> Hope that helps.

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
I think that before doing the FLATTEN, you should be 100% sure that your
cast worked properly. Can you first DESCRIBE B and then DUMP B right away?
Or perhaps it simply can't be cast this way. Honestly, I don't know
exactly how it works, but here:
http://pig.apache.org/docs/r0.10.0/basic.html#cast
I see that casting from a map to a bag should produce an error.
Hope that helps.
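
For comparison, an explicit cast in Pig Latin is written in front of the expression, while the AS clause only supplies a name and a declared type. The sketch below shows that form for the data in this thread. It is an untested sketch, assuming Pig 0.10 cast syntax and that the stored text for key 'b' parses as a bag of maps; the inner schema tuple(map[]) is a guess based on the sample line, not something confirmed against the real input.

```pig
-- Load each line as a single untyped map, as in the original script.
A = LOAD 'data.txt' AS (document:map[]);

-- Cast operator before the expression; AS here only names the field.
-- The bag{tuple(map[])} schema is an assumption from the sample data.
B = FOREACH A GENERATE (bag{tuple(map[])})document#'b' AS b;
DESCRIBE B;

-- FLATTEN needs b to be a real bag at runtime, not a bytearray.
C = FOREACH B GENERATE FLATTEN(b);
DUMP C;
```

If DESCRIBE B reports a bag but the job still fails with the same ClassCastException, the cast was probably not applied at runtime and the declared schema is only a label.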


On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Ruslan:
>
> Thanks for your help. I really appreciate it. It really puzzled me.
>
> I did a "describe B", the output is "B: {b: bytearray}".
>
> I then tried to cast it as suggested, I got:
> B = foreach A generate document#'b' as b:{};
> describe B;
> B: {b: {()}}
>
> Then I proceed with:
> C = foreach B generate flatten(b);
>
> I got:
> 2013-04-17 13:38:04,601 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
> java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:680)
>
> Best Regards,
>
> Jerry

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Ruslan:

Thanks for your help. I really appreciate it. This really puzzled me.

I did a "describe B", and the output is "B: {b: bytearray}".

I then tried to cast it as suggested and got:
B = foreach A generate document#'b' as b:{};
describe B;
B: {b: {()}}

Then I proceeded with:
C = foreach B generate flatten(b);

I got:
2013-04-17 13:38:04,601 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.DataBag
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)

Best Regards,

Jerry


On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <me...@gmail.com> wrote:

> Hey, and as for converting a map of tuples, I probably got you wrong. If
> you can get to every value manually within FOREACH then I see no problem
> in doing so.
>
>
> On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <metaruslan@gmail.com
> >wrote:
>
> > I am not sure whether you can convert a map to a tuple.
> > But I am curious about one thing:
> > your are trying to use 'b' as a Bag, right? Because FLATTEN needs it to
> be
> > a Bag I guess:
> > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
> > But it seems that Pig thinks that b is a byte array:
> > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> > cast to org.apache.pig.data.DataBag
> > Can you do this?:
> > DESCRIBE B
> >
> > I suppose it can look like a Bag in the output of DUMP, but I think Pig
> > doesn't know it is a Bag, maybe you'll need some kind of explicit cast?
> >
> >
> > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <ch...@gmail.com> wrote:
> >
> >> Hi Rusian,
> >>
> >> I tried to debug each step already but no luck.
> >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
> >> I got {([c#11,d#22]),([c#33,d#44])}
> >> but it fails when I did C = foreach B generate flatten(b);
> >>
> >> I don't have controls over the input. It is passed as Map of Maps. I
> guess
> >> it makes lookup easier using a map with keys.
> >>
> >> Can I convert map to tuple?
> >>
> >> Best Regards,
> >>
> >> Jerry
> >>
> >>
> >>
> >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <
> metaruslan@gmail.com
> >> >wrote:
> >>
> >> > Hi Jerry,
> >> >
> >> > I would recommend to debug the issue step by step. Just after this
> line:
> >> > A = load 'data.txt' as document:[];
> >> > and then right after that:
> >> > DESCRIBE A;
> >> > DUMP A;
> >> > and so on...
> >> >
> >> > To be honest I haven't used maps that much. Just curious, why did you
> >> > choose to use them? You can also use regular tuples for storing the
> >> > relations. Also you can store the tuples with a schema file.
> >> >
> >> > Ruslan
> >> >
> >> >
> >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <ch...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi pig users,
> >> > >
> >> > > I tried to load data using PigStorage that was previously stored
> using
> >> > > PigStorage but it failed.
> >> > >
> >> > > Each line looks like this in the data file that is generated by
> >> > PigStorage:
> >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
> >> > >
> >> > > I did the following:
> >> > > A = load 'data.txt' as document:[];
> >> > > B = foreach A generate document#'b' as b;
> >> > > C = foreach B generate flatten(b);
> >> > > dump C;
> >> > >
> >> > > I expect to see the following output:
> >> > > ([c#11,d#22])
> >> > > ([c#33,d#44])
> >> > >
> >> > > Instead, I got:
> >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray
> >> cannot be
> >> > > cast to org.apache.pig.data.DataBag
> >> > >
> >> > > Anyone encounters this problem before? How can I read the data back?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jerry
> >> > >
> >> >
> >>
> >
> >
>

Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Hey, and as for converting a map to a tuple, I probably got you wrong. If
you can get to every value manually within FOREACH then I see no problem
in doing so.
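
Reaching each value by hand inside FOREACH could look roughly like the sketch below. It assumes the data file and the keys 'a' and 'b' from the original post; the aliases B and T are only illustrative, and the looked-up values are still bytearrays at this point.

```pig
A = load 'data.txt' as document:[];

-- Look up each key by hand; every record produced by FOREACH is itself a tuple.
B = foreach A generate document#'a' as a, document#'b' as b;

-- Alternatively, wrap the looked-up values in one nested tuple with the TOTUPLE builtin.
T = foreach A generate TOTUPLE(document#'a', document#'b') as t;

-- Compare the two resulting schemas.
describe B;
describe T;
```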



Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
I am not sure whether you can convert a map to a tuple.
But I am curious about one thing:
you are trying to use 'b' as a Bag, right? FLATTEN needs it to be
a Bag, I guess:
http://pig.apache.org/docs/r0.10.0/basic.html#flatten
But it seems that Pig thinks that b is a byte array:
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
cast to org.apache.pig.data.DataBag
Can you run this?
DESCRIBE B;

I suppose it can look like a Bag in the output of DUMP, but I think Pig
doesn't know it is a Bag. Maybe you'll need some kind of explicit cast?
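
An untested sketch of that check, followed by one possible explicit cast. It assumes the data file from the original post and that each bag element is a tuple holding a single untyped map; the alias B2 and the cast schema are guesses at the data's structure, not something confirmed in this thread.

```pig
A = load 'data.txt' as document:[];
B = foreach A generate document#'b' as b;

-- With an untyped map, b is expected to show up as a bytearray here.
DESCRIBE B;

-- Assumption: the value under 'b' is a bag of tuples, each holding one untyped map.
-- Cast types take no field names, hence bag{tuple(map[])}.
B2 = foreach A generate (bag{tuple(map[])})(document#'b') as b;
DESCRIBE B2;

C = foreach B2 generate flatten(b);
dump C;
```

If DESCRIBE B2 reports a bag, FLATTEN at least receives the type it expects; whether the cast parses the stored text correctly depends on the Pig version and the loader.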



Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Ruslan,

I already tried debugging each step, but no luck.
I ran a dump (dump B;) after B = foreach A generate document#'b' as b;
and got {([c#11,d#22]),([c#33,d#44])}
but it fails when I run C = foreach B generate flatten(b);

I don't have control over the input. It is passed as a map of maps. I guess
a map with keys makes lookups easier.

Can I convert a map to a tuple?

Best Regards,

Jerry




Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Hi Jerry,

I would recommend debugging the issue step by step. Right after this line:
A = load 'data.txt' as document:[];
add the following:
DESCRIBE A;
DUMP A;
and so on for each subsequent relation.

To be honest, I haven't used maps that much. Just curious, why did you
choose to use them? You can also use regular tuples for storing the
relations, and you can store the tuples together with a schema file.
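
A rough sketch of the schema-file approach, assuming Pig 0.10 or later, where PigStorage accepts the '-schema' option. The file names, aliases and field names below are made up for illustration.

```pig
-- Hypothetical input: tab-separated lines holding a name and a bag of (c, d) pairs.
docs = load 'tuples.txt' as (a:chararray, b:bag{t:(c:int, d:int)});

-- '-schema' asks PigStorage to write a hidden .pig_schema file next to the data.
store docs into 'docs_out' using PigStorage('\t', '-schema');

-- Later, in a separate script or session (the schema file must already exist
-- when the script is parsed):
reloaded = load 'docs_out' using PigStorage('\t', '-schema');
describe reloaded;

flat = foreach reloaded generate a, flatten(b);
dump flat;
```

Because the types are read back from the schema file, the reloaded bag should not need a cast before FLATTEN.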

Ruslan



Re: Unable to load data using PigStorage that was previously stored using PigStorage

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi Jerry,

Map values are bytearrays by default. If you need them to be any other
type, you have to declare it explicitly. In your case, since you want
them to be treated as bags:

A = load 'data.txt' as document:map[bag{}];

An issue with your dataset is that the map's values are not of a
consistent type: the first is a chararray/bytearray ("hello") and the
second is a bag ("{([c#11,d#22]),([c#33,d#44])}"). This is not permitted,
as all the values have to be of the same type.

Instead, your dataset should have all values as bags for your query to work,
e.g.
[a#{(hello)},b#{([c#11,d#22]),([c#33,d#44])}]

A = load 'data.txt' as document:map[bag{}];
B = foreach A generate document#'b' as b;
C = foreach B generate flatten(b);
dump C;









On Tue, Apr 16, 2013 at 6:28 PM, Jerry Lam <ch...@gmail.com> wrote:

> Hi pig users,
>
> I tried to load data using PigStorage that was previously stored using
> PigStorage but it failed.
>
> Each line looks like this in the data file that is generated by PigStorage:
> [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
>
> I did the following:
> A = load 'data.txt' as document:[];
> B = foreach A generate document#'b' as b;
> C = foreach B generate flatten(b);
> dump C;
>
> I expect to see the following output:
> ([c#11,d#22])
> ([c#33,d#44])
>
> Instead, I got:
> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> cast to org.apache.pig.data.DataBag
>
> Anyone encounters this problem before? How can I read the data back?
>
> Thanks,
>
> Jerry
>