Posted to user@crunch.apache.org by "Roman V. Shapovalov" <sh...@graphics.cs.msu.su> on 2012/11/29 15:34:26 UTC

Saving collection to text files in Scrunch

Dear crunch-users,

I am trying to solve a toy MapReduce problem using Scrunch. When I
write the final result in the pipeline app, i.e. call

write(to.textFile(args(1)))

I get object names in the output file, like:

org.apache.avro.mapred.AvroWrapper@80
org.apache.avro.mapred.AvroWrapper@17a73

This happens only if I perform some mapping (even the identity); simply
reading the input and writing it back out produces proper strings in
the file.

It seems that mapping wraps the strings using the AvroWrapper, but
writing to the text file does not unwrap them. Is it supposed to
unwrap them?
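
To illustrate the suspected mechanism, here is a minimal sketch in plain
Scala. It assumes the text output step renders each value by calling
toString on it. Wrapper and renderLine are hypothetical stand-ins, not
the real AvroWrapper or any Crunch/Scrunch API; Wrapper simply holds a
datum and inherits toString from java.lang.Object.

```scala
// Hypothetical stand-in for a wrapper type that holds a value but
// inherits toString from java.lang.Object. Not the real AvroWrapper.
class Wrapper[T](val datum: T)

object WrapperDemo {
  // Assumed behaviour of a text sink: render any value with toString.
  def renderLine(value: Any): String = value.toString

  def main(args: Array[String]): Unit = {
    val plain = "hello"
    val wrapped = new Wrapper(plain)

    println(renderLine(plain))         // the string itself
    println(renderLine(wrapped))       // class name, '@', hex hash code
    println(renderLine(wrapped.datum)) // unwrapping first recovers the string
  }
}
```

If the sink behaves like renderLine, any wrapper that does not override
toString would end up in the file as a class name followed by a hash,
which matches the lines shown above.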

Crunch has a factory method, To.formattedFile(), which I guess may help
(it is not documented), but it has not been ported to Scrunch. Is
there another idiom for writing strings?

Thanks in advance,
Roman

Re: Saving collection to text files in Scrunch

Posted by Robert Chu <ro...@wibidata.com>.
If I remember correctly, this is an issue caused by Avro strings not being
written properly to text files.


On Thu, Nov 29, 2012 at 8:40 AM, Roman V. Shapovalov <
shapovalov@graphics.cs.msu.su> wrote:

> Hi Josh,
>
> The trick you suggested works the way I expected: it saves strings as text.
>
> Thank you!
> Roman

Re: Saving collection to text files in Scrunch

Posted by "Roman V. Shapovalov" <sh...@graphics.cs.msu.su>.
Hi Josh,

The trick you suggested works the way I expected: it saves strings as text.

Thank you!
Roman

On Thu, Nov 29, 2012 at 8:03 PM, Josh Wills <jw...@cloudera.com> wrote:
> Hey Roman,
>
> While I take a look at that, would you try using the writeTextFile function
> (e.g., writeTextFile(<pcollection>, args(1)) ) and let me know if that does
> the trick?
>
> Josh

Re: Saving collection to text files in Scrunch

Posted by Josh Wills <jw...@cloudera.com>.
Hey Roman,

While I take a look at that, would you try using the writeTextFile function
(e.g., writeTextFile(<pcollection>, args(1)) ) and let me know if that does
the trick?
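
As a rough illustration of why a helper like this could sidestep the
problem, here is a sketch in plain Scala. It assumes the helper converts
each element to its string form before handing it to the text output;
that is an assumption about its behaviour, not something confirmed in
this thread. Wrapped, writeRaw and writeAsText are hypothetical names,
and no Crunch or Scrunch classes are used.

```scala
// Hypothetical stand-in for a wrapped element; not a Crunch or Avro class.
final class Wrapped[T](val datum: T)

object TextSinkSketch {
  // Naive sink: renders each element with toString, so wrappers that do
  // not override toString show up as a class name followed by a hash.
  def writeRaw(elements: Seq[Any]): Seq[String] =
    elements.map(_.toString)

  // Stringify first: turn each wrapped element into a plain String
  // before it reaches the sink.
  def writeAsText[T](elements: Seq[Wrapped[T]]): Seq[String] =
    writeRaw(elements.map(w => w.datum.toString))

  def main(args: Array[String]): Unit = {
    val data = Seq(new Wrapped("alpha"), new Wrapped("beta"))
    writeRaw(data).foreach(println)    // object identities, not the data
    writeAsText(data).foreach(println) // the wrapped strings themselves
  }
}
```

The design point is only the ordering: the conversion to String happens
before the sink sees the value, so the sink never has to know how to
unwrap anything.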

Josh


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>