Posted to user@crunch.apache.org by "Roman V. Shapovalov" <sh...@graphics.cs.msu.su> on 2012/11/29 15:34:26 UTC
Saving collection to text files in Scrunch
Dear crunch-users,
I am trying to solve a toy MapReduce problem using Scrunch. When I
write the final result in the pipeline app, i.e. call
write(to.textFile(args(1)))
I get object names in the output file instead of the strings, like:
org.apache.avro.mapred.AvroWrapper@80
org.apache.avro.mapred.AvroWrapper@17a73
This happens only if I perform some mapping (even identity); simply
reading the input and writing it back out puts the expected strings in the file.
It seems that mapping wraps the strings using the AvroWrapper, but
writing to the text file does not unwrap them. Is it supposed to
unwrap them?
Crunch has a factory method To.formattedFile(), which I guess may help
(it is not documented), but it has not been ported to Scrunch. Is
there another idiom for writing strings?
Thanks in advance,
Roman
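A note on the symptom described above: output of the form ClassName@hex is what Java's default Object.toString() produces for a class that does not override it. The plain Scala sketch below only illustrates that behaviour and the difference unwrapping makes. The Wrapper class and both render functions are hypothetical stand-ins made up for this example; they are not the real org.apache.avro.mapred.AvroWrapper and not Crunch's actual text-writing code, which the thread does not show.

```scala
// Hypothetical stand-in for a wrapper type that does not override toString.
// This is NOT the real org.apache.avro.mapred.AvroWrapper.
final class Wrapper[T](val datum: T)

object WrapperDemo {
  // Mimics a text writer that simply calls toString on each record.
  def renderLine(record: Any): String = record.toString

  // Mimics a text writer that unwraps the datum before rendering it.
  def renderUnwrapped(record: Any): String = record match {
    case w: Wrapper[_] => w.datum.toString
    case other         => other.toString
  }

  def main(args: Array[String]): Unit = {
    val wrapped = new Wrapper("hello")
    println(renderLine(wrapped))      // class name, '@', then a hex hash code
    println(renderUnwrapped(wrapped)) // the wrapped string itself
  }
}
```

If the text writer behaves like renderLine, every wrapped record ends up as a class name plus hash code, which matches the lines quoted in the question; a writer that behaves like renderUnwrapped would produce the strings themselves.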
Re: Saving collection to text files in Scrunch
Posted by Robert Chu <ro...@wibidata.com>.
If I remember correctly, this is an issue caused by Avro strings not being
written properly to text files.
On Thu, Nov 29, 2012 at 8:40 AM, Roman V. Shapovalov <
shapovalov@graphics.cs.msu.su> wrote:
> Hi Josh,
>
> The trick you suggested works the way I expected: it saves strings as text.
>
> Thank you!
> Roman
>
> On Thu, Nov 29, 2012 at 8:03 PM, Josh Wills <jw...@cloudera.com> wrote:
> > Hey Roman,
> >
> > While I take a look at that, would you try using the writeTextFile function
> > (e.g., writeTextFile(<pcollection>, args(1)) ) and let me know if that does
> > the trick?
> >
> > Josh
Re: Saving collection to text files in Scrunch
Posted by "Roman V. Shapovalov" <sh...@graphics.cs.msu.su>.
Hi Josh,
The trick you suggested works the way I expected: it saves strings as text.
Thank you!
Roman
On Thu, Nov 29, 2012 at 8:03 PM, Josh Wills <jw...@cloudera.com> wrote:
> Hey Roman,
>
> While I take a look at that, would you try using the writeTextFile function
> (e.g., writeTextFile(<pcollection>, args(1)) ) and let me know if that does
> the trick?
>
> Josh
Re: Saving collection to text files in Scrunch
Posted by Josh Wills <jw...@cloudera.com>.
Hey Roman,
While I take a look at that, would you try using the writeTextFile function
(e.g., writeTextFile(<pcollection>, args(1)) ) and let me know if that does
the trick?
Josh
--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>