Posted to user@crunch.apache.org by Tahir Hameed <ta...@gmail.com> on 2015/09/23 09:56:43 UTC

Apache Crunch Passing a Hash Map to DoFn

Hi,

I have a PTable that I store as an Avro file. The stored table is later to be
used in another DoFn, after it has been converted into a HashMap.

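import org.apache.crunch.PTable;
import org.apache.crunch.Target;
import org.apache.crunch.io.To;
import org.apache.crunch.types.avro.Avros;
PTable<String, MyClass> myClassData = table.parallelDo(new MyClassDoFN(),
    Avros.tableOf(Avros.strings(), Avros.reflects(MyClass.class)));
Target target = To.avroFile("/user/xyz/output/");
myClassData.write(target, Target.WriteMode.OVERWRITE);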

Can you please tell me how this file may be read in another DoFn?

Best,

Tahir

Re: Apache Crunch Passing a Hash Map to DoFn

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Tahir,

Good to hear you got it going.

It's difficult to say what the underlying issue would have been in
your original version (with materialize set to true) without seeing
the code, but my guess is that there is an issue with reading a
materialized collection that is taken directly from a Source without
any DoFns between the original input and where it's being converted to
a ReadableData.

- Gabriel


On Wed, Sep 23, 2015 at 1:58 PM, Tahir Hameed <ta...@gmail.com> wrote:

Re: Apache Crunch Passing a Hash Map to DoFn

Posted by Tahir Hameed <ta...@gmail.com>.
I solved the problem by setting materialize to false when getting the
ReadableData: myClassData.asReadable(false). However, I am still not sure
why this happens.


Tahir


On Wed, Sep 23, 2015 at 1:36 PM, Tahir Hameed <ta...@gmail.com> wrote:


Re: Apache Crunch Passing a Hash Map to DoFn

Posted by Tahir Hameed <ta...@gmail.com>.
Hi Gabriel,

Thanks for the answer. After implementing what you suggested, I am getting
the following error:

2015-09-23 13:23:10,859 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Can't find local cache file for '/tmp/crunch-253557813/p1'
	at org.apache.crunch.io.impl.ReadableDataImpl.getCacheFilePath(ReadableDataImpl.java:81)
	at org.apache.crunch.io.impl.ReadableDataImpl.access$000(ReadableDataImpl.java:42)
	at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:93)
	at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:90)
	at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:451)
	at java.util.AbstractList$Itr.next(AbstractList.java:358)
	at com.google.common.collect.Iterables$3.next(Iterables.java:508)
	at com.google.common.collect.Iterables$3.next(Iterables.java:501)
	at com.google.common.collect.Iterators$5.hasNext(Iterators.java:544)
	at com.bol.step.enrichmentdashboard.ProductsDoFN.initialize(ProductsDoFN.java:35)
	at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:71)
	at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:73)
	at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:48)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)



Can you suggest where I might be going wrong?


Tahir


On Wed, Sep 23, 2015 at 11:57 AM, Gabriel Reid <ga...@gmail.com>
wrote:


Re: Apache Crunch Passing a Hash Map to DoFn

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Tahir,

If I understand correctly, then you're trying to load the contents of
a PTable into memory within a DoFn.

This can be done via the PCollection.asReadable method. A couple of
examples of this can be seen in the BloomFilterJoinStrategy.join and
MapsideJoinStrategy.joinInternal methods. The general idea is that you
pass a ReadableData instance into the constructor of your DoFn, and
then you can access the contents of the underlying PCollection by
iterating over the ReadableData within the initialize method of your
DoFn.
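A minimal sketch of the lifecycle described above, written against the JDK only so that it compiles and runs without Crunch or a cluster. SideData, ListSideData, LookupFn and roundTrip are hypothetical names made up for this illustration, not Crunch classes: SideData stands in for the ReadableData returned by asReadable, and LookupFn stands in for the DoFn. What it is meant to show is that only the small handle travels with the serialized function, while the HashMap is built on the task side in initialize() and process() only does lookups.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SideDataSketch {

  // Hypothetical stand-in for Crunch's ReadableData<T>: a small serializable
  // handle that knows how to produce the data once it is on the task side.
  interface SideData<T> extends Serializable {
    Iterable<T> read() throws IOException;
  }

  // Hypothetical in-memory implementation used only for this sketch.
  static class ListSideData<T> implements SideData<T> {
    private final ArrayList<T> items;

    ListSideData(List<T> items) {
      this.items = new ArrayList<>(items);
    }

    @Override
    public Iterable<T> read() {
      return items;
    }
  }

  // Plays the role of the DoFn: the handle comes in through the constructor,
  // and the map is transient, so it is built in initialize() after shipping.
  static class LookupFn implements Serializable {
    private final SideData<Map.Entry<String, Integer>> side;
    private transient Map<String, Integer> lookup;

    LookupFn(SideData<Map.Entry<String, Integer>> side) {
      this.side = side;
    }

    void initialize() {
      lookup = new HashMap<>();
      try {
        for (Map.Entry<String, Integer> entry : side.read()) {
          lookup.put(entry.getKey(), entry.getValue());
        }
      } catch (IOException e) {
        // initialize() declares no checked exceptions here, so wrap the error.
        throw new UncheckedIOException(e);
      }
    }

    String process(String key) {
      Integer value = lookup.get(key);
      return key + "=" + (value == null ? "missing" : value);
    }
  }

  // Serializes and deserializes an object, roughly what happens when a
  // function is shipped from the client to a task.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> rows = new ArrayList<>();
    rows.add(new SimpleEntry<>("a", 1));
    rows.add(new SimpleEntry<>("b", 2));

    LookupFn fn = roundTrip(new LookupFn(new ListSideData<>(rows)));
    fn.initialize();
    System.out.println(fn.process("a")); // a=1
    System.out.println(fn.process("z")); // z=missing
  }
}
```

In a real Crunch job the handle would be something like myClassData.asReadable(...), and initialize() would iterate over readable.read(getContext()) to fill the map. As far as I can tell, the MapsideJoinStrategy code mentioned above also forwards DoFn.configure(Configuration) to ReadableData.configure and passes ParallelDoOptions built from readable.getSourceTargets() to parallelDo; it is worth comparing against that code for the Crunch version in use.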

- Gabriel


On Wed, Sep 23, 2015 at 9:56 AM, Tahir Hameed <ta...@gmail.com> wrote: