Posted to user@crunch.apache.org by Ashish <pa...@gmail.com> on 2012/11/22 10:45:06 UTC

Finding Input Split from DoFn

Hi All,

Is there a way to find the InputSplit from within an implementation of DoFn?

I am trying to implement an inverted index example using Crunch. I have tried
peeking in the DoFn code, but couldn't find a way to retrieve the InputSplit.
Can someone point me in the right direction?

-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Finding Input Split from DoFn

Posted by Ashish <pa...@gmail.com>.
Post completed. Here is the link:
http://goo.gl/tBptp

Comments, suggestions, and more ideas are welcome.




Re: Finding Input Split from DoFn

Posted by Ashish <pa...@gmail.com>.
Thanks, Josh!

It worked; my inverted index example using Crunch is complete. I'm slowly
getting addicted to the Crunch coding style.




Re: Finding Input Split from DoFn

Posted by Josh Wills <jw...@cloudera.com>.
Calling getContext() from inside a DoFn, during or after initialize(), returns
the TaskInputOutputContext. When the DoFn runs in a Mapper, that context is a
MapContext, and MapContext has a getInputSplit() method. We don't normally
want a DoFn to worry about whether it's on the map side or the reduce side of
a MapReduce job, so we don't expose the distinction by default, which means
you need to do something like:

if (getContext() instanceof MapContext) {
  InputSplit split = ((MapContext) getContext()).getInputSplit();
}

which is a little ugly; sorry about that.
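
For readers who want to try the check-then-cast pattern outside a cluster,
here is a minimal sketch that runs on a plain JDK. The types Split,
TaskContext, and MapSideContext and the helper findInputSplit are
hypothetical stand-ins invented for this illustration; they are not Hadoop
or Crunch classes. In a real DoFn the same shape would presumably use the
value returned by getContext(), Hadoop's MapContext, and its
getInputSplit() method, as described above.

```java
import java.util.Optional;

class InputSplitSketch {

    // Hypothetical stand-in for Hadoop's InputSplit (illustration only).
    interface Split {
        String describe();
    }

    // Hypothetical stand-in for the general context a DoFn sees on either side.
    interface TaskContext {
    }

    // Hypothetical stand-in for the map-side context, the only one with a split.
    interface MapSideContext extends TaskContext {
        Split getInputSplit();
    }

    // Returns the split when the context is map-side, or empty otherwise.
    static Optional<Split> findInputSplit(TaskContext context) {
        if (context instanceof MapSideContext) {
            return Optional.ofNullable(((MapSideContext) context).getInputSplit());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Split split = () -> "docs/part-00000";
        MapSideContext mapSide = () -> split;
        TaskContext reduceSide = new TaskContext() { };

        // Map side: the split is available.
        System.out.println(findInputSplit(mapSide).map(Split::describe).orElse("no split"));
        // Reduce side: there is no split to return.
        System.out.println(findInputSplit(reduceSide).map(Split::describe).orElse("no split"));
    }
}
```

Returning an Optional keeps the reduce-side case explicit, so calling code
has to decide what to do when no split exists rather than hitting a
ClassCastException or a null.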

J





-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>