Posted to user@pig.apache.org by Mridul Muralidharan <mr...@yahoo-inc.com> on 2009/03/12 12:49:52 UTC
Custom partitioner in pig
Hi,
Is there a way to specify or write a custom partitioner in Pig?
Not split - I mean partitioning the data in a specific way - for a custom job.
Thanks,
Mridul
Re: Custom partitioner in pig
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi,
I had gone through the Hama wiki, and the basic idea we were
thinking of is quite similar to the dense matrix multiplication case
in Hama.
Adding another dependency (on HBase) was something we wanted to
avoid - hence we were looking at ways to bring "relevant blocks together",
and custom partitioning looked like a simple enough trick for this ...
Thanks,
Mridul
Edward J. Yoon wrote:
> Hi,
>
> Interesting. FYI, we use HBase. Check this out --
> http://wiki.apache.org/hama/Architecture#head-29381b028f7a92606e6a3a59722e1ca084a91ab8
>
> I think there is no way to collect all the blocks sequentially at once.
> Iterative jobs or HBase will be needed.
>
> If you have a better idea, please let us know, too.
> Thanks.
>
Re: Custom partitioner in pig
Posted by "Edward J. Yoon" <ed...@apache.org>.
Hi,
Interesting. FYI, we use HBase. Check this out --
http://wiki.apache.org/hama/Architecture#head-29381b028f7a92606e6a3a59722e1ca084a91ab8
I think there is no way to collect all the blocks sequentially at once.
Iterative jobs or HBase will be needed.
If you have a better idea, please let us know, too.
Thanks.
--
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
On Tue, Mar 17, 2009 at 1:08 AM, Mridul Muralidharan
<mr...@yahoo-inc.com> wrote:
>
> Hi,
>
> My goal here is slightly different - and might not fit very 'nicely' into
> Pig.
> The idea is to implement dense block matrix multiplication - and an efficient
> way to do that is to pair up the blocks as required in the partitioner
> itself (as opposed to partitioning for a single table, as the JIRA seems to
> suggest).
>
> Once this is supported, matrix multiplication becomes just a block division
> UDF, custom partitioning, block multiplication of the partitioned blocks, and
> summation of the partial products to get the result.
> The Hama guys are attempting something similar, though I did not see much in
> the way of 'code'.
>
>
> So assuming there is no way to do this currently in Pig, I guess I will need
> to investigate other options.
>
> Thanks,
> Mridul
>
> Alan Gates wrote:
>>
>> Not yet, but we've had other requests for it.
>> https://issues.apache.org/jira/browse/PIG-282
>>
>> Alan.
>>
>> On Mar 12, 2009, at 4:49 AM, Mridul Muralidharan wrote:
>>
>>> Hi,
>>>
>>> Is there a way to specify or write a custom partitioner in Pig?
>>> Not split - I mean partitioning the data in a specific way - for a custom job.
>>>
>>> Thanks,
>>> Mridul
>>
>
>
Re: Custom partitioner in pig
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi,
My goal here is slightly different - and might not fit very 'nicely'
into Pig.
The idea is to implement dense block matrix multiplication - and an
efficient way to do that is to pair up the blocks as required in the
partitioner itself (as opposed to partitioning for a single table, as the
JIRA seems to suggest).
Once this is supported, matrix multiplication becomes just a block division
UDF, custom partitioning, block multiplication of the partitioned blocks, and
summation of the partial products to get the result.
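The four steps just described (block division, custom partitioning, block multiplication, summation) can be sketched roughly as follows. This is only a single-process Python illustration of the idea, not Pig or Hadoop code. The function names (split_blocks, partition, block_multiply), the (i, j, k) key layout, and the restriction to square matrices whose size is a multiple of the block size are all assumptions made for the example.

```python
from collections import defaultdict


def split_blocks(m, bs):
    """Block division: return {(block_row, block_col): block} for bs x bs blocks."""
    return {
        (r // bs, c // bs): [row[c:c + bs] for row in m[r:r + bs]]
        for r in range(0, len(m), bs)
        for c in range(0, len(m[0]), bs)
    }


def partition(key, n_blocks, n_partitions):
    """Hypothetical custom partitioner: blocks A(i,k) and B(k,j) tagged with
    the same key (i, j, k) always land in the same partition."""
    i, j, k = key
    return ((i * n_blocks + j) * n_blocks + k) % n_partitions


def matmul(a, b):
    """Plain dense product of two small blocks (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def block_multiply(a, b, bs, n_partitions=4):
    # Assumption: a and b are square, same size, and the size is a multiple of bs.
    n_blocks = len(a) // bs
    partitions = defaultdict(lambda: defaultdict(dict))
    # "Map" side: replicate each block to every (i, j, k) product that needs it.
    for (i, k), blk in split_blocks(a, bs).items():
        for j in range(n_blocks):
            key = (i, j, k)
            partitions[partition(key, n_blocks, n_partitions)][key]["A"] = blk
    for (k, j), blk in split_blocks(b, bs).items():
        for i in range(n_blocks):
            key = (i, j, k)
            partitions[partition(key, n_blocks, n_partitions)][key]["B"] = blk
    # "Reduce" side: multiply the paired blocks, then sum the partial
    # products that belong to the same output block (i, j).
    c = [[0] * len(a) for _ in range(len(a))]
    for pairs in partitions.values():
        for (i, j, _k), pair in pairs.items():
            prod = matmul(pair["A"], pair["B"])
            for r in range(bs):
                for col in range(bs):
                    c[i * bs + r][j * bs + col] += prod[r][col]
    return c
```

Calling block_multiply(a, b, 2) on two 4x4 matrices should give the same result as matmul(a, b). The point of the sketch is that the partitioner, not a separate join step, is what brings the matching pair of blocks together.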
The Hama guys are attempting something similar, though I did not see much
in the way of 'code'.
So assuming there is no way to do this currently in Pig, I guess I will need
to investigate other options.
Thanks,
Mridul
Alan Gates wrote:
> Not yet, but we've had other requests for it.
> https://issues.apache.org/jira/browse/PIG-282
>
> Alan.
>
> On Mar 12, 2009, at 4:49 AM, Mridul Muralidharan wrote:
>
>> Hi,
>>
>> Is there a way to specify or write a custom partitioner in Pig?
>> Not split - I mean partitioning the data in a specific way - for a custom job.
>>
>> Thanks,
>> Mridul
>
Re: Custom partitioner in pig
Posted by Alan Gates <ga...@yahoo-inc.com>.
Not yet, but we've had other requests for it. https://issues.apache.org/jira/browse/PIG-282
Alan.
On Mar 12, 2009, at 4:49 AM, Mridul Muralidharan wrote:
> Hi,
>
> Is there a way to specify or write a custom partitioner in Pig?
> Not split - I mean partitioning the data in a specific way - for a custom job.
>
> Thanks,
> Mridul