Posted to user@pig.apache.org by Mridul Muralidharan <mr...@yahoo-inc.com> on 2009/03/12 12:49:52 UTC

Custom partitioner in pig

Hi,

   Is there a way to specify or write a custom partitioner in Pig?
Not a split: I want to partition the data in a specific way for a custom job.

Thanks,
Mridul

Re: Custom partitioner in pig

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi,

   I had gone through the Hama wiki, and the basic idea we were 
thinking of is quite similar to the dense matrix multiplication case 
in Hama.
Adding another dependency (on HBase) was something we wanted to 
avoid, hence we were looking at ways to bring "relevant blocks together"; 
custom partitioning looked like a simple enough trick for this ...


Thanks,
Mridul

Edward J. Yoon wrote:
> Hi,
> 
> Interesting. FYI, We're use the Hbase. Check this out --
> http://wiki.apache.org/hama/Architecture#head-29381b028f7a92606e6a3a59722e1ca084a91ab8
> 
> I think there is no way to sequentially collect the blocks at once.
> Iterative jobs or Hbase will be needed.
> 
> If you have more good idea, pls let us know, too.
> Thanks.
> 


Re: Custom partitioner in pig

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hi,

Interesting. FYI, we use HBase. Check this out:
http://wiki.apache.org/hama/Architecture#head-29381b028f7a92606e6a3a59722e1ca084a91ab8

I think there is no way to sequentially collect the blocks at once.
Iterative jobs or HBase will be needed.

If you have a better idea, please let us know, too.
Thanks.

-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

On Tue, Mar 17, 2009 at 1:08 AM, Mridul Muralidharan
<mr...@yahoo-inc.com> wrote:
>
> Hi,
>
>  My goal here is slightly different - and might not very 'nicely' fit into
> pig.
> Idea is to implement dense block matrix multiplication - and an efficient
> way to do that is to pair up the blocks as required in the partitioner
> itself (as opposed to partitioning for a single table as the jira seems to
> suggest).
>
> Once this is supported, matrix mult becomes just block division udf, custom
> partitioning, block multiplication of the partitioned blocks, summation of
> the results to get result.
> Hama guys are attempting something similar, though I did not see too much as
> 'code'.
>
>
> So assuming there is no way to do this currently in pig, I will need to
> investigate other options I guess.
>
> Thanks,
> Mridul
>
> Alan Gates wrote:
>>
>> Not yet, but we've had other requests for it.
>>  https://issues.apache.org/jira/browse/PIG-282
>>
>> Alan.
>>
>> On Mar 12, 2009, at 4:49 AM, Mridul Muralidharan wrote:
>>
>>> Hi,
>>>
>>>  Is there a way to specify or write a custom partitioner in pig ?
>>> Not split - partition data in a specific way - for some custom job.
>>>
>>> Thanks,
>>> Mridul
>>
>
>

Re: Custom partitioner in pig

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi,

   My goal here is slightly different, and might not fit very 'nicely' 
into Pig.
The idea is to implement dense block matrix multiplication, and an 
efficient way to do that is to pair up the blocks as required in the 
partitioner itself (as opposed to partitioning for a single table, as the 
JIRA seems to suggest).

Once this is supported, matrix multiplication becomes just a block division 
UDF, custom partitioning, block multiplication of the partitioned blocks, 
and summation of the partial products to get the result.
The Hama guys are attempting something similar, though I did not see much 
in the way of 'code'.
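
To make the pipeline above concrete, here is a rough sketch in plain Python 
(not Pig or Hadoop code). It assumes square matrices whose size is a 
multiple of the block size. The names split_blocks, partition and 
block_multiply are hypothetical, and the "partitions" are just in-memory 
dictionaries standing in for reducers.

```python
from collections import defaultdict


def split_blocks(m, bs):
    """Block-division step: map (block_row, block_col) -> bs x bs sub-matrix."""
    nb = len(m) // bs
    return {(bi, bj): [row[bj * bs:(bj + 1) * bs]
                       for row in m[bi * bs:(bi + 1) * bs]]
            for bi in range(nb) for bj in range(nb)}


def partition(key, num_partitions, nb):
    """Hypothetical custom partitioner: route on the output block (i, j) only,
    so every (A[i,k], B[k,j]) pair for that output block lands together."""
    i, j, _k = key
    return (i * nb + j) % num_partitions


def matmul(x, y):
    """Plain dense multiplication of two small matrices (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def add(x, y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def block_multiply(a, b, bs, num_partitions=3):
    nb = len(a) // bs
    a_blocks, b_blocks = split_blocks(a, bs), split_blocks(b, bs)

    # "Map" side: replicate each block to every output block that needs it,
    # keyed by (i, j, k), and let the partitioner pick the destination.
    partitions = defaultdict(lambda: defaultdict(dict))
    for (i, k), blk in a_blocks.items():
        for j in range(nb):
            key = (i, j, k)
            partitions[partition(key, num_partitions, nb)][key]["A"] = blk
    for (k, j), blk in b_blocks.items():
        for i in range(nb):
            key = (i, j, k)
            partitions[partition(key, num_partitions, nb)][key]["B"] = blk

    # "Reduce" side: multiply the paired blocks and sum over k. Each (i, j)
    # lives in exactly one partition, so the summation needs no second pass.
    result_blocks = {}
    for records in partitions.values():
        for (i, j, _k), pair in records.items():
            prod = matmul(pair["A"], pair["B"])
            if (i, j) in result_blocks:
                result_blocks[(i, j)] = add(result_blocks[(i, j)], prod)
            else:
                result_blocks[(i, j)] = prod

    # Reassemble the output blocks into one matrix.
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for (i, j), blk in result_blocks.items():
        for r in range(bs):
            for s in range(bs):
                c[i * bs + r][j * bs + s] = blk[r][s]
    return c
```

The point of the sketch is only that the partitioner ignores k: that is what 
brings the "relevant blocks together" without an external store. The cost is 
that each input block is replicated once per block row or block column.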


So, assuming there is no way to do this currently in Pig, I guess I will 
need to investigate other options.

Thanks,
Mridul

Alan Gates wrote:
> Not yet, but we've had other requests for it.  
> https://issues.apache.org/jira/browse/PIG-282
> 
> Alan.
> 
> On Mar 12, 2009, at 4:49 AM, Mridul Muralidharan wrote:
> 
>> Hi,
>>
>>  Is there a way to specify or write a custom partitioner in pig ?
>> Not split - partition data in a specific way - for some custom job.
>>
>> Thanks,
>> Mridul
> 


Re: Custom partitioner in pig

Posted by Alan Gates <ga...@yahoo-inc.com>.
Not yet, but we've had other requests for it.  https://issues.apache.org/jira/browse/PIG-282

Alan.

On Mar 12, 2009, at 4:49 AM, Mridul Muralidharan wrote:

> Hi,
>
>  Is there a way to specify or write a custom partitioner in pig ?
> Not split - partition data in a specific way - for some custom job.
>
> Thanks,
> Mridul