Posted to user@pig.apache.org by "yingnan.ma" <yi...@ipinyou.com> on 2012/11/13 08:46:17 UTC

distributed cache

Hi,

In Hadoop, I used the distributed cache by loading the file in the "setup" method and storing it in a "static" HashSet in memory.

In Pig, I am trying to use the distributed cache too, but I don't know how to store a HashSet in memory; I can only cache the file itself.

Any advice would be appreciated. Thank you so much!

Best Regards

Malone

2012-11-13 
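The Hadoop pattern described above (load the cached file once and keep it in a static set) can carry over to a Pig UDF, because each task runs its UDF instances inside one JVM. The plain-Java sketch below shows only the loading logic, under that assumption; the class and method names are hypothetical. In a real UDF, the lookup would be called lazily from exec(), and the file would be shipped to the tasks through the UDF's getCacheFiles() method on EvalFunc, in Pig releases that provide it (check the UDF manual for your version).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: keeps one in-memory copy of a cached lookup file per JVM (i.e. per task).
class CachedLookupSet {
    // Static, so every UDF instance created in this task shares the same set.
    private static Set<String> lookup;

    // Loads the file on first use only; later calls return the same set and ignore localPath.
    static synchronized Set<String> get(String localPath) throws IOException {
        if (lookup == null) {
            Set<String> set = new HashSet<>();
            try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String key = line.trim();
                    if (!key.isEmpty()) {
                        set.add(key); // one key per line; blank lines are skipped
                    }
                }
            }
            lookup = Collections.unmodifiableSet(set);
        }
        return lookup;
    }

    // What a filter-style UDF's exec() might do for each input key.
    static boolean contains(String localPath, String key) throws IOException {
        return get(localPath).contains(key);
    }
}
```

Because the set is static and loaded once, every call in the task reuses it; the trade-off is that the whole set must fit in the task's heap.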

  

Re: Re: Re: distributed cache

Posted by "yingnan.ma" <yi...@ipinyou.com>.
When I use the distributed cache, I find that once the file is larger than 100 MB or has more than 10 million records, it cannot be cached in memory. I tried setting io.sort.mb to 200 MB, but it still does not work. Any suggestions would be appreciated. Thank you!
 


2012-11-16 
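A rough back-of-the-envelope estimate may explain the limit. The figures are assumptions, not measurements: roughly 100 bytes per entry for a HashSet<String> holding short keys on a 64-bit JVM (entry object, String object, character array, table slot), and the Hadoop 1.x default task heap of -Xmx200m set through mapred.child.java.opts. Note that io.sort.mb sizes the map-side sort buffer, which is allocated from that same task heap, so raising it leaves less room for an in-memory set, not more.

```latex
10^{7}\ \text{records} \times 100\ \tfrac{\text{bytes}}{\text{record}} = 10^{9}\ \text{bytes} \approx 1\ \text{GB} \gg 200\ \text{MB}\ (\text{default } \texttt{-Xmx200m})
```

If the estimate holds, the set would need a larger task heap (mapred.child.java.opts) or a more compact in-memory representation.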




From: yingnan.ma
Sent: 2012-11-15 11:48:04
To: user
Cc:
Subject: Re: Re: distributed cache

Thank you so much! Both the replicated join and a UDF that uses the
distributed cache are useful for me, and I have already done it. Thank you again.
2012-11-15
yingnan.ma

Re: Re: distributed cache

Posted by "yingnan.ma" <yi...@ipinyou.com>.
Thank you so much! Both the replicated join and a UDF that uses the
distributed cache are useful for me, and I have already done it. Thank you again.


2012-11-15 



yingnan.ma 



From: Prashant Kommireddi
Sent: 2012-11-15 03:52:09
To: user@pig.apache.org
Cc:
Subject: Re: distributed cache
 
If it's for purposes other than a Join, you could write a UDF to use
distributed cache. Look at the section "Loading the Distributed Cache"
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html

Re: distributed cache

Posted by Prashant Kommireddi <pr...@gmail.com>.
If it's for purposes other than a Join, you could write a UDF to use
distributed cache. Look at the section "Loading the Distributed Cache"
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html
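Hadoop's distributed cache creates a symlink in each task's working directory when a cache URI carries a fragment, as in path#linkname, and, as far as we know, Pig's getCacheFiles() accepts entries in that form (check the Pig UDF manual for your version). The plain-Java sketch below, with hypothetical names and an example path that is not from the thread, builds such an entry and recovers the link name the UDF would then open locally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for distributed-cache entries of the form "<path>#<link name>".
class CacheEntries {
    // Builds the kind of list a UDF's getCacheFiles() could return.
    static List<String> of(String hdfsPath, String linkName) {
        List<String> entries = new ArrayList<>(1);
        entries.add(hdfsPath + "#" + linkName);
        return entries;
    }

    // Name under which the task sees the file in its working directory:
    // the fragment after '#'. Without a fragment no symlink is requested.
    static String localName(String entry) {
        int hash = entry.lastIndexOf('#');
        if (hash < 0 || hash == entry.length() - 1) {
            throw new IllegalArgumentException("no link name in cache entry: " + entry);
        }
        return entry.substring(hash + 1);
    }
}
```

The UDF would then open the file by its link name (for example with a plain FileReader) the first time exec() runs.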


On Wed, Nov 14, 2012 at 11:44 AM, Ruslan Al-Fakikh <me...@gmail.com> wrote:

> Maybe this is what you are looking for:
> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
> see "Replicated join"

Re: distributed cache

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Maybe this is what you are looking for:
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
see "Replicated join"
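A replicated join avoids the reduce phase by loading the smaller relation into memory in every map task and streaming the larger relation past it; in Pig Latin it is requested with USING 'replicated', with the in-memory relation(s) listed after the first, largest one. The plain-Java sketch below, with hypothetical names and toy rows of the form {key, value}, shows only that build-and-probe idea as an inner join; it is not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side (replicated) hash join over {key, value} rows.
class ReplicatedJoinSketch {
    // Build phase: load the small ("replicated") relation into a hash map keyed by join key.
    static Map<String, List<String>> buildIndex(List<String[]> small) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : small) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return index;
    }

    // Probe phase: stream the large relation; emit {key, largeValue, smallValue} per match.
    static List<String[]> join(Iterable<String[]> large, Map<String, List<String>> index) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : large) {
            List<String> matches = index.get(row[0]);
            if (matches == null) {
                continue; // inner join: rows without a match are dropped
            }
            for (String m : matches) {
                out.add(new String[] {row[0], row[1], m});
            }
        }
        return out;
    }
}
```

Only the small relation must fit in memory; the large one is read row by row, which is why the replicated relation has to be the smaller one.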


On Tue, Nov 13, 2012 at 11:46 AM, yingnan.ma <yi...@ipinyou.com> wrote:

> Hi,
>
> In Hadoop, I used the distributed cache by loading the file in the "setup"
> method and storing it in a "static" HashSet in memory.
>
> In Pig, I am trying to use the distributed cache too, but I don't know how
> to store a HashSet in memory; I can only cache the file itself.
>
> Any advice would be appreciated. Thank you so much!
>
> Best Regards
>
> Malone
>
> 2012-11-13
>
>
>