
Posted to dev@pig.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/06/08 00:40:06 UTC

distributed cache in pig

Hi all,
I notice that whether Pig uses the distributed cache depends on the execution context (local or mapreduce). When running in mapreduce mode, the distributed cache is always enabled (e.g. for replicated join). However, I cannot find any call to DistributedCache.getLocalCacheFiles(job), which gets the cached files from the local disk. So how does Pig read these files from local disk? I am looking at the Pig 0.7 source code.

Thanks,
-Gang
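The thread never answers this question. As background, Hadoop 0.20's org.apache.hadoop.filecache.DistributedCache gives a task two ways to reach a cached file: ask for the local copies with DistributedCache.getLocalCacheFiles(conf), or, if the job called DistributedCache.createSymlink(conf) and added the file with a "#name" fragment on its URI, open it as "name" in the task's current working directory. The second route needs no getLocalCacheFiles call, which would fit what Gang observed, but whether Pig 0.7 actually works this way is an assumption not confirmed in this thread. The sketch below only simulates that symlink route with plain java.nio (no Hadoop on the classpath); the class, method and file names are hypothetical, and it needs a file system that supports symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical simulation (plain java.nio, no Hadoop) of reading a
 * distributed-cache file through a symlink in the task's working directory,
 * instead of calling DistributedCache.getLocalCacheFiles(job).
 */
public class CacheSymlinkSketch {

    /** "Framework" side: write a local copy, then link it into the work dir under the fragment name. */
    static Path localizeWithSymlink(Path localCacheDir, Path taskWorkDir,
                                    String fragmentName, List<String> contents) {
        try {
            // The local copy gets a generated name the task does not know in advance.
            Path cached = Files.createTempFile(localCacheDir, "cache-", ".dat");
            Files.write(cached, contents, StandardCharsets.UTF_8);
            // Symlink named after the URI "#fragment", placed in the task's working directory.
            return Files.createSymbolicLink(taskWorkDir.resolve(fragmentName), cached);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Task side: open the file by its well-known name; no cache lookup call is made. */
    static List<String> readByName(Path taskWorkDir, String fragmentName) {
        try {
            return Files.readAllLines(taskWorkDir.resolve(fragmentName), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path cacheDir = Files.createTempDirectory("local-cache");
        Path workDir = Files.createTempDirectory("task-work");
        localizeWithSymlink(cacheDir, workDir, "repl_input", List.of("k1\tv1", "k2\tv2"));
        // The task reads the data knowing only the link name.
        System.out.println(readByName(workDir, "repl_input"));
    }
}
```

The point of the design is that the task code only needs a name agreed on when the job is set up; where the framework actually stored the local copy stays hidden behind the link. The sketch does not delete its temporary directories.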

Re: distributed cache in pig

Posted by Gang Luo <lg...@yahoo.com.cn>.

Thanks, Olga. But what if it is running in mapreduce mode? Once the distributed cache is enabled in this mode, there should still be some way to read the cached files. Searching all the source files in pig-0.7, I can't find 'DistributedCache.getLocalCacheFiles' anywhere, and I suppose there is no other way to read cached files. This is what confuses me. Any other ideas?

-Gang




----- Original Message ----
From: Olga Natkovich <ol...@yahoo-inc.com>
To: pig-dev@hadoop.apache.org
Sent: Monday, June 7, 2010 6:50:01 PM
Subject: RE: distributed cache in pig

This is because Hadoop 0.20 does not support the distributed cache in
local mode. My understanding is that it will be part of Hadoop 0.22.

Olga


RE: distributed cache in pig

Posted by Olga Natkovich <ol...@yahoo-inc.com>.

This is because Hadoop 0.20 does not support the distributed cache in
local mode. My understanding is that it will be part of Hadoop 0.22.

Olga
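Gang's observation that Pig's use of the distributed cache depends on the execution mode, together with Olga's explanation that Hadoop 0.20 has no distributed cache in local mode, can be pictured as a mode-dependent choice of where a replicated-join input is read from. This is a hypothetical sketch, not Pig's code: ReplicatedInputLocator, ExecMode, locate and the link name are made up, and the mapreduce branch assumes the cached copy is reachable through a symlink name in the task's working directory (Hadoop's DistributedCache symlink option).

```java
/**
 * Hypothetical sketch: choose where a replicated-join input is read from,
 * based on execution mode. Assumes no distributed cache exists in local
 * mode (Hadoop 0.20), so local mode reads the original file directly.
 */
public class ReplicatedInputLocator {

    enum ExecMode { LOCAL, MAPREDUCE }

    /**
     * @param originalPath path of the small join input as given in the script
     * @param linkName     hypothetical symlink name requested via a "#fragment"
     */
    static String locate(ExecMode mode, String originalPath, String linkName) {
        switch (mode) {
            case LOCAL:
                // No distributed cache: the file is already on the local file system.
                return originalPath;
            case MAPREDUCE:
                // Assumed: the cached copy appears under linkName in the task's working directory.
                return linkName;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(locate(ExecMode.LOCAL, "/data/small.txt", "repl_small"));
        System.out.println(locate(ExecMode.MAPREDUCE, "/data/small.txt", "repl_small"));
    }
}
```

For the same inputs, LOCAL mode returns /data/small.txt and MAPREDUCE mode returns repl_small, so the reading code downstream can stay the same in both modes.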
