Posted to user@hadoop.apache.org by Peter Cogan <pe...@gmail.com> on 2012/12/06 17:59:53 UTC
Problem using distributed cache
Hi,
I want to use the distributed cache to allow my mappers to access data. In
main, I'm using the command
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.
Then, my setup function looks like this:
public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    // etc.
}
However, this localFiles array is always null.
I was initially running on a single-host cluster for testing, but I read
that this will prevent the distributed cache from working. I tried a
pseudo-distributed cluster, but that didn't work either.
I'm using Hadoop 1.0.3.
thanks Peter
Re: Problem using distributed cache
Posted by be...@gmail.com.
Hi Peter
Can you try the following in your code:
1. Have your driver class implement the Tool interface.
2. Use the Job's getConfiguration() rather than creating a new conf instance.
The DistributedCache should work with the above-mentioned modifications.
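[Editor's note: a minimal sketch of both suggestions against the Hadoop 1.x API used in this thread; the class name and ToolRunner wiring are illustrative, not from the original mails.]

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() comes from Configured; ToolRunner populates it.
        Job job = new Job(getConf(), "wordcount");

        // Register the cache file against the Job's own configuration,
        // not a separate Configuration instance.
        DistributedCache.addCacheFile(
                new URI("/user/peter/cacheFile/testCache1"),
                job.getConfiguration());

        // ... set mapper/reducer classes, input/output paths ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
}
```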
Sent on my BlackBerry® from Vodafone
-----Original Message-----
From: Peter Cogan <pe...@gmail.com>
Date: Fri, 7 Dec 2012 14:06:41
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> It's an instance created at the start of the program like this:
>
> public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Job job = new Job(conf, "wordcount");
>     DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>
> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I want to use the distributed cache to allow my mappers to access data. In
>> > main, I'm using the command
>> >
>> > DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>> >
>> > where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.
>> >
>> > Then, my setup function looks like this:
>> >
>> > public void setup(Context context) throws IOException, InterruptedException {
>> >     Configuration conf = context.getConfiguration();
>> >     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>> >     // etc.
>> > }
>> >
>> > However, this localFiles array is always null.
>> >
>> > I was initially running on a single-host cluster for testing, but I read
>> > that this will prevent the distributed cache from working. I tried a
>> > pseudo-distributed cluster, but that didn't work either.
>> >
>> > I'm using Hadoop 1.0.3.
>> >
>> > thanks Peter
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
Re: Problem using distributed cache
Posted by Dhaval Shah <pr...@yahoo.co.in>.
You will need to add the cache file to the distributed cache before creating the Job object. Give that a spin and see if that works.
Regards,
Dhaval
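[Editor's note: a minimal sketch of the reordering Dhaval suggests, using the Hadoop 1.x API from the thread; the rest of the job setup is assumed unchanged.]

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheBeforeJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Add the cache file first, so the setting is already present in
        // conf when the Job constructor copies it.
        DistributedCache.addCacheFile(
                new URI("/user/peter/cacheFile/testCache1"), conf);

        // The Job constructor clones conf; anything added to conf after
        // this point is not seen by the job.
        Job job = new Job(conf, "wordcount");

        // ... configure and submit the job ...
    }
}
```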
Re: Problem using distributed cache
Posted by Peter Cogan <pe...@gmail.com>.
Hi Dhaval & Harsh,
thanks for coming back to the thread - you're both right, I was doing things
in the wrong order. I hadn't realised that the Job constructor clones the
configuration - that's very interesting!
thanks again
Peter
On Fri, Dec 7, 2012 at 2:25 PM, Harsh J <ha...@cloudera.com> wrote:
> Please try using job.getConfiguration() instead of the pre-job conf
> instance, cause the constructor clones it.
Re: Problem using distributed cache
Posted by Harsh J <ha...@cloudera.com>.
Please try using job.getConfiguration() instead of the pre-job conf
instance, because the constructor clones it.
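[Editor's note: Harsh's point, sketched against the code from the thread; the wrong/right contrast is illustrative.]

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheAfterJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount"); // the constructor clones conf here

        // Wrong: this mutates the pre-job conf, which the job no longer reads:
        // DistributedCache.addCacheFile(uri, conf);

        // Right: mutate the configuration the job actually carries.
        DistributedCache.addCacheFile(
                new URI("/user/peter/cacheFile/testCache1"),
                job.getConfiguration());
    }
}
```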
Re: Problem using distributed cache
Posted by be...@gmail.com.
Hi Peter
Can you try the following in your code
1. Driver class to implement Tools interface
2. Do a getConfiguration() rather than creating a new conf instance.
DC should be working with the above mentioned modifications to code.
Sent on my BlackBerry® from Vodafone
-----Original Message-----
From: Peter Cogan <pe...@gmail.com>
Date: Fri, 7 Dec 2012 14:06:41
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> It's an instance created at the start of the program like this:
>
> public static void main(String[] args) throws Exception {
>
> Configuration conf = new Configuration();
>
>
> Job job = new Job(conf, "wordcount");
>
>
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
>
>
>
> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>> wrote:
>> > Hi ,
>> >
>> > I want to use the distributed cache to allow my mappers to access data.
>> In
>> > main, I'm using the command
>> >
>> > DistributedCache.addCacheFile(new
>> URI("/user/peter/cacheFile/testCache1"),
>> > conf);
>> >
>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>> >
>> > Then, my setup function looks like this:
>> >
>> > public void setup(Context context) throws IOException,
>> InterruptedException{
>> > Configuration conf = context.getConfiguration();
>> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>> > //etc
>> > }
>> >
>> > However, this localFiles array is always null.
>> >
>> > I was initially running on a single-host cluster for testing, but I read
>> > that this will prevent the distributed cache from working. I tried with
>> a
>> > pseudo-distributed, but that didn't work either
>> >
>> > I'm using hadoop 1.0.3
>> >
>> > thanks Peter
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
Re: Problem using distributed cache
Posted by Dhaval Shah <pr...@yahoo.co.in>.
You will need to add the cache file to distributed cache before creating the Job object.. Give that a spin and see if that works
Regards,
Dhaval
________________________________
From: Peter Cogan <pe...@gmail.com>
To: user@hadoop.apache.org
Sent: Friday, 7 December 2012 9:06 AM
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
Hi,
>
>
>It's an instance created at the start of the program like this:
>
>
>public static void main(String[] args) throws Exception {
>Configuration conf = new Configuration();
>
>
>Job job = new Job(conf, "wordcount");
>
>
>
>
>DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>
>
>
>
>
>On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>What is your conf object there? Is it job.getConfiguration() or an
>>independent instance?
>>
>>
>>On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com> wrote:
>>> Hi ,
>>>
>>> I want to use the distributed cache to allow my mappers to access data. In
>>> main, I'm using the command
>>>
>>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>>> conf);
>>>
>>> Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>>>
>>> Then, my setup function looks like this:
>>>
>>> public void setup(Context context) throws IOException, InterruptedException{
>>> Configuration conf = context.getConfiguration();
>>> Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>> //etc
>>> }
>>>
>>> However, this localFiles array is always null.
>>>
>>> I was initially running on a single-host cluster for testing, but I read
>>> that this will prevent the distributed cache from working. I tried with a
>>> pseudo-distributed, but that didn't work either
>>>
>>> I'm using hadoop 1.0.3
>>>
>>> thanks Peter
>>>
>>>
>>
>>
>>
>>--
>>Harsh J
>>
>
Re: Problem using distributed cache
Posted by Harsh J <ha...@cloudera.com>.
Please try using job.getConfiguration() instead of the pre-job conf
instance, cause the constructor clones it.
On Fri, Dec 7, 2012 at 7:36 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> any thoughts on this would be much appreciated
>
> thanks
> Peter
>
>
> On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
>>
>> Hi,
>>
>> It's an instance created at the start of the program like this:
>>
>> public static void main(String[] args) throws Exception {
>>
>> Configuration conf = new Configuration();
>>
>>
>> Job job = new Job(conf, "wordcount");
>>
>>
>>
>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>> conf);
>>
>>
>>
>>
>> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> What is your conf object there? Is it job.getConfiguration() or an
>>> independent instance?
>>>
>>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>>> wrote:
>>> > Hi ,
>>> >
>>> > I want to use the distributed cache to allow my mappers to access data.
>>> > In
>>> > main, I'm using the command
>>> >
>>> > DistributedCache.addCacheFile(new
>>> > URI("/user/peter/cacheFile/testCache1"),
>>> > conf);
>>> >
>>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>>> >
>>> > Then, my setup function looks like this:
>>> >
>>> > public void setup(Context context) throws IOException,
>>> > InterruptedException{
>>> > Configuration conf = context.getConfiguration();
>>> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>> > //etc
>>> > }
>>> >
>>> > However, this localFiles array is always null.
>>> >
>>> > I was initially running on a single-host cluster for testing, but I
>>> > read
>>> > that this will prevent the distributed cache from working. I tried with
>>> > a
>>> > pseudo-distributed, but that didn't work either
>>> >
>>> > I'm using hadoop 1.0.3
>>> >
>>> > thanks Peter
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>
--
Harsh J
Re: Problem using distributed cache
Posted by be...@gmail.com.
Hi Peter
Can you try the following in your code
1. Driver class to implement Tools interface
2. Do a getConfiguration() rather than creating a new conf instance.
DC should be working with the above mentioned modifications to code.
Sent on my BlackBerry® from Vodafone
-----Original Message-----
From: Peter Cogan <pe...@gmail.com>
Date: Fri, 7 Dec 2012 14:06:41
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> It's an instance created at the start of the program like this:
>
> public static void main(String[] args) throws Exception {
>
> Configuration conf = new Configuration();
>
>
> Job job = new Job(conf, "wordcount");
>
>
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
>
>
>
> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>> wrote:
>> > Hi ,
>> >
>> > I want to use the distributed cache to allow my mappers to access data.
>> In
>> > main, I'm using the command
>> >
>> > DistributedCache.addCacheFile(new
>> URI("/user/peter/cacheFile/testCache1"),
>> > conf);
>> >
>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>> >
>> > Then, my setup function looks like this:
>> >
>> > public void setup(Context context) throws IOException,
>> InterruptedException{
>> > Configuration conf = context.getConfiguration();
>> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>> > //etc
>> > }
>> >
>> > However, this localFiles array is always null.
>> >
>> > I was initially running on a single-host cluster for testing, but I read
>> > that this will prevent the distributed cache from working. I tried with
>> a
>> > pseudo-distributed, but that didn't work either
>> >
>> > I'm using hadoop 1.0.3
>> >
>> > thanks Peter
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
Re: Problem using distributed cache
Posted by Harsh J <ha...@cloudera.com>.
Please try using job.getConfiguration() instead of the pre-job conf
instance, cause the constructor clones it.
On Fri, Dec 7, 2012 at 7:36 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> any thoughts on this would be much appreciated
>
> thanks
> Peter
>
>
> On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
>>
>> Hi,
>>
>> It's an instance created at the start of the program like this:
>>
>> public static void main(String[] args) throws Exception {
>>
>> Configuration conf = new Configuration();
>>
>>
>> Job job = new Job(conf, "wordcount");
>>
>>
>>
>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>> conf);
>>
>>
>>
>>
>> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> What is your conf object there? Is it job.getConfiguration() or an
>>> independent instance?
>>>
>>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>>> wrote:
>>> > Hi ,
>>> >
>>> > I want to use the distributed cache to allow my mappers to access data.
>>> > In
>>> > main, I'm using the command
>>> >
>>> > DistributedCache.addCacheFile(new
>>> > URI("/user/peter/cacheFile/testCache1"),
>>> > conf);
>>> >
>>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>>> >
>>> > Then, my setup function looks like this:
>>> >
>>> > public void setup(Context context) throws IOException,
>>> > InterruptedException{
>>> > Configuration conf = context.getConfiguration();
>>> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>> > //etc
>>> > }
>>> >
>>> > However, this localFiles array is always null.
>>> >
>>> > I was initially running on a single-host cluster for testing, but I
>>> > read
>>> > that this will prevent the distributed cache from working. I tried with
>>> > a
>>> > pseudo-distributed, but that didn't work either
>>> >
>>> > I'm using hadoop 1.0.3
>>> >
>>> > thanks Peter
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>
--
Harsh J
Re: Problem using distributed cache
Posted by Harsh J <ha...@cloudera.com>.
Please try using job.getConfiguration() instead of the pre-job conf
instance, cause the constructor clones it.
On Fri, Dec 7, 2012 at 7:36 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> any thoughts on this would be much appreciated
>
> thanks
> Peter
>
>
> On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
>>
>> Hi,
>>
>> It's an instance created at the start of the program like this:
>>
>> public static void main(String[] args) throws Exception {
>>
>> Configuration conf = new Configuration();
>>
>>
>> Job job = new Job(conf, "wordcount");
>>
>>
>>
>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>> conf);
>>
>>
>>
>>
>> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> What is your conf object there? Is it job.getConfiguration() or an
>>> independent instance?
>>>
>>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>>> wrote:
>>> > Hi ,
>>> >
>>> > I want to use the distributed cache to allow my mappers to access data.
>>> > In
>>> > main, I'm using the command
>>> >
>>> > DistributedCache.addCacheFile(new
>>> > URI("/user/peter/cacheFile/testCache1"),
>>> > conf);
>>> >
>>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>>> >
>>> > Then, my setup function looks like this:
>>> >
>>> > public void setup(Context context) throws IOException,
>>> > InterruptedException{
>>> > Configuration conf = context.getConfiguration();
>>> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>> > //etc
>>> > }
>>> >
>>> > However, this localFiles array is always null.
>>> >
>>> > I was initially running on a single-host cluster for testing, but I
>>> > read
>>> > that this will prevent the distributed cache from working. I tried with
>>> > a
>>> > pseudo-distributed, but that didn't work either
>>> >
>>> > I'm using hadoop 1.0.3
>>> >
>>> > thanks Peter
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>
--
Harsh J
Re: Problem using distributed cache
Posted by be...@gmail.com.
Hi Peter
Can you try the following in your code:
1. Have the driver class implement the Tool interface.
2. Use job.getConfiguration() rather than creating a new conf instance.
The DistributedCache should work with the above modifications.
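A minimal driver sketch of the two suggestions (assumptions: Hadoop 1.x API; class name WordCountDriver is hypothetical; the HDFS path is the one from this thread):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch only, untested: driver implements Tool and registers the cache
// file against the job's own Configuration rather than an unrelated one.
public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration injected by ToolRunner,
        // already populated with any -D / -files command-line options.
        Job job = new Job(getConf(), "wordcount");
        job.setJarByClass(WordCountDriver.class);

        // Add to the job's *own* configuration so the setting actually
        // travels with the submitted job.
        DistributedCache.addCacheFile(
                new URI("/user/peter/cacheFile/testCache1"),
                job.getConfiguration());

        // ... set mapper/reducer/input/output here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(),
                new WordCountDriver(), args));
    }
}
```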
Sent on my BlackBerry® from Vodafone
-----Original Message-----
From: Peter Cogan <pe...@gmail.com>
Date: Fri, 7 Dec 2012 14:06:41
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> It's an instance created at the start of the program like this:
>
> public static void main(String[] args) throws Exception {
>
> Configuration conf = new Configuration();
>
>
> Job job = new Job(conf, "wordcount");
>
>
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
>
>
>
> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>> wrote:
>> > Hi ,
>> >
>> > I want to use the distributed cache to allow my mappers to access data.
>> In
>> > main, I'm using the command
>> >
>> > DistributedCache.addCacheFile(new
>> URI("/user/peter/cacheFile/testCache1"),
>> > conf);
>> >
>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>> >
>> > Then, my setup function looks like this:
>> >
>> > public void setup(Context context) throws IOException,
>> InterruptedException{
>> > Configuration conf = context.getConfiguration();
>> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>> > //etc
>> > }
>> >
>> > However, this localFiles array is always null.
>> >
>> > I was initially running on a single-host cluster for testing, but I read
>> > that this will prevent the distributed cache from working. I tried with
>> a
>> > pseudo-distributed, but that didn't work either
>> >
>> > I'm using hadoop 1.0.3
>> >
>> > thanks Peter
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
Re: Problem using distributed cache
Posted by Dhaval Shah <pr...@yahoo.co.in>.
You will need to add the file to the distributed cache before creating the Job object, since the Job constructor takes a copy of the configuration and later changes to conf are not seen by the job. Give that a spin and see if it works.
Regards,
Dhaval
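Dhaval's reordering as a hedged sketch (Hadoop 1.x assumed; class name is hypothetical):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

// Sketch only, untested: register the cache file BEFORE constructing the
// Job. new Job(conf) copies the Configuration, so anything added to conf
// afterwards never reaches the submitted job.
public class CacheBeforeJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // First: add the cache file to conf ...
        DistributedCache.addCacheFile(
                new URI("/user/peter/cacheFile/testCache1"), conf);

        // ... then construct the Job from the already-populated conf.
        Job job = new Job(conf, "wordcount");
        // ... configure and submit job ...
    }
}
```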
________________________________
From: Peter Cogan <pe...@gmail.com>
To: user@hadoop.apache.org
Sent: Friday, 7 December 2012 9:06 AM
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
Hi,
>
>
>It's an instance created at the start of the program like this:
>
>
>public static void main(String[] args) throws Exception {
>Configuration conf = new Configuration();
>
>
>Job job = new Job(conf, "wordcount");
>
>
>
>
>DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>
>
>
>
>
>On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>What is your conf object there? Is it job.getConfiguration() or an
>>independent instance?
>>
>>
>>On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com> wrote:
>>> Hi ,
>>>
>>> I want to use the distributed cache to allow my mappers to access data. In
>>> main, I'm using the command
>>>
>>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>>> conf);
>>>
>>> Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>>>
>>> Then, my setup function looks like this:
>>>
>>> public void setup(Context context) throws IOException, InterruptedException{
>>> Configuration conf = context.getConfiguration();
>>> Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>> //etc
>>> }
>>>
>>> However, this localFiles array is always null.
>>>
>>> I was initially running on a single-host cluster for testing, but I read
>>> that this will prevent the distributed cache from working. I tried with a
>>> pseudo-distributed, but that didn't work either
>>>
>>> I'm using hadoop 1.0.3
>>>
>>> thanks Peter
>>>
>>>
>>
>>
>>
>>--
>>Harsh J
>>
>
Re: Problem using distributed cache
Posted by Peter Cogan <pe...@gmail.com>.
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> It's an instance created at the start of the program like this:
>
> public static void main(String[] args) throws Exception {
>
> Configuration conf = new Configuration();
>
>
> Job job = new Job(conf, "wordcount");
>
>
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
>
>
>
> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>> wrote:
>> > Hi ,
>> >
>> > I want to use the distributed cache to allow my mappers to access data.
>> In
>> > main, I'm using the command
>> >
>> > DistributedCache.addCacheFile(new
>> URI("/user/peter/cacheFile/testCache1"),
>> > conf);
>> >
>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>> >
>> > Then, my setup function looks like this:
>> >
>> > public void setup(Context context) throws IOException,
>> InterruptedException{
>> > Configuration conf = context.getConfiguration();
>> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>> > //etc
>> > }
>> >
>> > However, this localFiles array is always null.
>> >
>> > I was initially running on a single-host cluster for testing, but I read
>> > that this will prevent the distributed cache from working. I tried with
>> a
>> > pseudo-distributed, but that didn't work either
>> >
>> > I'm using hadoop 1.0.3
>> >
>> > thanks Peter
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
Re: Problem using distributed cache
Posted by Peter Cogan <pe...@gmail.com>.
Hi,
It's an instance created at the start of the program like this:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
conf);
On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
> What is your conf object there? Is it job.getConfiguration() or an
> independent instance?
>
> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
> wrote:
> > Hi ,
> >
> > I want to use the distributed cache to allow my mappers to access data.
> In
> > main, I'm using the command
> >
> > DistributedCache.addCacheFile(new
> URI("/user/peter/cacheFile/testCache1"),
> > conf);
> >
> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
> >
> > Then, my setup function looks like this:
> >
> > public void setup(Context context) throws IOException,
> InterruptedException{
> > Configuration conf = context.getConfiguration();
> > Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
> > //etc
> > }
> >
> > However, this localFiles array is always null.
> >
> > I was initially running on a single-host cluster for testing, but I read
> > that this will prevent the distributed cache from working. I tried with a
> > pseudo-distributed, but that didn't work either
> >
> > I'm using hadoop 1.0.3
> >
> > thanks Peter
> >
> >
>
>
>
> --
> Harsh J
>
Re: Problem using distributed cache
Posted by Harsh J <ha...@cloudera.com>.
What is your conf object there? Is it job.getConfiguration() or an
independent instance?
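The distinction behind this question, shown on the mapper side as a hedged sketch (Hadoop 1.x assumed; class name is hypothetical): getLocalCacheFiles() only returns paths if addCacheFile() was applied to the configuration the job actually ran with.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch only, untested: mapper-side retrieval of the localized cache file.
public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();

        // Non-null only if addCacheFile() touched the job's effective
        // configuration (job.getConfiguration(), or a conf populated
        // before new Job(conf)).
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
        if (localFiles == null || localFiles.length == 0) {
            throw new IOException("Distributed cache file not localized");
        }

        // The returned paths point at copies on the task's local disk.
        BufferedReader reader =
                new BufferedReader(new FileReader(localFiles[0].toString()));
        try {
            // ... read lookup data ...
        } finally {
            reader.close();
        }
    }
}
```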
On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi ,
>
> I want to use the distributed cache to allow my mappers to access data. In
> main, I'm using the command
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
> Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>
> Then, my setup function looks like this:
>
> public void setup(Context context) throws IOException, InterruptedException{
> Configuration conf = context.getConfiguration();
> Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
> //etc
> }
>
> However, this localFiles array is always null.
>
> I was initially running on a single-host cluster for testing, but I read
> that this will prevent the distributed cache from working. I tried with a
> pseudo-distributed, but that didn't work either
>
> I'm using hadoop 1.0.3
>
> thanks Peter
>
>
--
Harsh J
Re: Problem using distributed cache
Posted by surfer <su...@crs4.it>.
On 12/07/2012 03:49 PM, surfer wrote:
> Hello Peter
> In my, humble, experience I never get hadoop 1.0.3 to work with
> distributed cache and the new api (mapreduce). with the old api it works.
> giovanni
>
> P.S. I already tried the approaches suggested by both Dhaval and Harsh J
I'm writing this in the hope that it might help someone else.
The missing piece was that my DistributedCache test was a datajoin
job, and the parameter that overrides the default Hadoop key/value
separator (\t) for KeyValueTextInputFormat changed in the new API.
old API syntax: conf.set("key.value.separator.in.input.line", ",");
new API syntax:
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator",
",");
About the addCacheFile method location: it works both when placed
right after the conf creation (Dhaval's approach):
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
and after the job creation (Harsh's approach):
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), job.getConfiguration());
regards
giovanni
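Giovanni's separator fix as a hedged sketch (assumptions: the property name is the one quoted in the post above; the mapreduce-package KeyValueTextInputFormat is available in the Hadoop build in use; job name is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// Sketch only, untested: configure the new-API key/value separator.
public class SeparatorConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // New-API property name (from the post above); the old-API
        // equivalent was "key.value.separator.in.input.line".
        conf.set(
            "mapreduce.input.keyvaluelinerecordreader.key.value.separator",
            ",");

        Job job = new Job(conf, "datajoin-test");
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        // ... rest of job setup ...
    }
}
```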
Re: Problem using distributed cache
Posted by surfer <su...@crs4.it>.
Hello Peter
In my humble experience, I never got Hadoop 1.0.3 to work with the
distributed cache and the new API (mapreduce); with the old API it works.
giovanni
P.S. I already tried the approaches suggested by both Dhaval and Harsh J.
On 12/06/2012 05:59 PM, Peter Cogan wrote:
>
> Hi ,
>
> I want to use the distributed cache to allow my mappers to access
> data. In main, I'm using the command
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>
> Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>
> Then, my setup function looks like this:
>
> public void setup(Context context) throws IOException, InterruptedException{
> Configuration conf = context.getConfiguration();
> Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
> //etc
> }
>
> However, this localFiles array is always null.
>
> I was initially running on a single-host cluster for testing, but I
> read that this will prevent the distributed cache from working. I
> tried with a pseudo-distributed, but that didn't work either
>
> I'm using hadoop 1.0.3
>
> thanks Peter
>
>