Posted to user@hadoop.apache.org by Peter Cogan <pe...@gmail.com> on 2012/12/06 17:59:53 UTC

Problem using distributed cache

Hi,

I want to use the distributed cache to allow my mappers to access data. In
main, I'm using the command

DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);

where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.

Then, my setup function looks like this:

public void setup(Context context) throws IOException, InterruptedException{
    Configuration conf = context.getConfiguration();
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    //etc
}

However, this localFiles array is always null.

I was initially running on a single-host cluster for testing, but I read
that this will prevent the distributed cache from working. I tried with a
pseudo-distributed cluster, but that didn't work either.

I'm using Hadoop 1.0.3.

thanks Peter

Re: Problem using distributed cache

Posted by be...@gmail.com.
Hi Peter

Can you try the following in your code:
1. Have your driver class implement the Tool interface.
2. Use getConfiguration() rather than creating a new conf instance.

The DistributedCache should work with the above-mentioned modifications to the code.
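
For illustration, a minimal sketch of that driver pattern (untested; the
class name WordCountDriver and the job wiring are placeholders, not from
your code):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration ToolRunner hands us, not a fresh instance
        Configuration conf = getConf();
        Job job = new Job(conf, "wordcount");

        // Register the cache file on the job's own configuration
        DistributedCache.addCacheFile(
                new URI("/user/peter/cacheFile/testCache1"),
                job.getConfiguration());

        // ... set mapper, reducer, input/output paths etc. here ...

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(),
                new WordCountDriver(), args));
    }
}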

Sent on my BlackBerry® from Vodafone

-----Original Message-----
From: Peter Cogan <pe...@gmail.com>
Date: Fri, 7 Dec 2012 14:06:41 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: Problem using distributed cache

Hi,

any thoughts on this would be much appreciated

thanks
Peter


On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:

> Hi,
>
> It's an instance created at the start of the program like this:
>
> public static void main(String[] args) throws Exception {
>
> Configuration conf = new Configuration();
>
>
>  Job job = new Job(conf, "wordcount");
>
>
>
>  DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
>
>
>
> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>> wrote:
>> > Hi ,
>> >
>> > I want to use the distributed cache to allow my mappers to access data.
>> In
>> > main, I'm using the command
>> >
>> > DistributedCache.addCacheFile(new
>> URI("/user/peter/cacheFile/testCache1"),
>> > conf);
>> >
>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>> >
>> > Then, my setup function looks like this:
>> >
>> > public void setup(Context context) throws IOException,
>> InterruptedException{
>> >     Configuration conf = context.getConfiguration();
>> >     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>> >     //etc
>> > }
>> >
>> > However, this localFiles array is always null.
>> >
>> > I was initially running on a single-host cluster for testing, but I read
>> > that this will prevent the distributed cache from working. I tried with
>> a
>> > pseudo-distributed, but that didn't work either
>> >
>> > I'm using hadoop 1.0.3
>> >
>> > thanks Peter
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>


Re: Problem using distributed cache

Posted by Dhaval Shah <pr...@yahoo.co.in>.
You will need to add the cache file to the distributed cache before creating the Job object. Give that a spin and see if that works.
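
Something like this, based on the snippet you posted (a sketch; only the
ordering changes):

Configuration conf = new Configuration();

// Add the cache file BEFORE constructing the Job, so the entry survives
// the copy of conf that the Job constructor makes
DistributedCache.addCacheFile(
        new URI("/user/peter/cacheFile/testCache1"), conf);

Job job = new Job(conf, "wordcount");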
 
Regards,
Dhaval


________________________________
 From: Peter Cogan <pe...@gmail.com>
To: user@hadoop.apache.org 
Sent: Friday, 7 December 2012 9:06 AM
Subject: Re: Problem using distributed cache
 

Hi,

any thoughts on this would be much appreciated

thanks
Peter



On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:

Hi,
>
>
>It's an instance created at the start of the program like this:
>
>
>public static void main(String[] args) throws Exception {
>Configuration conf = new Configuration();
>
>
>Job job = new Job(conf, "wordcount");
>
>
>
>
>DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>
>
>
>
>
>On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>What is your conf object there? Is it job.getConfiguration() or an
>>independent instance?
>>
>>
>>On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com> wrote:
>>> Hi ,
>>>
>>> I want to use the distributed cache to allow my mappers to access data. In
>>> main, I'm using the command
>>>
>>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>>> conf);
>>>
>>> Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>>>
>>> Then, my setup function looks like this:
>>>
>>> public void setup(Context context) throws IOException, InterruptedException{
>>>     Configuration conf = context.getConfiguration();
>>>     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>>     //etc
>>> }
>>>
>>> However, this localFiles array is always null.
>>>
>>> I was initially running on a single-host cluster for testing, but I read
>>> that this will prevent the distributed cache from working. I tried with a
>>> pseudo-distributed, but that didn't work either
>>>
>>> I'm using hadoop 1.0.3
>>>
>>> thanks Peter
>>>
>>>
>>
>>
>>
>>--
>>Harsh J
>>
>

Re: Problem using distributed cache

Posted by Peter Cogan <pe...@gmail.com>.
Hi Dhaval & Harsh,

thanks for coming back to the thread - you're both right, I was doing things
in the wrong order. I hadn't realised that the Job constructor clones the
configuration - that's very interesting!

thanks again
Peter


On Fri, Dec 7, 2012 at 2:25 PM, Harsh J <ha...@cloudera.com> wrote:

> Please try using job.getConfiguration() instead of the pre-job conf
> instance, cause the constructor clones it.
>
> On Fri, Dec 7, 2012 at 7:36 PM, Peter Cogan <pe...@gmail.com> wrote:
> > Hi,
> >
> > any thoughts on this would be much appreciated
> >
> > thanks
> > Peter
> >
> >
> > On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com>
> wrote:
> >>
> >> Hi,
> >>
> >> It's an instance created at the start of the program like this:
> >>
> >> public static void main(String[] args) throws Exception {
> >>
> >> Configuration conf = new Configuration();
> >>
> >>
> >> Job job = new Job(conf, "wordcount");
> >>
> >>
> >>
> >> DistributedCache.addCacheFile(new
> URI("/user/peter/cacheFile/testCache1"),
> >> conf);
> >>
> >>
> >>
> >>
> >> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
> >>>
> >>> What is your conf object there? Is it job.getConfiguration() or an
> >>> independent instance?
> >>>
> >>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
> >>> wrote:
> >>> > Hi ,
> >>> >
> >>> > I want to use the distributed cache to allow my mappers to access
> data.
> >>> > In
> >>> > main, I'm using the command
> >>> >
> >>> > DistributedCache.addCacheFile(new
> >>> > URI("/user/peter/cacheFile/testCache1"),
> >>> > conf);
> >>> >
> >>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
> >>> >
> >>> > Then, my setup function looks like this:
> >>> >
> >>> > public void setup(Context context) throws IOException,
> >>> > InterruptedException{
> >>> >     Configuration conf = context.getConfiguration();
> >>> >     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
> >>> >     //etc
> >>> > }
> >>> >
> >>> > However, this localFiles array is always null.
> >>> >
> >>> > I was initially running on a single-host cluster for testing, but I
> >>> > read
> >>> > that this will prevent the distributed cache from working. I tried
> with
> >>> > a
> >>> > pseudo-distributed, but that didn't work either
> >>> >
> >>> > I'm using hadoop 1.0.3
> >>> >
> >>> > thanks Peter
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>
> >>
> >
>
>
>
> --
> Harsh J
>

Re: Problem using distributed cache

Posted by Harsh J <ha...@cloudera.com>.
Please try using job.getConfiguration() instead of the pre-job conf
instance, because the constructor clones it.
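
That is, roughly (a sketch against the snippet you posted):

Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");

// Register the file on the job's own (cloned) configuration; changes
// made to conf after constructing the Job are invisible to it
DistributedCache.addCacheFile(
        new URI("/user/peter/cacheFile/testCache1"),
        job.getConfiguration());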

On Fri, Dec 7, 2012 at 7:36 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi,
>
> any thoughts on this would be much appreciated
>
> thanks
> Peter
>
>
> On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:
>>
>> Hi,
>>
>> It's an instance created at the start of the program like this:
>>
>> public static void main(String[] args) throws Exception {
>>
>> Configuration conf = new Configuration();
>>
>>
>> Job job = new Job(conf, "wordcount");
>>
>>
>>
>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>> conf);
>>
>>
>>
>>
>> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> What is your conf object there? Is it job.getConfiguration() or an
>>> independent instance?
>>>
>>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>>> wrote:
>>> > Hi ,
>>> >
>>> > I want to use the distributed cache to allow my mappers to access data.
>>> > In
>>> > main, I'm using the command
>>> >
>>> > DistributedCache.addCacheFile(new
>>> > URI("/user/peter/cacheFile/testCache1"),
>>> > conf);
>>> >
>>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>>> >
>>> > Then, my setup function looks like this:
>>> >
>>> > public void setup(Context context) throws IOException,
>>> > InterruptedException{
>>> >     Configuration conf = context.getConfiguration();
>>> >     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>> >     //etc
>>> > }
>>> >
>>> > However, this localFiles array is always null.
>>> >
>>> > I was initially running on a single-host cluster for testing, but I
>>> > read
>>> > that this will prevent the distributed cache from working. I tried with
>>> > a
>>> > pseudo-distributed, but that didn't work either
>>> >
>>> > I'm using hadoop 1.0.3
>>> >
>>> > thanks Peter
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>



-- 
Harsh J

Re: Problem using distributed cache

Posted by Peter Cogan <pe...@gmail.com>.
Hi,

any thoughts on this would be much appreciated

thanks
Peter


On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <pe...@gmail.com> wrote:

> Hi,
>
> It's an instance created at the start of the program like this:
>
> public static void main(String[] args) throws Exception {
>
> Configuration conf = new Configuration();
>
>
>  Job job = new Job(conf, "wordcount");
>
>
>
>  DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
>
>
>
> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
>> wrote:
>> > Hi ,
>> >
>> > I want to use the distributed cache to allow my mappers to access data.
>> In
>> > main, I'm using the command
>> >
>> > DistributedCache.addCacheFile(new
>> URI("/user/peter/cacheFile/testCache1"),
>> > conf);
>> >
>> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>> >
>> > Then, my setup function looks like this:
>> >
>> > public void setup(Context context) throws IOException,
>> InterruptedException{
>> >     Configuration conf = context.getConfiguration();
>> >     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>> >     //etc
>> > }
>> >
>> > However, this localFiles array is always null.
>> >
>> > I was initially running on a single-host cluster for testing, but I read
>> > that this will prevent the distributed cache from working. I tried with
>> a
>> > pseudo-distributed, but that didn't work either
>> >
>> > I'm using hadoop 1.0.3
>> >
>> > thanks Peter
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Problem using distributed cache

Posted by Peter Cogan <pe...@gmail.com>.
Hi,

It's an instance created at the start of the program like this:

public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();

    Job job = new Job(conf, "wordcount");

    DistributedCache.addCacheFile(
            new URI("/user/peter/cacheFile/testCache1"), conf);




On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <ha...@cloudera.com> wrote:

> What is your conf object there? Is it job.getConfiguration() or an
> independent instance?
>
> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com>
> wrote:
> > Hi ,
> >
> > I want to use the distributed cache to allow my mappers to access data.
> In
> > main, I'm using the command
> >
> > DistributedCache.addCacheFile(new
> URI("/user/peter/cacheFile/testCache1"),
> > conf);
> >
> > Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
> >
> > Then, my setup function looks like this:
> >
> > public void setup(Context context) throws IOException,
> InterruptedException{
> >     Configuration conf = context.getConfiguration();
> >     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
> >     //etc
> > }
> >
> > However, this localFiles array is always null.
> >
> > I was initially running on a single-host cluster for testing, but I read
> > that this will prevent the distributed cache from working. I tried with a
> > pseudo-distributed, but that didn't work either
> >
> > I'm using hadoop 1.0.3
> >
> > thanks Peter
> >
> >
>
>
>
> --
> Harsh J
>

Re: Problem using distributed cache

Posted by Harsh J <ha...@cloudera.com>.
What is your conf object there? Is it job.getConfiguration() or an
independent instance?

On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <pe...@gmail.com> wrote:
> Hi ,
>
> I want to use the distributed cache to allow my mappers to access data. In
> main, I'm using the command
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> conf);
>
> Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>
> Then, my setup function looks like this:
>
> public void setup(Context context) throws IOException, InterruptedException{
>     Configuration conf = context.getConfiguration();
>     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>     //etc
> }
>
> However, this localFiles array is always null.
>
> I was initially running on a single-host cluster for testing, but I read
> that this will prevent the distributed cache from working. I tried with a
> pseudo-distributed, but that didn't work either
>
> I'm using hadoop 1.0.3
>
> thanks Peter
>
>



-- 
Harsh J

Re: Problem using distributed cache

Posted by surfer <su...@crs4.it>.
On 12/07/2012 03:49 PM, surfer wrote:
> Hello Peter
> In my, humble, experience I never get hadoop 1.0.3 to work with
> distributed cache and the new api (mapreduce). with the old api it works.
> giovanni
>
> P.S. I already tried the approaches suggested by both Dhaval and  Harsh J
I'm writing this hoping it might help someone else...

The missing piece was that the job I was using to test the DistributedCache
was a datajoin job, and the parameter that overrides the default Hadoop
key/value separator (\t) for KeyValueTextInputFormat has changed in the new
API:

old API syntax: conf.set("key.value.separator.in.input.line", ",");
new API syntax: conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

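Wiring that up in the new API looks roughly like this (a sketch, assuming
the new-API KeyValueTextInputFormat from
org.apache.hadoop.mapreduce.lib.input is on your classpath; the job name is
a placeholder):

Configuration conf = new Configuration();
// New-API name for the separator property (the old API used
// "key.value.separator.in.input.line")
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

Job job = new Job(conf, "datajoin test");
job.setInputFormatClass(KeyValueTextInputFormat.class);
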
About the location of the addCacheFile call: it works both when placed
right after the conf creation (Dhaval's approach):

DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);

and when placed after the job creation (Harsh's approach):

DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), job.getConfiguration());
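
For completeness, once the file is registered either way, a mapper can pick
up the localized copy in setup() along these lines (a sketch; error
handling kept minimal):

public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    if (localFiles == null || localFiles.length == 0) {
        throw new IOException("cache file was not localized");
    }
    // localFiles[0] points at the copy on the task node's local disk
    BufferedReader reader =
            new BufferedReader(new FileReader(localFiles[0].toString()));
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            // use the cached data, e.g. load it into an in-memory map
        }
    } finally {
        reader.close();
    }
}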


regards
giovanni

Re: Problem using distributed cache

Posted by surfer <su...@crs4.it>.
Hello Peter
In my humble experience, I never got Hadoop 1.0.3 to work with the
distributed cache and the new API (mapreduce); with the old API it works.
giovanni

P.S. I already tried the approaches suggested by both Dhaval and Harsh J


On 12/06/2012 05:59 PM, Peter Cogan wrote:
>
> Hi ,
>
> I want to use the distributed cache to allow my mappers to access
> data. In main, I'm using the command
>
> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>
> Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
>
> Then, my setup function looks like this:
>
> public void setup(Context context) throws IOException, InterruptedException{
>     Configuration conf = context.getConfiguration();
>     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>     //etc
> }
>
> However, this localFiles array is always null.
>
> I was initially running on a single-host cluster for testing, but I
> read that this will prevent the distributed cache from working. I
> tried with a pseudo-distributed, but that didn't work either
>
> I'm using hadoop 1.0.3
>
> thanks Peter
>
>

