Posted to mapreduce-user@hadoop.apache.org by "Botelho, Andrew" <An...@emc.com> on 2013/07/10 00:02:41 UTC

Distributed Cache

Hi,

I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5).
In my driver class, I use this code to try and add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance();
...

However, I keep getting warnings that the method addCacheFile() is deprecated.
Is there a more current way to add files to the distributed cache?

Thanks in advance,

Andrew

Re: Distributed Cache

Posted by Azuryy Yu <az...@gmail.com>.
It should be like this:

Configuration conf = new Configuration();
Job job = new Job(conf, "test");
job.setJarByClass(Test.class);

DistributedCache.addCacheFile(new Path("your hdfs path").toUri(),
    job.getConfiguration());


But the best examples are the test cases:
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/filecache/TestClientDistributedCacheManager.java?view=markup
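For completeness, here is a driver-side sketch of the non-deprecated Hadoop 2.x route suggested elsewhere in this thread (Job#addCacheFile). The HDFS path and class names are placeholders, not from the original post:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "cache example");
    job.setJarByClass(Driver.class);

    // Hadoop 2.x replacement for DistributedCache.addCacheFile(uri, conf):
    // add the file through the Job object instead of the raw Configuration.
    job.addCacheFile(new URI("hdfs://namenode:8020/cache/stopwords.txt"));

    // ... set mapper/reducer and input/output paths, then submit:
    // System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The key ordering difference from the code in the original post: create the Job first, then add cache files through it, so the file is registered in the Job's own Configuration rather than in a Configuration the Job never sees.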

On Wed, Jul 10, 2013 at 6:07 AM, Ted Yu <yu...@gmail.com> wrote:

> You should use Job#addCacheFile()
>
>
> Cheers

Re: Distributed Cache

Posted by Omkar Joshi <oj...@hortonworks.com>.
      Path[] cachedFilePaths =
          DistributedCache.getLocalCacheFiles(context.getConfiguration());
      for (Path cachedFilePath : cachedFilePaths) {
        File cachedFile = new File(cachedFilePath.toUri().getRawPath());
        System.out.println("cached file path >> "
            + cachedFile.getAbsolutePath());
      }

I hope this helps for the time being. JobContext was supposed to replace
the DistributedCache API (which will be deprecated), but there is some
problem with that, or I am missing something. I will reply if I find the
solution.

getCacheFiles() gives you the URIs used for localizing the files (the
original URIs used when adding them to the cache).

getLocalCacheFiles() gives you the actual local file paths on the node
manager.
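Putting the two together, here is a minimal mapper-side sketch (not from this thread; the class, field, and file-handling names are illustrative) of loading a localized cache file in setup() with plain java.io, since the localized copy is an ordinary file on the node's disk:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StopwordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  private final Set<String> stopwords = new HashSet<String>();

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // Local paths of the files that were added to the cache in the driver.
    Path[] localPaths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    if (localPaths == null) {
      return; // nothing was cached
    }
    for (Path p : localPaths) {
      // The localized copy lives on the node's local disk, so java.io works.
      File f = new File(p.toUri().getRawPath());
      BufferedReader reader = new BufferedReader(new FileReader(f));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          stopwords.add(line.trim());
        }
      } finally {
        reader.close();
      }
    }
  }
}
```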

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>


On Wed, Jul 10, 2013 at 2:43 PM, Botelho, Andrew <An...@emc.com> wrote:

> Ok so JobContext.getCacheFiles() returns URI[].
> Let's say I only stored one folder in the cache that has several .txt
> files within it.  How do I use that returned URI to read each line of those
> .txt files?
>
> Basically, how do I read my cached file(s) after I call
> JobContext.getCacheFiles()?
>
> Thanks,
>
> Andrew
>


RE: Distributed Cache

Posted by "Botelho, Andrew" <An...@emc.com>.
Ok so JobContext.getCacheFiles() returns URI[].
Let's say I only stored one folder in the cache that has several .txt files within it.  How do I use that returned URI to read each line of those .txt files?

Basically, how do I read my cached file(s) after I call JobContext.getCacheFiles()?

Thanks,

Andrew
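One way to do this, sketched under the assumption that the cached URI points to an HDFS directory of .txt files (the class name, method name, and paths are illustrative, not from this thread), is to go back through FileSystem rather than java.io:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CachedFolderReader {

  // Reads every line of every .txt file under the given cached directory URI.
  public static void readCachedFolder(URI cachedDir, Configuration conf) throws IOException {
    FileSystem fs = FileSystem.get(cachedDir, conf);
    for (FileStatus status : fs.listStatus(new Path(cachedDir))) {
      if (!status.getPath().getName().endsWith(".txt")) {
        continue; // skip anything that is not a .txt file
      }
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(fs.open(status.getPath())));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          System.out.println(line); // process each line here
        }
      } finally {
        reader.close();
      }
    }
  }
}
```

In a Mapper, the URIs would come from context.getCacheFiles() and the Configuration from context.getConfiguration().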

From: Omkar Joshi [mailto:ojoshi@hortonworks.com]
Sent: Wednesday, July 10, 2013 5:15 PM
To: user@hadoop.apache.org
Subject: Re: Distributed Cache

try JobContext.getCacheFiles()

Thanks,
Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>

On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <An...@emc.com>> wrote:
Ok using job.addCacheFile() seems to compile correctly.
However, how do I then access the cached file in my Mapper code?  Is there a method that will look for any files in the cache?

Thanks,

Andrew

From: Ted Yu [mailto:yuzhihong@gmail.com<ma...@gmail.com>]
Sent: Tuesday, July 09, 2013 6:08 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Distributed Cache

You should use Job#addCacheFile()

Cheers




Re: Distributed Cache

Posted by Omkar Joshi <oj...@hortonworks.com>.
try JobContext.getCacheFiles()

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>


On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <An...@emc.com> wrote:

> Ok using job.addCacheFile() seems to compile correctly.
>
> However, how do I then access the cached file in my Mapper code?  Is there
> a method that will look for any files in the cache?
>
> Thanks,
>
> Andrew
>
> *From:* Ted Yu [mailto:yuzhihong@gmail.com]
> *Sent:* Tuesday, July 09, 2013 6:08 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributed Cache
>
> You should use Job#addCacheFile()
>
> Cheers
>

Re: Distributed Cache

Posted by Omkar Joshi <oj...@hortonworks.com>.
try JobContext.getCacheFiles()

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>


On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <An...@emc.com>wrote:

> Ok using job.addCacheFile() seems to compile correctly.****
>
> However, how do I then access the cached file in my Mapper code?  Is there
> a method that will look for any files in the cache?****
>
> ** **
>
> Thanks,****
>
> ** **
>
> Andrew****
>
> ** **
>
> *From:* Ted Yu [mailto:yuzhihong@gmail.com]
> *Sent:* Tuesday, July 09, 2013 6:08 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributed Cache****
>
> ** **
>
> You should use Job#addCacheFile()****
>
>
> Cheers****
>
> On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <An...@emc.com>
> wrote:****
>
> Hi,****
>
>  ****
>
> I was wondering if I can still use the DistributedCache class in the
> latest release of Hadoop (Version 2.0.5).****
>
> In my driver class, I use this code to try and add a file to the
> distributed cache:****
>
>  ****
>
> import java.net.URI;****
>
> import org.apache.hadoop.conf.Configuration;****
>
> import org.apache.hadoop.filecache.DistributedCache;****
>
> import org.apache.hadoop.fs.*;****
>
> import org.apache.hadoop.io.*;****
>
> import org.apache.hadoop.mapreduce.*;****
>
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;****
>
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;****
>
>  ****
>
> Configuration conf = new Configuration();****
>
> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);****
>
> Job job = Job.getInstance(); ****
>
> …****
>
>  ****
>
> However, I keep getting warnings that the method addCacheFile() is
> deprecated.****
>
> Is there a more current way to add files to the distributed cache?****
>
>  ****
>
> Thanks in advance,****
>
>  ****
>
> Andrew****
>
> ** **
>

Re: Distributed Cache

Posted by Omkar Joshi <oj...@hortonworks.com>.
try JobContext.getCacheFiles()

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>


On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <An...@emc.com> wrote:

> Ok using job.addCacheFile() seems to compile correctly.
> However, how do I then access the cached file in my Mapper code?  Is there
> a method that will look for any files in the cache?
>
> Thanks,
>
> Andrew
>
> *From:* Ted Yu [mailto:yuzhihong@gmail.com]
> *Sent:* Tuesday, July 09, 2013 6:08 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributed Cache
>
> You should use Job#addCacheFile()
>
> Cheers
>
> On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <An...@emc.com> wrote:
>
> Hi,
>
> I was wondering if I can still use the DistributedCache class in the
> latest release of Hadoop (Version 2.0.5).
>
> In my driver class, I use this code to try and add a file to the
> distributed cache:
>
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.*;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapreduce.*;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> Configuration conf = new Configuration();
> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
> Job job = Job.getInstance();
> ...
>
> However, I keep getting warnings that the method addCacheFile() is
> deprecated.
>
> Is there a more current way to add files to the distributed cache?
>
> Thanks in advance,
>
> Andrew
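[Editor's note] Omkar's pointer to JobContext.getCacheFiles() can be fleshed out into a Mapper that loads the cached file during setup(). This is only a sketch against the Hadoop 2.x mapreduce API; the LookupMapper name and the tab-separated file layout are illustrative assumptions, not anything stated in the thread:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Non-deprecated counterpart of DistributedCache.getCacheFiles():
        // returns the URIs registered via Job#addCacheFile() in the driver.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles == null) {
            return;
        }
        for (URI uri : cacheFiles) {
            // Each cache file is localized and symlinked into the task's
            // working directory under its base name, so it can be opened
            // as an ordinary local file.
            String localName = new Path(uri.getPath()).getName();
            BufferedReader reader =
                    new BufferedReader(new FileReader(localName));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Assumed layout: one "key<TAB>value" record per line.
                    String[] parts = line.split("\t", 2);
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            } finally {
                reader.close();
            }
        }
    }
}
```

If the symlink convention is not relied on, the same URIs can instead be opened through FileSystem.get(context.getConfiguration()).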

RE: Distributed Cache

Posted by "Botelho, Andrew" <An...@emc.com>.
OK, using job.addCacheFile() seems to compile correctly.
However, how do I then access the cached file in my Mapper code?  Is there a method that will look for any files in the cache?

Thanks,

Andrew

From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Tuesday, July 09, 2013 6:08 PM
To: user@hadoop.apache.org
Subject: Re: Distributed Cache

You should use Job#addCacheFile()

Cheers
On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <An...@emc.com>> wrote:
Hi,

I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5).
In my driver class, I use this code to try and add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance();
...

However, I keep getting warnings that the method addCacheFile() is deprecated.
Is there a more current way to add files to the distributed cache?

Thanks in advance,

Andrew




Re: Distributed Cache

Posted by Ted Yu <yu...@gmail.com>.
You should use Job#addCacheFile()


Cheers

On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <An...@emc.com> wrote:

> Hi,
>
> I was wondering if I can still use the DistributedCache class in the
> latest release of Hadoop (Version 2.0.5).
>
> In my driver class, I use this code to try and add a file to the
> distributed cache:
>
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.*;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapreduce.*;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> Configuration conf = new Configuration();
> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
> Job job = Job.getInstance();
> ...
>
> However, I keep getting warnings that the method addCacheFile() is
> deprecated.
>
> Is there a more current way to add files to the distributed cache?
>
> Thanks in advance,
>
> Andrew
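[Editor's note] In driver form, Ted's suggestion amounts to creating the Job first and then registering cache files on it, which replaces the deprecated static DistributedCache.addCacheFile(). A sketch under the Hadoop 2.x API; the job name, class names, and HDFS path are placeholders:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CacheDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Create the Job first, then add cache files on the Job instance.
        Job job = Job.getInstance(conf, "distributed cache example");
        job.setJarByClass(CacheDriver.class);
        // Placeholder HDFS path; the registered URI is what the tasks
        // later see via context.getCacheFiles().
        job.addCacheFile(new URI("/user/andrew/lookup.txt"));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note the ordering matters: in the original snippet the file was added to a Configuration that the Job never saw, because Job.getInstance() was called without it.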
