Posted to user@hadoop.apache.org by "Giordano, Michael" <Mi...@Vistronix.com> on 2014/01/29 22:13:46 UTC

DistributedCache deprecated

I noticed that in Hadoop 2.2.0 org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.

(http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)

Is there a class that provides equivalent functionality? My application relies heavily on DistributedCache.

Thanks,

Mike G.

This communication, along with its attachments, is considered confidential and proprietary to Vistronix.  It is intended only for the use of the person(s) named above.  Note that unauthorized disclosure or distribution of information not generally known to the public is strictly prohibited.  If you are not the intended recipient, please notify the sender immediately.

Re: DistributedCache deprecated

Posted by Amit Mittal <am...@gmail.com>.
Hi Prav,

You are correct, thanks for the explanation. From the link below, I can see
that the Job method internally calls DistributedCache itself (
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org/apache/hadoop/mapreduce/Job.java#Job.addCacheFile%28java.net.URI%29)
after ensuring the job state; I think that might be the reason for the
change. Here is one of the methods:

  public void addCacheFile(URI uri) {
    ensureState(JobState.DEFINE);
    DistributedCache.addCacheFile(uri, conf);
  }
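
To make that delegation concrete, here is a minimal, self-contained sketch.
It does not use the real Hadoop classes: the Job stand-in, the Map used as a
configuration, and the class name CacheDelegationSketch are all hypothetical,
and the property name is my assumption about where Hadoop 2.x keeps the list
of cache files. It only models the two steps visible in the method above:
check that the job is still being defined, then record the URI in the
configuration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins, NOT the real Hadoop classes. They only model the
// "check state, then delegate" shape of Job.addCacheFile quoted above.
class CacheDelegationSketch {

    // Assumed property name for the comma-separated list of cache files.
    static final String CACHE_FILES = "mapreduce.job.cache.files";

    enum JobState { DEFINE, RUNNING }

    // Stand-in for DistributedCache.addCacheFile(uri, conf): appends the URI
    // to a comma-separated configuration entry.
    static void addCacheFile(URI uri, Map<String, String> conf) {
        String files = conf.get(CACHE_FILES);
        conf.put(CACHE_FILES, files == null ? uri.toString() : files + "," + uri);
    }

    // Stand-in for Job: refuses changes once the job has left the DEFINE
    // state, otherwise delegates to the static helper.
    static class Job {
        final Map<String, String> conf = new HashMap<>();
        JobState state = JobState.DEFINE;

        void addCacheFile(URI uri) {
            if (state != JobState.DEFINE) {
                throw new IllegalStateException(
                        "Job in state " + state + " instead of " + JobState.DEFINE);
            }
            CacheDelegationSketch.addCacheFile(uri, conf);
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        // Hypothetical HDFS paths, for illustration only.
        job.addCacheFile(URI.create("hdfs:///data/lookup.txt"));
        job.addCacheFile(URI.create("hdfs:///data/stopwords.txt"));
        System.out.println(job.conf.get(CACHE_FILES));
    }
}
```

If this model is close to what Hadoop does, the practical difference when
going through Job is the state check: files can only be registered before
the job is submitted.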


Thanks
Amit


On Thu, Jan 30, 2014 at 6:19 PM, praveenesh kumar <pr...@gmail.com>wrote:

> Hi Amit,
>
> Side data distribution is a different concept altogether. It is when you
> set custom (key, value) pairs on the job configuration through the Job
> object so that you can read them in your mappers/reducers. It works well
> for passing small pieces of information to your mappers/reducers, such as
> extra command-line arguments they require.
> We were not discussing side data distribution at all.
>
> The question was: now that DistributedCache is deprecated, where can we
> find the methods that DistributedCache used to provide?
> If you look at the DistributedCache class in MR v1 -
>
> https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/filecache/DistributedCache.html
>
> and compare it with Job class in MR v2 -
>
> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
>
> You will see that the methods of the DistributedCache class have been added
> to the Job class. Since DistributedCache is deprecated, my guess was that we
> can use the Job class to work with the distributed cache through the same
> methods that DistributedCache used to provide.
>
> Everything else is the same; it's just that you use the Job class to set
> your distributed cache files in your job configuration. I am sorry, I don't
> have a good article to point to. As I said, I did this as part of my own
> experiment and was able to use it without any issues, which is why I
> suggested it.
>
> Since most developers are still using MRv1 on Hadoop 2.0, these changes
> have not been highlighted much so far. I am hoping new documentation on
> how to use MRv2 will come soon, but if you understand MRv1, I don't see
> why you can't look around the API a bit and find the classes you need
> yourself. Again, as I said, I don't have any authoritative source for what
> I am saying; these are just the results of my own experiments, which you
> are most welcome to repeat and play with. Happy Coding..!!
>
> Regards
> Prav
>
>
>
>
> On Thu, Jan 30, 2014 at 12:27 PM, Amit Mittal <am...@gmail.com>wrote:
>
>> Hi Prav,
>>
>> Yes, you are correct that DistributedCache does not load files into
>> memory. Also, using the job configuration and using DistributedCache are
>> two different approaches. I was referring to "Hadoop: The Definitive
>> Guide", Chapter 8 > Side Data Distribution (pages 288-295).
>> Since you say that the methods of DistributedCache have now moved to Job,
>> could you please share an article or document on that? It would be a
>> great help for my understanding.
>>
>> Thanks
>>  Amit
>>
>>
>> On Thu, Jan 30, 2014 at 5:35 PM, praveenesh kumar <pr...@gmail.com>wrote:
>>
>>> Hi Amit,
>>>
>>> I am not sure how these are linked with DistributedCache. The job
>>> configuration does not upload any data into memory. As far as I am aware
>>> of how DistributedCache works, nothing gets loaded into memory. The
>>> distributed cache just copies the files to the slave nodes so that they
>>> are accessible to mappers/reducers. Usually the location is
>>> ${hadoop.tmp.dir}/${mapred.local.dir}/tasktracker/archive (it varies from
>>> distribution to distribution). You always have to read the files yourself
>>> in your mapper or reducer whenever you want to use them.
>>>
>>> What has happened is that the methods of the DistributedCache class have
>>> now been added to the Job class. I am assuming they have not changed how
>>> the distributed cache methods work; otherwise there would have been some
>>> articles about it, and I don't see any reason to change that either. So
>>> everything still works the same way; it's just that you use the new Job
>>> class to access the distributed cache features.
>>>
>>> I am not sure exactly which entries you are pointing to. Am I missing
>>> anything here?
>>>
>>>
>>> Regards
>>> Prav
>>>
>>>
>>> On Thu, Jan 30, 2014 at 6:12 AM, Amit Mittal <am...@gmail.com>wrote:
>>>
>>>> Hi Mike & Prav,
>>>>
>>>> Although I am new to Hadoop, I would like to add my 2 cents in case it
>>>> helps.
>>>> We have two ways to distribute shared data: one is the job
>>>> configuration and the other is DistributedCache.
>>>> The job configuration is read by the JT, TT and child JVMs, and each
>>>> time the configuration is read, all of its entries are read into memory,
>>>> even if they are not used. So using the job configuration is not advised
>>>> if the data is more than a few kilobytes. It is therefore not an
>>>> alternative to DistributedCache unless the job configuration is modified
>>>> to address this limitation.
>>>> So I am also curious to know the alternative to the DistributedCache
>>>> class.
>>>>
>>>> Thanks
>>>> Amit
>>>>
>>>>
>>>>
>>>> On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
>>>> Michael.Giordano@vistronix.com> wrote:
>>>>
>>>>>  I noticed that in Hadoop 2.2.0
>>>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>>>>
>>>>>
>>>>>
>>>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>>>>
>>>>>
>>>>>
>>>>> Is there a class that provides equivalent functionality? My
>>>>> application relies heavily on DistributedCache.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mike G.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
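
One point from the quoted discussion is worth illustrating: the distributed
cache only copies files to the task nodes, and the mapper or reducer still
has to open and parse them itself. Below is a minimal sketch of that reading
step using only the Java standard library. The class name, the loadLookup
helper and the tab-separated file format are assumptions made for
illustration. In a real MRv2 task this would presumably be called once from
the mapper's setup method with the local path of a cached file, which I
believe can be found through the task context (for example
context.getCacheFiles() in Hadoop 2.x).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: loads a small side file that has
// already been copied to the local disk of the task node.
class SideFileLoaderSketch {

    // Parses "key<TAB>value" lines into a map; lines without a tab, or with
    // an empty key, are skipped. The file format is an assumption.
    static Map<String, String> loadLookup(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        for (String line : Files.readAllLines(localFile, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the localized cache file here.
        Path tmp = Files.createTempFile("lookup", ".txt");
        Files.write(tmp, Arrays.asList("us\tUnited States", "in\tIndia"),
                StandardCharsets.UTF_8);
        System.out.println(loadLookup(tmp).get("in"));
        Files.delete(tmp);
    }
}
```

Loading the file once per task and keeping the map in a field avoids
re-reading it for every record, which is the usual reason to do this work in
setup rather than in map.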

Re: DistributedCache deprecated

Posted by Amit Mittal <am...@gmail.com>.
Hi Prav,

You are correct, thanks for the explanation. As per below link, I can see
that Job's method internally calls to DistributedCache itself (
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org/apache/hadoop/mapreduce/Job.java#Job.addCacheFile%28java.net.URI%29)
after ensuring state, I think that might be the reason. Here is one of the
method:

1067 <http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org/apache/hadoop/mapreduce/Job.java#1067>

<http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org/apache/hadoop/mapreduce/Job.java#>

  public void  <http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org/apache/hadoop/mapreduce/Job.java#>addCacheFile(URI
<http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/net/URI.java#URI>
uri) {

    ensureState(JobState.DEFINE);
    DistributedCache.addCacheFile(uri, conf);
  }
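
To make the delegation in the snippet above concrete, here is a minimal sketch of the same pattern in plain Java. It is not Hadoop code: MiniJob and the static addCacheFile helper are hypothetical stand-ins for Job and DistributedCache, and a HashMap stands in for the Configuration object. The property name mapreduce.job.cache.files is, as far as I know, the one Hadoop 2 uses for the cache file list, but please treat that as an assumption.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for the Job -> DistributedCache delegation; not Hadoop code. */
final class CacheFileDelegationSketch {

    /** Assumed name of the property that holds the comma-separated cache file list. */
    static final String CACHE_FILES = "mapreduce.job.cache.files";

    enum JobState { DEFINE, RUNNING }

    /** Plays the role of the static DistributedCache.addCacheFile(uri, conf). */
    static void addCacheFile(URI uri, Map<String, String> conf) {
        String files = conf.get(CACHE_FILES);
        // Append the new URI to the existing comma-separated list, if any.
        conf.put(CACHE_FILES, files == null ? uri.toString() : files + "," + uri);
    }

    /** Plays the role of Job: checks the state first, then delegates. */
    static final class MiniJob {
        final Map<String, String> conf = new HashMap<>();
        JobState state = JobState.DEFINE;

        void addCacheFile(URI uri) {
            ensureState(JobState.DEFINE);
            CacheFileDelegationSketch.addCacheFile(uri, conf);
        }

        private void ensureState(JobState expected) {
            if (state != expected) {
                throw new IllegalStateException(
                        "Job in state " + state + " instead of " + expected);
            }
        }
    }

    public static void main(String[] args) {
        MiniJob job = new MiniJob();
        job.addCacheFile(URI.create("/data/lookup.txt"));
        job.addCacheFile(URI.create("/data/stopwords.txt"));
        System.out.println(job.conf.get(CACHE_FILES));
    }
}
```

The point of the sketch is only that the Job method adds a state check and then reuses the old logic, which would explain why both entry points behave the same way.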


Thanks
Amit


On Thu, Jan 30, 2014 at 6:19 PM, praveenesh kumar <pr...@gmail.com> wrote:

> Hi Amit,
>
> Side data distribution is a different concept altogether. It is when you
> set custom (key, value) pairs on the Job object so that you can use them
> in your mappers/reducers. It is good for passing small pieces of
> information to your mappers/reducers, such as extra command-line arguments
> that they require.
> We were not discussing side data distribution at all.
>
> The question was: now that DistributedCache is deprecated, where can we
> find the methods that DistributedCache provided?
> If you look at the DistributedCache class in MR v1 -
>
> https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/filecache/DistributedCache.html
>
> and compare it with the Job class in MR v2 -
>
> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
>
> you will see that the methods of the DistributedCache class have been
> added to the Job class. Since DistributedCache is deprecated, my guess was
> that we can use the Job class to work with the distributed cache, using the
> same methods that DistributedCache used to provide.
>
> Everything else is the same; it's just that you use the Job class to set
> your files for the distributed cache inside your job configuration. I am
> sorry, I don't have any nice article to share. As I said, I did this as
> part of my own experiment and was able to use it without any issues, which
> is why I suggested it.
>
> Since most developers are still using MRv1 on Hadoop 2.0, these changes
> have not been highlighted so far. I am hoping new documentation on how to
> use MRv2 will come soon, but if you understand MRv1, I don't see any reason
> why you can't just move around the API a bit and find the relevant classes
> yourself. Again, as I said, I don't have any authoritative references for
> what I am saying; these are just the results of my own experiments, which
> you are most welcome to repeat and play with. Happy coding!
>
> Regards
> Prav

Re: DistributedCache deprecated

Posted by praveenesh kumar <pr...@gmail.com>.
Hi Amit,

Side data distribution is a different concept altogether. It is when you set
custom (key, value) pairs on the Job object so that you can use them in your
mappers/reducers. It is good for passing small pieces of information to your
mappers/reducers, such as extra command-line arguments that they require.
We were not discussing side data distribution at all.
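
For readers unfamiliar with the side-data approach, here is a rough, runnable sketch of the idea. In real Hadoop code the driver would call Configuration.set(key, value) and the task would read it back via context.getConfiguration().get(key); the sketch below uses java.util.Properties as a stand-in so that it runs without Hadoop. The class name, the submit/loadInTask helpers and the key myjob.min.length are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Stand-in for passing small key/value side data through the job configuration. */
final class SideDataSketch {

    /** Driver side: the whole configuration is serialized and shipped with the job. */
    static byte[] submit(Properties conf) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            conf.store(out, "job configuration");
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Task side: every entry is loaded, whether or not the task uses it. */
    static Properties loadInTask(byte[] serialized) {
        Properties conf = new Properties();
        try (ByteArrayInputStream in = new ByteArrayInputStream(serialized)) {
            conf.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("myjob.min.length", "3"); // hypothetical custom key
        Properties seenByTask = loadInTask(submit(conf));
        int minLength = Integer.parseInt(seenByTask.getProperty("myjob.min.length", "1"));
        System.out.println(minLength);
    }
}
```

Because the complete set of entries travels with the job and is loaded by each task, this route only suits small values; larger files belong in the distributed cache.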

The question was: now that DistributedCache is deprecated, where can we find
the methods that DistributedCache provided?
If you look at the DistributedCache class in MR v1 -
https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/filecache/DistributedCache.html

and compare it with the Job class in MR v2 -
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html

you will see that the methods of the DistributedCache class have been added to
the Job class. Since DistributedCache is deprecated, my guess was that we can
use the Job class to work with the distributed cache, using the same methods
that DistributedCache used to provide.

Everything else is the same; it's just that you use the Job class to set your
files for the distributed cache inside your job configuration. I am sorry, I
don't have any nice article to share. As I said, I did this as part of my own
experiment and was able to use it without any issues, which is why I
suggested it.

Since most developers are still using MRv1 on Hadoop 2.0, these changes have
not been highlighted so far. I am hoping new documentation on how to use MRv2
will come soon, but if you understand MRv1, I don't see any reason why you
can't just move around the API a bit and find the relevant classes yourself.
Again, as I said, I don't have any authoritative references for what I am
saying; these are just the results of my own experiments, which you are most
welcome to repeat and play with. Happy coding!

Regards
Prav




On Thu, Jan 30, 2014 at 12:27 PM, Amit Mittal <am...@gmail.com> wrote:

> Hi Prav,
>
> Yes, you are correct that DistributedCache does not load files into
> memory. Also, using the job configuration and using DistributedCache are 2
> different approaches. I am going by "Hadoop: The Definitive Guide",
> Chapter 8 > Side Data Distribution (pages 288-295).
> Since you are saying that the methods of DistributedCache have now moved to
> Job, could you please share an article or document on that for my better
> understanding? It would be a great help.
>
> Thanks
> Amit

Re: DistributedCache deprecated

Posted by Amit Mittal <am...@gmail.com>.
Hi Prav,

Yes, you are correct that DistributedCache does not load files into
memory. Also, using the job configuration and using DistributedCache are 2
different approaches. I am going by "Hadoop: The Definitive Guide",
Chapter 8 > Side Data Distribution (pages 288-295).
Since you are saying that the methods of DistributedCache have now moved to
Job, could you please share an article or document on that for my better
understanding? It would be a great help.
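
Since the cache only copies files to the task's local disk, the task has to load them itself. The sketch below shows that step in plain Java under some assumptions: setup() and map() only mimic the shape of a Mapper, the class name and the tab-separated file layout are hypothetical, and the local path is passed in directly. In Hadoop 2 a task can, as far as I know, list the cached URIs with JobContext.getCacheFiles().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Mimics a task that loads a locally cached lookup file once and then uses it. */
final class LocalizedFileSketch {

    private final Map<String, String> lookup = new HashMap<>();

    /** Analogous to Mapper.setup(): read the local copy once per task. */
    void setup(Path localizedFile) {
        try (BufferedReader reader =
                Files.newBufferedReader(localizedFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumed layout: key<TAB>value, one pair per line.
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Analogous to map(): consult the in-memory table for each record. */
    String map(String key) {
        return lookup.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lookup", ".tsv");
        Files.write(file, "us\tUnited States\nde\tGermany\n".getBytes(StandardCharsets.UTF_8));
        LocalizedFileSketch task = new LocalizedFileSketch();
        task.setup(file);
        System.out.println(task.map("de"));
        Files.delete(file);
    }
}
```

Loading the file in setup() rather than in map() means the disk read happens once per task instead of once per record.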

Thanks
Amit


On Thu, Jan 30, 2014 at 5:35 PM, praveenesh kumar <pr...@gmail.com> wrote:

> Hi Amit,
>
> I am not sure how they are linked with DistributedCache. Job configuration
> does not upload any data into memory. As far as I am aware of how
> DistributedCache works, nothing gets loaded into memory. The distributed
> cache just copies the files to the slave nodes so that they are accessible
> to mappers/reducers. Usually the location is
> ${hadoop.tmp.dir}/${mapred.local.dir}/tasktracker/archive (it varies from
> distribution to distribution). You always have to read the files yourself
> in your mapper or reducer whenever you want to use them.
>
> What has happened is that the methods of the DistributedCache class have
> now been added to the Job class, and I am assuming they won't change how
> the distributed cache methods used to work; otherwise there would have been
> some articles on it, and I don't see any reason to change it either. So
> everything still works the same way. It's just that you use the new Job
> class to access the distributed cache features.
>
> I am not sure exactly which entries you are pointing to. Am I missing
> anything here?
>
>
> Regards
> Prav
>
>
> On Thu, Jan 30, 2014 at 6:12 AM, Amit Mittal <am...@gmail.com> wrote:
>
>> Hi Mike & Prav,
>>
>> Although I am new to Hadoop, I would like to add my 2 cents in case it
>> helps.
>> We have 2 ways to distribute shared data: one is the Job configuration and
>> the other is DistributedCache.
>> The job configuration is read by the JT, TT and child JVMs, and each time
>> the configuration is read, all of its entries are read into memory, even if
>> they are not used. So using the job configuration is not advised if the
>> data is more than a few kilobytes. It is therefore not an alternative to
>> DistributedCache unless the Job configuration is modified to address this
>> limitation.
>> So I am also curious to know the alternative to the DistributedCache class.
>>
>> Thanks
>> Amit

Re: DistributedCache deprecated

Posted by Amit Mittal <am...@gmail.com>.
Hi Prav,

Yes, you are correct that DistributedCache does not upload file into
memory. Also using job configuration and DistributedCache are 2 different
approaches. I am referring based on "Hadoop: The definitive guide"
Chapter:8 > Side Data Distribution (Page 288-295).
As you are saying that now methods of DistributedCache moved to Job, I
request if you please share some article or document on that for my better
understanding, it will be great help.

Thanks
Amit


On Thu, Jan 30, 2014 at 5:35 PM, praveenesh kumar <pr...@gmail.com>wrote:

> Hi Amit,
>
> I am not sure how are they linked with DistributedCache.. Job
> configuration is not uploading any data in memory.. As far as I am aware of
> how DistributedCache works, nothing get loaded in memory. Distributed cache
> just copies the files into slave nodes, so that they are accessible to
> mappers/reducers. Usually the location is
> ${hadoop.tmp.dir}/${mapred.local.dir}/tasktracker/archive (depends from
> distribution to distribution) You always have to read the files in your
> mapper or reducer when ever you want to use them.
>
> What has happened is the method of DistributedCache class has now been
> added to Job class, and I am assuming they won't change the functionality
> of how distributed cache methods used to work, otherwise there would have
> been some nice articles on that, plus I don't see any reason of changing
> that as well too..  so everything works still the same way.. Its just that
> you use the new Job class to use distributed cache features.
>
> I am not sure what entries you are exactly pointing to. Am I missing
> anything here ?
>
>
> Regards
> Prav
>
>
> On Thu, Jan 30, 2014 at 6:12 AM, Amit Mittal <am...@gmail.com>wrote:
>
>> Hi Mike & Prav,
>>
>> Although I am new to Hadoop, but would like to add my 2 cents if that
>> helps.
>> We are having 2 ways for distribution of shared data, one is using Job
>> configuration and other is DistributedCache.
>> As job configuration is read by the JT, TT and child JVMs, and each time
>> the configuration is read, all of its entries are read in memory, even if
>> they are not used. So using job configuration is not advised if the data is
>> more than few kilobytes. So it is not alternative to DistributedCache
>> unless some modifications are done in Job configuration to address this
>> limitation.
>> So I am also curious to know the alternatative to DistributedCache class.
>>
>> Thanks
>> Amit
>>
>>
>>
>> On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
>> Michael.Giordano@vistronix.com> wrote:
>>
>>>  I noticed that in Hadoop 2.2.0
>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>>
>>>
>>>
>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>>
>>>
>>>
>>> Is there a class that provides equivalent functionality? My application
>>> relies heavily on DistributedCache.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Mike G.
>>>
>>>
>>
>>
>

Re: DistributedCache deprecated

Posted by praveenesh kumar <pr...@gmail.com>.
Hi Amit,

I am not sure how these are linked with DistributedCache. The job
configuration does not upload any data into memory, and as far as I am aware
of how DistributedCache works, nothing gets loaded into memory there either.
The distributed cache just copies the files to the slave nodes so that they
are accessible to mappers/reducers. Usually the location is
${hadoop.tmp.dir}/${mapred.local.dir}/tasktracker/archive (it varies from
distribution to distribution). You always have to read the files yourself in
your mapper or reducer whenever you want to use them.
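
As a rough sketch of that "read the file yourself" step: the helper below
parses a small tab-separated lookup file into a map. In a real job something
like it might be called from Mapper.setup() on a file that was registered
with Job.addCacheFile() and localized on the node. The class name, method
name and "key<TAB>value" file format are assumptions for illustration only,
and the code uses plain java.nio so it can be tried without a cluster.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): loads "key<TAB>value" lines
// from a file that has already been copied to the local node.
final class SideDataLoader {

    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        List<String> lines = Files.readAllLines(localFile, StandardCharsets.UTF_8);
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a key/value separator
            }
            // Everything after the first tab is treated as the value.
            lookup.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return lookup;
    }
}
```

The point of the sketch is only that the cache delivers a file to local
disk; holding its contents in memory (here, a HashMap) is a decision the
task code makes for itself.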

What has happened is that the methods of the DistributedCache class have now
been added to the Job class. I am assuming they have not changed how the
distributed cache methods work; otherwise there would have been some
articles about it, and I don't see any reason to change that either. So
everything still works the same way. It's just that you now use the Job
class to access the distributed cache features.

I am not sure which entries you are pointing to exactly. Am I missing
anything here?


Regards
Prav


On Thu, Jan 30, 2014 at 6:12 AM, Amit Mittal <am...@gmail.com> wrote:

> Hi Mike & Prav,
>
> Although I am new to Hadoop, but would like to add my 2 cents if that
> helps.
> We are having 2 ways for distribution of shared data, one is using Job
> configuration and other is DistributedCache.
> As job configuration is read by the JT, TT and child JVMs, and each time
> the configuration is read, all of its entries are read in memory, even if
> they are not used. So using job configuration is not advised if the data is
> more than few kilobytes. So it is not alternative to DistributedCache
> unless some modifications are done in Job configuration to address this
> limitation.
> So I am also curious to know the alternatative to DistributedCache class.
>
> Thanks
> Amit
>
>
>
> On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
> Michael.Giordano@vistronix.com> wrote:
>
>>  I noticed that in Hadoop 2.2.0
>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>
>>
>>
>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>
>>
>>
>> Is there a class that provides equivalent functionality? My application
>> relies heavily on DistributedCache.
>>
>>
>>
>> Thanks,
>>
>> Mike G.
>>
>>
>
>

Re: DistributedCache deprecated

Posted by Amit Mittal <am...@gmail.com>.
Hi Mike & Prav,

Although I am new to Hadoop, I would like to add my 2 cents in case it helps.
There are two ways to distribute shared data: one is the job
configuration and the other is DistributedCache.
The job configuration is read by the JT, TT and child JVMs, and each time
the configuration is read, all of its entries are read into memory, even if
they are not used. So using the job configuration is not advised if the data
is more than a few kilobytes, and it is not an alternative to
DistributedCache unless the job configuration is modified to address this
limitation.
So I am also curious to know the alternative to the DistributedCache class.
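
To make that rule of thumb concrete, here is a minimal sketch in plain Java.
The class name, the enum and the 4 KB cut-off are assumptions for
illustration (the only guidance above is "a few kilobytes"); none of this is
Hadoop API.

```java
// Hypothetical helper encoding the rule of thumb above; not part of Hadoop.
final class SideDataPlacement {

    enum Channel { JOB_CONFIGURATION, DISTRIBUTED_CACHE }

    // Assumed cut-off for "a few kilobytes"; adjust for your own jobs.
    static final long MAX_CONFIG_BYTES = 4 * 1024;

    // Small values can travel in the job configuration, which is loaded in
    // full wherever it is read; anything larger is better shipped as a file.
    static Channel choose(long sideDataBytes) {
        if (sideDataBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        return sideDataBytes <= MAX_CONFIG_BYTES
                ? Channel.JOB_CONFIGURATION
                : Channel.DISTRIBUTED_CACHE;
    }
}
```

For example, a 512-byte lookup string would fall on the configuration side
of this cut-off, while a 10 MB dictionary file would not.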

Thanks
Amit


On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
Michael.Giordano@vistronix.com> wrote:

>  I noticed that in Hadoop 2.2.0
> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>
>
>
> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>
>
>
> Is there a class that provides equivalent functionality? My application
> relies heavily on DistributedCache.
>
>
>
> Thanks,
>
> Mike G.
>
>

Re: DistributedCache deprecated

Posted by praveenesh kumar <pr...@gmail.com>.
Hi Mike,

I tried the getInstance() method of the Job class and it worked for me. I
guess they have made it a factory class now. Sorry, I have only experimented
with this myself and don't have the exact answers.

static Job getInstance()
    Creates a new Job with no particular Cluster.
    http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getInstance%28%29


Regards
Prav


On Wed, Jan 29, 2014 at 10:53 PM, Giordano, Michael <
Michael.Giordano@vistronix.com> wrote:

>  Prav,
>
>
>
> Thank you for the prompt answer. I see the methods on the job class and
> this does make sense.
>
>
>
> Unfortunately something else has me confused. It seems as though all of
> the Job() constructors have also been marked deprecated.
>
>
>
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html
>
>
>
> How do you create a new Job instance? Is there a factory class?
>
>
>
> Thanks,
>
> Mike G.
>
>  ------------------------------
> *From:* praveenesh kumar <pr...@gmail.com>
> *Sent:* Wednesday, January 29, 2014 4:41 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: DistributedCache deprecated
>
>    @Jay - I don't know how Job class is replacing the DistributedCache
> class , but I remember trying distributed cache functions like
>
>    void addArchiveToClassPath(Path archive)
> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29>
>           Add an archive path to the current set of classpath entries.
>    void addCacheArchive(URI uri)
> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheArchive%28java.net.URI%29>
>           Add a archives to be localized
>    void addCacheFile(URI uri)
> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheFile%28java.net.URI%29>
>           Add a file to be localized
>
>  and it works fine. The same way you were using DC before.. Well I am not
> sure what would be the best answer, but if you are trying to use DC , I was
> able to do it with Job class itself.
>
>  Regards
>  Prav
>
>
> On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas <ja...@gmail.com> wrote:
>
>>  Thanks for asking this : Im not sure and didnt realize this until you
>> mentioned it!
>>
>> 1) Prav:  You are implying that we would use the "Job" Class... but how
>> could it replace the DC?
>>
>> 2) The point of the DC is to replicate a file so that its present and
>> local on ALL nodes.   I didnt know it was deprecated, but there must be
>> some replacement for it - or maybe it just got renamed and moved?
>>
>>  SO ... what is the future of the DistributedCache for mapreduce jobs?
>>
>>
>> On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar <pr...@gmail.com> wrote:
>>
>>>  I think you can use the Job class.
>>>
>>> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
>>>
>>>  Regards
>>>  Prav
>>>
>>>
>>> On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <
>>> Michael.Giordano@vistronix.com> wrote:
>>>
>>>>  I noticed that in Hadoop 2.2.0
>>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>>>
>>>>
>>>>
>>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>>>
>>>>
>>>>
>>>> Is there a class that provides equivalent functionality? My application
>>>> relies heavily on DistributedCache.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mike G.
>>>>
>>>>
>>>
>>>
>>
>>
>>  --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>
>

RE: DistributedCache deprecated

Posted by "Giordano, Michael" <Mi...@Vistronix.com>.
Prav,



Thank you for the prompt answer. I see the methods on the job class and this does make sense.



Unfortunately something else has me confused. It seems as though all of the Job() constructors have also been marked deprecated.



http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html



How do you create a new Job instance? Is there a factory class?



Thanks,

Mike G.

________________________________
From: praveenesh kumar <pr...@gmail.com>
Sent: Wednesday, January 29, 2014 4:41 PM
To: user@hadoop.apache.org
Subject: Re: DistributedCache deprecated

@Jay - I don't know how Job class is replacing the DistributedCache class , but I remember trying distributed cache functions like


 void   addArchiveToClassPath(Path archive)
        <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29>
          Add an archive path to the current set of classpath entries.
 void   addCacheArchive(URI uri)
        <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheArchive%28java.net.URI%29>
          Add a archives to be localized
 void   addCacheFile(URI uri)
        <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheFile%28java.net.URI%29>
          Add a file to be localized


and it works fine. The same way you were using DC before.. Well I am not sure what would be the best answer, but if you are trying to use DC , I was able to do it with Job class itself.

Regards
Prav


On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas <ja...@gmail.com> wrote:
Thanks for asking this : Im not sure and didnt realize this until you mentioned it!

1) Prav:  You are implying that we would use the "Job" Class... but how could it replace the DC?

2) The point of the DC is to replicate a file so that its present and local on ALL nodes.   I didnt know it was deprecated, but there must be some replacement for it - or maybe it just got renamed and moved?

SO ... what is the future of the DistributedCache for mapreduce jobs?


On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar <pr...@gmail.com> wrote:
I think you can use the Job class.
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html

Regards
Prav


On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <Mi...@vistronix.com> wrote:

I noticed that in Hadoop 2.2.0 org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.



(http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)



Is there a class that provides equivalent functionality? My application relies heavily on DistributedCache.



Thanks,

Mike G.





--
Jay Vyas
http://jayunit100.blogspot.com


Re: DistributedCache deprecated

Posted by Jay Vyas <ja...@gmail.com>.
Gotcha, this makes sense.


On Wed, Jan 29, 2014 at 4:44 PM, praveenesh kumar <pr...@gmail.com> wrote:

> @Jay - Plus if you see DistributedCache class, these methods have been
> added inside the Job class, I am guessing they have kept the functionality
> same, just merged DistributedCache class into Job class itself. giving more
> methods for developers with less classes to worry about, thus simplifying
> the API. I hope that makes sense.
>
> Regards
> Prav
>
>
> On Wed, Jan 29, 2014 at 9:41 PM, praveenesh kumar <pr...@gmail.com> wrote:
>
>> @Jay - I don't know how Job class is replacing the DistributedCache class
>> , but I remember trying distributed cache functions like
>>
>>    void addArchiveToClassPath(Path archive)
>> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29>
>>           Add an archive path to the current set of classpath entries.
>>    void addCacheArchive(URI uri)
>> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheArchive%28java.net.URI%29>
>>           Add a archives to be localized
>>    void addCacheFile(URI uri)
>> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheFile%28java.net.URI%29>
>>           Add a file to be localized
>>
>> and it works fine. The same way you were using DC before.. Well I am not
>> sure what would be the best answer, but if you are trying to use DC , I was
>> able to do it with Job class itself.
>>
>> Regards
>> Prav
>>
>>
>> On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas <ja...@gmail.com> wrote:
>>
>>> Thanks for asking this : Im not sure and didnt realize this until you
>>> mentioned it!
>>>
>>> 1) Prav:  You are implying that we would use the "Job" Class... but how
>>> could it replace the DC?
>>>
>>> 2) The point of the DC is to replicate a file so that its present and
>>> local on ALL nodes.   I didnt know it was deprecated, but there must be
>>> some replacement for it - or maybe it just got renamed and moved?
>>>
>>> SO ... what is the future of the DistributedCache for mapreduce jobs?
>>>
>>>
>>> On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar <pr...@gmail.com> wrote:
>>>
>>>> I think you can use the Job class.
>>>>
>>>> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
>>>>
>>>> Regards
>>>> Prav
>>>>
>>>>
>>>> On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <
>>>> Michael.Giordano@vistronix.com> wrote:
>>>>
>>>>>  I noticed that in Hadoop 2.2.0
>>>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>>>>
>>>>>
>>>>>
>>>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>>>>
>>>>>
>>>>>
>>>>> Is there a class that provides equivalent functionality? My
>>>>> application relies heavily on DistributedCache.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mike G.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Jay Vyas
>>> http://jayunit100.blogspot.com
>>>
>>
>>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: DistributedCache deprecated

Posted by praveenesh kumar <pr...@gmail.com>.
@Jay - Plus, if you look at the DistributedCache class, its methods have been
added to the Job class. I am guessing they kept the functionality the same
and just merged the DistributedCache class into the Job class itself, giving
developers more methods with fewer classes to worry about and thus
simplifying the API. I hope that makes sense.

Regards
Prav


On Wed, Jan 29, 2014 at 9:41 PM, praveenesh kumar <pr...@gmail.com> wrote:

> @Jay - I don't know how Job class is replacing the DistributedCache class
> , but I remember trying distributed cache functions like
>
>    void addArchiveToClassPath(Path archive)
> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29>
>           Add an archive path to the current set of classpath entries.
>    void addCacheArchive(URI uri)
> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheArchive%28java.net.URI%29>
>           Add a archives to be localized
>    void addCacheFile(URI uri)
> <http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheFile%28java.net.URI%29>
>           Add a file to be localized
>
> and it works fine. The same way you were using DC before.. Well I am not
> sure what would be the best answer, but if you are trying to use DC , I was
> able to do it with Job class itself.
>
> Regards
> Prav
>
>
> On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas <ja...@gmail.com> wrote:
>
>> Thanks for asking this: I'm not sure, and I didn't realize this until you
>> mentioned it!
>>
>> 1) Prav: You are implying that we would use the "Job" class... but how
>> could it replace the DC?
>>
>> 2) The point of the DC is to replicate a file so that it's present and
>> local on ALL nodes. I didn't know it was deprecated, but there must be
>> some replacement for it - or maybe it just got renamed and moved?
>>
>> So ... what is the future of the DistributedCache for MapReduce jobs?
>>
>>
>> On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar <pr...@gmail.com> wrote:
>>
>>> I think you can use the Job class.
>>>
>>> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
>>>
>>> Regards
>>> Prav
>>>
>>>
>>> On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <
>>> Michael.Giordano@vistronix.com> wrote:
>>>
>>>>  I noticed that in Hadoop 2.2.0
>>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>>>
>>>>
>>>>
>>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>>>
>>>>
>>>>
>>>> Is there a class that provides equivalent functionality? My application
>>>> relies heavily on DistributedCache.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mike G.
>>>>
>>>
>>>
>>
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>
>

RE: DistributedCache deprecated

Posted by "Giordano, Michael" <Mi...@Vistronix.com>.
Prav,

Thank you for the prompt answer. I see the methods on the Job class, and this does make sense.

Unfortunately, something else has me confused. It seems as though all of the Job() constructors have also been marked deprecated.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html

How do you create a new Job instance? Is there a factory class?

Thanks,

Mike G.
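
On the factory question: in Hadoop 2.x the deprecated constructors are replaced by the static Job.getInstance(...) methods, for example Job.getInstance(conf) or Job.getInstance(conf, "name"). The sketch below is not Hadoop code. It models the pattern with a hypothetical SketchJob class and a plain Map standing in for Configuration, including the copy of the configuration that the Job javadoc describes. The key and job names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Job; a Map plays the role of Configuration.
final class SketchJob {
    private final Map<String, String> conf;
    private final String jobName;

    // Private constructor: callers must use the static factory below.
    private SketchJob(Map<String, String> conf, String jobName) {
        this.conf = new HashMap<>(conf); // copy, so later caller edits do not leak in
        this.jobName = jobName;
    }

    // Static factory, mirroring the shape of Job.getInstance(conf, jobName).
    static SketchJob getInstance(Map<String, String> conf, String jobName) {
        return new SketchJob(conf, jobName);
    }

    String getJobName() { return jobName; }

    String get(String key) { return conf.get(key); }
}

final class JobFactoryDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.key", "before");

        SketchJob job = SketchJob.getInstance(conf, "cache-demo");
        conf.put("example.key", "after"); // changes the caller's map only

        System.out.println(job.getJobName() + " " + job.get("example.key"));
    }
}
```

One practical consequence of the copy: settings should go onto the configuration before the job is created, or be set through the job object afterwards, since later edits to the original configuration are not seen by the job.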

________________________________
From: praveenesh kumar <pr...@gmail.com>
Sent: Wednesday, January 29, 2014 4:41 PM
To: user@hadoop.apache.org
Subject: Re: DistributedCache deprecated

@Jay - I don't know how the Job class is replacing the DistributedCache class, but I remember trying distributed cache functions like


 void   addArchiveToClassPath<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29>(Path<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/Path.html> archive)
          Add an archive path to the current set of classpath entries.
 void   addCacheArchive<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheArchive%28java.net.URI%29>(URI<http://download.oracle.com/javase/6/docs/api/java/net/URI.html?is-external=true> uri)
          Add an archive to be localized
 void   addCacheFile<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheFile%28java.net.URI%29>(URI<http://download.oracle.com/javase/6/docs/api/java/net/URI.html?is-external=true> uri)
          Add a file to be localized


and they work fine, the same way you were using DC before. I am not sure what the best answer would be, but if you are trying to use DC, I was able to do it with the Job class itself.

Regards
Prav


On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas <ja...@gmail.com> wrote:
Thanks for asking this: I'm not sure, and I didn't realize this until you mentioned it!

1) Prav: You are implying that we would use the "Job" class... but how could it replace the DC?

2) The point of the DC is to replicate a file so that it's present and local on ALL nodes. I didn't know it was deprecated, but there must be some replacement for it - or maybe it just got renamed and moved?

So ... what is the future of the DistributedCache for MapReduce jobs?


On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar <pr...@gmail.com> wrote:
I think you can use the Job class.
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html

Regards
Prav


On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <Mi...@vistronix.com> wrote:

I noticed that in Hadoop 2.2.0 org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.



(http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)



Is there a class that provides equivalent functionality? My application relies heavily on DistributedCache.



Thanks,

Mike G.


--
Jay Vyas
http://jayunit100.blogspot.com


Re: DistributedCache deprecated

Posted by praveenesh kumar <pr...@gmail.com>.
@Jay - I don't know how the Job class is replacing the DistributedCache
class, but I remember trying distributed cache functions like

 void   addArchiveToClassPath<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29>(Path<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/Path.html> archive)
          Add an archive path to the current set of classpath entries.
 void   addCacheArchive<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheArchive%28java.net.URI%29>(URI<http://download.oracle.com/javase/6/docs/api/java/net/URI.html?is-external=true> uri)
          Add an archive to be localized
 void   addCacheFile<http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addCacheFile%28java.net.URI%29>(URI<http://download.oracle.com/javase/6/docs/api/java/net/URI.html?is-external=true> uri)
          Add a file to be localized

and they work fine, the same way you were using DC before. I am not sure
what the best answer would be, but if you are trying to use DC, I was able
to do it with the Job class itself.

Regards
Prav


On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas <ja...@gmail.com> wrote:

> Thanks for asking this: I'm not sure, and I didn't realize this until you
> mentioned it!
>
> 1) Prav: You are implying that we would use the "Job" class... but how
> could it replace the DC?
>
> 2) The point of the DC is to replicate a file so that it's present and
> local on ALL nodes. I didn't know it was deprecated, but there must be
> some replacement for it - or maybe it just got renamed and moved?
>
> So ... what is the future of the DistributedCache for MapReduce jobs?
>
>
> On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar <pr...@gmail.com> wrote:
>
>> I think you can use the Job class.
>>
>> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
>>
>> Regards
>> Prav
>>
>>
>> On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <
>> Michael.Giordano@vistronix.com> wrote:
>>
>>>  I noticed that in Hadoop 2.2.0
>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>>
>>>
>>>
>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>>
>>>
>>>
>>> Is there a class that provides equivalent functionality? My application
>>> relies heavily on DistributedCache.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Mike G.
>>>
>>
>>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>

Re: DistributedCache deprecated

Posted by Jay Vyas <ja...@gmail.com>.
Thanks for asking this: I'm not sure, and I didn't realize this until you
mentioned it!

1) Prav: You are implying that we would use the "Job" class... but how
could it replace the DC?

2) The point of the DC is to replicate a file so that it's present and local
on ALL nodes. I didn't know it was deprecated, but there must be some
replacement for it - or maybe it just got renamed and moved?

So ... what is the future of the DistributedCache for MapReduce jobs?


On Wed, Jan 29, 2014 at 4:22 PM, praveenesh kumar <pr...@gmail.com> wrote:

> I think you can use the Job class.
>
> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
>
> Regards
> Prav
>
>
> On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <
> Michael.Giordano@vistronix.com> wrote:
>
>>  I noticed that in Hadoop 2.2.0
>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>
>>
>>
>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>
>>
>>
>> Is there a class that provides equivalent functionality? My application
>> relies heavily on DistributedCache.
>>
>>
>>
>> Thanks,
>>
>> Mike G.
>>
>> This communication, along with its attachments, is considered
>> confidential and proprietary to Vistronix.  It is intended only for the use
>> of the person(s) named above.  Note that unauthorized disclosure or
>> distribution of information not generally known to the public is strictly
>> prohibited.  If you are not the intended recipient, please notify the
>> sender immediately.
>>
>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: DistributedCache deprecated

Posted by praveenesh kumar <pr...@gmail.com>.
I think you can use the Job class.
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
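
A rough sketch of how the Job class might stand in for DistributedCache,
assuming Hadoop 2.x: the driver registers the file with Job.addCacheFile(URI),
and the task asks its context for the cached URIs via getCacheFiles(). The
Hadoop calls are left as comments so the snippet compiles without Hadoop on
the classpath; CacheLookup, load(), the job name and the /shared/lookup.txt
path are made up for illustration and are not part of any Hadoop API. The
runnable part only covers the task side, where reading the localized copy is
ordinary Java file I/O.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of Hadoop.
final class CacheLookup {

    // Driver side (needs Hadoop on the classpath, so shown as comments only):
    //   Job job = Job.getInstance(conf, "my-job");          // job name is made up
    //   job.addCacheFile(new URI("/shared/lookup.txt"));     // path is made up
    //
    // Task side, e.g. in Mapper.setup(context):
    //   URI[] cached = context.getCacheFiles();
    //   ... locate the localized copy, then pass its local path to load(...)

    /** Reads tab-separated key/value lines from a local file into a map. */
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader in =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a key/value separator
                }
                // Key is everything before the first tab, value is the rest.
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CacheLookup <path-to-local-file>
        Map<String, String> table = load(Paths.get(args[0]));
        System.out.println("loaded " + table.size() + " entries");
    }
}
```

Loading the file once in setup() and keeping the map in a field avoids
re-reading it for every record, which is the usual reason for shipping a
lookup file to each node in the first place.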

Regards
Prav


On Wed, Jan 29, 2014 at 9:13 PM, Giordano, Michael <
Michael.Giordano@vistronix.com> wrote:

>  I noticed that in Hadoop 2.2.0
> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>
>
>
> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>
>
>
> Is there a class that provides equivalent functionality? My application
> relies heavily on DistributedCache.
>
>
>
> Thanks,
>
> Mike G.
>
> This communication, along with its attachments, is considered confidential
> and proprietary to Vistronix.  It is intended only for the use of the
> person(s) named above.  Note that unauthorized disclosure or distribution
> of information not generally known to the public is strictly
> prohibited.  If you are not the intended recipient, please notify the
> sender immediately.
>

Re: DistributedCache deprecated

Posted by Amit Mittal <am...@gmail.com>.
Hi Mike & Prav,

Although I am new to Hadoop, I would like to add my 2 cents in case it helps.
There are two ways to distribute shared data: one is the job
configuration and the other is DistributedCache.
The job configuration is read by the JT, TT and child JVMs, and each time
it is read, all of its entries are loaded into memory, even if
they are not used. So using the job configuration is not advised if the data
is larger than a few kilobytes. It is therefore not an alternative to
DistributedCache unless the job configuration is changed to address this
limitation.
So I am also curious to know the alternative to the DistributedCache class.

Thanks
Amit


On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
Michael.Giordano@vistronix.com> wrote:

>  I noticed that in Hadoop 2.2.0
> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>
>
>
> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>
>
>
> Is there a class that provides equivalent functionality? My application
> relies heavily on DistributedCache.
>
>
>
> Thanks,
>
> Mike G.
>
> This communication, along with its attachments, is considered confidential
> and proprietary to Vistronix.  It is intended only for the use of the
> person(s) named above.  Note that unauthorized disclosure or distribution
> of information not generally known to the public is strictly
> prohibited.  If you are not the intended recipient, please notify the
> sender immediately.
>
