Posted to mapreduce-user@hadoop.apache.org by Srinivas Chamarthi <sr...@gmail.com> on 2014/12/12 03:07:53 UTC
DistributedCache
Hi,

I want to cache the temporary output files of map/reduce tasks so that I can
compare the results produced by two different nodes as an integrity check.

I am simulating this use case with speculative execution, by rescheduling
the first task attempt as soon as it starts running.

Now I want to compare the output files from the speculative attempt and
the original attempt, so that I can compute a credit score for each node.

I want to use DistributedCache to cache the local-filesystem files during
the COMMIT_PENDING stage in TaskImpl, but DistributedCache is deprecated.
Is there any other way I can do this?
I think I can save the temporary output files to HDFS so that other nodes
can see them, but is there an in-memory solution I can use instead?
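(For what it's worth, the HDFS approach could look roughly like this — a
minimal sketch, assuming a shared staging directory such as
`/tmp/attempt-outputs`; the class name, path, and `publish` helper are all
hypothetical, and it needs a running Hadoop cluster:)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AttemptOutputStore {
    // Hypothetical shared staging directory; any HDFS path that
    // both nodes can reach would work.
    private static final String STAGING = "/tmp/attempt-outputs";

    // Copy a task attempt's local output file into HDFS under its attempt id,
    // so a speculative attempt on another node can read and compare it.
    public static void publish(Configuration conf, String attemptId, Path localFile)
            throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path dest = new Path(STAGING, attemptId);
        // delSrc = false keeps the local copy; overwrite = true replaces stale uploads
        fs.copyFromLocalFile(false, true, localFile, dest);
    }
}
```

The comparing attempt would then read both files back from the staging
directory with `FileSystem.open` and diff them.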
Any pointers are greatly appreciated.
thx & rgds,
srinivas chamarthi
Re: DistributedCache
Posted by unmesha sreeveni <un...@gmail.com>.
On Fri, Dec 12, 2014 at 9:55 AM, Shahab Yunus <sh...@gmail.com>
wrote:
>
> job.addCacheFiles
Yes, you can use Job.addCacheFile to cache the file:

Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
FileSystem fs = FileSystem.get(conf);
Path cacheFile = new Path("path/to/file");
for (FileStatus status : fs.globStatus(cacheFile)) {
    job.addCacheFile(status.getPath().toUri());
}
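(On the task side, the cached files registered this way can then be listed
from the task context — a sketch only, with error handling omitted and the
mapper's types chosen arbitrarily:)

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Files added with job.addCacheFile show up here; each URI has been
        // localized onto the node running this task attempt.
        URI[] cached = context.getCacheFiles();
        if (cached != null) {
            for (URI uri : cached) {
                System.err.println("Localized cache file: " + uri);
            }
        }
    }
}
```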
Hope this link helps
[1]
http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
--
*Thanks & Regards *
*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
Re: DistributedCache
Posted by Shahab Yunus <sh...@gmail.com>.
Look at this thread. It has alternatives to DistributedCache.
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
Basically, you can use the new method Job.addCacheFile to pass files on to
the individual tasks.
Regards,
Shahab
On Thu, Dec 11, 2014 at 9:07 PM, Srinivas Chamarthi <
srinivas.chamarthi@gmail.com> wrote:
>
> Hi,
>
> I want to cache the temporary output files of map/reduce tasks so that I
> can compare the results produced by two different nodes as an integrity
> check.
>
> I am simulating this use case with speculative execution, by rescheduling
> the first task attempt as soon as it starts running.
>
> Now I want to compare the output files from the speculative attempt and
> the original attempt, so that I can compute a credit score for each node.
>
> I want to use DistributedCache to cache the local-filesystem files during
> the COMMIT_PENDING stage in TaskImpl, but DistributedCache is deprecated.
> Is there any other way I can do this?
>
> I think I can save the temporary output files to HDFS so that other nodes
> can see them, but is there an in-memory solution I can use instead?
>
> Any pointers are greatly appreciated.
>
> thx & rgds,
> srinivas chamarthi
>