Posted to mapreduce-user@hadoop.apache.org by Srinivas Chamarthi <sr...@gmail.com> on 2014/12/12 03:07:53 UTC

DistributedCache

Hi,

I want to cache map/reduce tasks' temporary output files so that I can compare
two map results coming from two different nodes as an integrity check.

I am simulating this use case with speculative execution, by rescheduling the
first task attempt as soon as it has started running.

Now I want to compare the output files coming from the speculative attempt and
the original attempt so that I can calculate a credit score for each node.

I want to use DistributedCache to cache the local file system files in the
CommitPending stage in TaskImpl, but DistributedCache is deprecated. Is there
any other way I can do this?

I think I can use HDFS to save the temporary output files so that other nodes
can see them, but is there an in-memory solution I could use instead?
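
For concreteness, here is a rough sketch of the HDFS idea (the paths below are
only placeholders, not code from my prototype):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// copy the task attempt's local temporary output to a shared HDFS location,
// where the output of the other attempt can later be compared against it
Path localOut = new Path("file:///tmp/attempt_local_output");   // placeholder
Path sharedOut = new Path("/tmp/integrity/attempt_output");     // placeholder
fs.copyFromLocalFile(localOut, sharedOut);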

Any pointers are greatly appreciated.

thx & rgds,
srinivas chamarthi

Re: DistributedCache

Posted by unmesha sreeveni <un...@gmail.com>.
On Fri, Dec 12, 2014 at 9:55 AM, Shahab Yunus <sh...@gmail.com>
wrote:
>
> job.addCacheFile


Yes, you can use job.addCacheFile to cache the file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
// expand the glob and add each matching file to the job's cache
for (FileStatus status : fs.globStatus(cachefile)) {
  job.addCacheFile(status.getPath().toUri());
}
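
Once the file is cached, a task can read it back. A minimal sketch, assuming
the Hadoop 2.x mapreduce API (the mapper class name and key/value types here
are only illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheReadingMapper extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    URI[] cacheFiles = context.getCacheFiles(); // whatever was added via addCacheFile
    if (cacheFiles == null || cacheFiles.length == 0) {
      return;
    }
    FileSystem fs = FileSystem.get(context.getConfiguration());
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(new Path(cacheFiles[0]))))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // use the cached data, e.g. build an in-memory lookup table
      }
    }
  }
}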

Hope this link helps:
[1] http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/

Re: DistributedCache

Posted by Shahab Yunus <sh...@gmail.com>.
Look at this thread. It has alternatives to DistributedCache.
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api

Basically, you can use the new method job.addCacheFile to pass files on to
the individual tasks.
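
For example, a minimal driver-side sketch with the new
org.apache.hadoop.mapreduce API (the job name and HDFS path are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "integrity-check");   // example job name
// ship a file to every task; a task can retrieve it via context.getCacheFiles()
job.addCacheFile(new Path("hdfs:///user/example/lookup.txt").toUri());   // example path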

Regards,
Shahab

On Thu, Dec 11, 2014 at 9:07 PM, Srinivas Chamarthi <
srinivas.chamarthi@gmail.com> wrote:
>
> Hi,
>
> I want to cache map/reducer temporary output files so that I can compare
> two map results coming from two different nodes to verify the integrity
> check.
>
> I am simulating this use case with speculative execution by rescheduling
> the first task as soon as it is started and running.
>
> Now I want to compare output files coming from speculative attempt and
> prior attempt so that I can calculate the credit scoring of each node.
>
> I want to use DistributedCache to cache the local file system files in
> CommitPending stage from TaskImpl. But the DistributedCache is actually
> deprecated. is there any other way I can do this ?
>
> I think I can use HDFS to save the temporary output files so that other
> nodes can see it ? but is there any in-memory solution I can use ?
>
> any pointers are greatly appreciated.
>
> thx & rgds,
> srinivas chamarthi
>
