Posted to common-user@hadoop.apache.org by Bhupesh Bansal <bb...@linkedin.com> on 2009/01/22 20:29:54 UTC

Distributed cache testing in local mode

Hey folks, 

I am trying to use the DistributedCache in Hadoop jobs to pass around
configuration files, external jars (job-specific), and some archive data.

I want to test a job end-to-end in local mode, but I think the distributed
cache files are localized in the TaskTracker code, which is not called in
local mode through LocalJobRunner.

I can do some fairly simple workarounds for this but was just wondering if
folks have more ideas about it.

Thanks
Bhupesh


Re: Distributed cache testing in local mode

Posted by Tom White <to...@cloudera.com>.
It would be nice to make this more uniform. There's an outstanding
Jira on this if anyone is interested in looking at it:
https://issues.apache.org/jira/browse/HADOOP-2914

Tom

On Fri, Jan 23, 2009 at 12:14 AM, Aaron Kimball <aa...@cloudera.com> wrote:
> Hi Bhupesh,
>
> I've noticed the same problem -- LocalJobRunner makes the DistributedCache
> effectively not work; so my code often winds up with two codepaths to
> retrieve the local data :\
>
> You could try running in pseudo-distributed mode to test, though then you
> lose the ability to run a single-stepping debugger on the whole end-to-end
> process.
>
> - Aaron

Re: Distributed cache testing in local mode

Posted by Aaron Kimball <aa...@cloudera.com>.
Hi Bhupesh,

I've noticed the same problem -- LocalJobRunner makes the DistributedCache
effectively not work; so my code often winds up with two codepaths to
retrieve the local data :\

You could try running in pseudo-distributed mode to test, though then you
lose the ability to run a single-stepping debugger on the whole end-to-end
process.

- Aaron
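
The "two codepaths" idea above can be sketched roughly as follows. This is
only an illustration in plain Java, not code from the thread and not a Hadoop
API: the class CacheFileResolver and its resolve method are hypothetical
names. It assumes that in a real job the list of localized paths would come
from something like DistributedCache.getLocalCacheFiles(conf), and that under
LocalJobRunner this list may be null or empty, in which case the code falls
back to the path the file was originally registered under.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Hadoop) that hides the two code paths. */
final class CacheFileResolver {
    private CacheFileResolver() {}

    /**
     * Picks the path a task should read a cached file from.
     *
     * @param localizedPaths paths reported by the framework; assumed to be
     *                       null or empty when nothing was localized
     *                       (for example in local mode)
     * @param originalPath   path the file was originally registered under,
     *                       used as the fallback
     */
    static Path resolve(List<String> localizedPaths, String originalPath) {
        Path original = Paths.get(originalPath);
        Path wantedName = original.getFileName();
        if (localizedPaths != null && wantedName != null) {
            // Code path 1: a localized copy exists, match it by file name.
            for (String candidate : localizedPaths) {
                Path p = Paths.get(candidate);
                if (wantedName.equals(p.getFileName())) {
                    return p;
                }
            }
        }
        // Code path 2: nothing was localized, read the original location.
        return original;
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        List<String> localized =
                Arrays.asList("/tmp/mapred/local/archive/lookup.properties");
        System.out.println(resolve(localized, "conf/lookup.properties"));
        System.out.println(resolve(null, "conf/lookup.properties"));
    }
}
```

Keeping the fallback in one helper means mapper and reducer code can call a
single method regardless of mode, which also keeps a local-mode run usable
under a single-stepping debugger.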
