Posted to common-user@hadoop.apache.org by Nathan Marz <na...@rapleaf.com> on 2008/09/25 19:48:14 UTC
Custom input format getSplits being called twice
Hello all,
I am seeing some odd behavior from Hadoop that looks like a bug. I
have created a custom input format, and I am observing that my
"getSplits" method is being called twice. Each call is on a different
instance of the input format. The job, however, is only run once,
using the result from the second call to getSplits. The first call
receives the numSplits hint as expected, while in the second call that
value is overridden to 1. I am running Hadoop in standalone mode. Does
anyone know anything about this issue?
Thanks,
Nathan Marz
Rapleaf
Re: Custom input format getSplits being called twice
Posted by Chris Douglas <ch...@yahoo-inc.com>.
The two calls are probably from JobClient::submitJob and
LocalJobRunner.Job::run. It's not a bug per se, but there are many
cases where one would want LocalJobRunner to behave less
idiosyncratically. -C
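
One way to confirm where the two calls come from is to have getSplits record its caller, the numSplits hint, and the identity of the instance it runs on. The sketch below is a self-contained model of that idea and does not use any Hadoop classes: TracingFormat, GetSplitsTrace, submitJob and runLocally are hypothetical names that only stand in for the custom input format and the two call sites mentioned above. In a real job, the same stack-trace lines could be placed at the top of the actual getSplits implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom input format; NOT a Hadoop class.
class TracingFormat {
    // One entry per getSplits call: "<caller> numSplits=<hint> instance=<id>".
    static final List<String> CALLS = new ArrayList<>();

    List<String> getSplits(int numSplits) {
        // Frame 0 is getSplits itself; frame 1 is the method that called it.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        CALLS.add(caller.getMethodName()
                + " numSplits=" + numSplits
                + " instance=" + System.identityHashCode(this));

        // Produce dummy splits so the method returns something usable.
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add("split-" + i);
        }
        return splits;
    }
}

class GetSplitsTrace {
    // Stand-in for the submission-side call, which passes the real hint.
    static List<String> submitJob(int hint) {
        return new TracingFormat().getSplits(hint);
    }

    // Stand-in for the local-runner call, which creates its own instance
    // and asks for a single split, matching the behavior reported above.
    static List<String> runLocally() {
        return new TracingFormat().getSplits(1);
    }

    public static void main(String[] args) {
        submitJob(4);
        runLocally();
        for (String call : TracingFormat.CALLS) {
            System.out.println(call);
        }
    }
}
```

Running main prints one line per call, so the calling method and the hint are visible for each of the two invocations. Printing the full stack trace instead of only frame 1 would show the whole call chain if the immediate caller is not informative enough.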