Posted to common-user@hadoop.apache.org by Nathan Marz <na...@rapleaf.com> on 2008/09/25 19:48:14 UTC

Custom input format getSplits being called twice

Hello all,

I am getting some odd behavior from Hadoop that looks like a bug. I
have created a custom input format, and I am observing that my
"getSplits" method is called twice, each time on a different
instance of the input format. The job, however, runs only once,
using the result of the second call to getSplits. The first call
receives the numSplits hint as expected, while in the second call that
value is overridden to 1. I am running Hadoop in standalone mode. Does
anyone know anything about this issue?

Thanks,

Nathan Marz
Rapleaf

Re: Custom input format getSplits being called twice

Posted by Chris Douglas <ch...@yahoo-inc.com>.
The two calls are probably from JobClient::submitJob and  
LocalJobRunner.Job::run. It's not a bug per se, but there are many  
cases where one would want LocalJobRunner to behave less  
idiosyncratically. -C
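Because getSplits can be called more than once, on different instances and with different numSplits hints, one defensive pattern is to make split computation stateless and independent of the hint. That way, whichever call's result the job ends up using, the splits come out the same. This is a suggestion, not something proposed in the thread. The sketch below is a Hadoop-free model that assumes a single input of known length: `StableSplits`, `Split`, and the 64 MB `TARGET_SPLIT_BYTES` are hypothetical names and values, not Hadoop APIs. A real input format would implement Hadoop's InputFormat interface and compute splits over the job's input files.

```java
import java.util.ArrayList;
import java.util.List;

public class StableSplits {

    // Half-open byte range [start, start + length) of the input (hypothetical type).
    record Split(long start, long length) {}

    // Hypothetical fixed target size (64 MB); a real input format might read it from the job configuration.
    static final long TARGET_SPLIT_BYTES = 64L * 1024 * 1024;

    // Stateless and deterministic: boundaries depend only on totalBytes, and
    // numSplitsHint is deliberately ignored, so repeated calls return identical
    // splits whatever hint they receive. With no instance state, it also does
    // not matter which object the call lands on.
    static List<Split> getSplits(long totalBytes, int numSplitsHint) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < totalBytes; start += TARGET_SPLIT_BYTES) {
            long length = Math.min(TARGET_SPLIT_BYTES, totalBytes - start);
            splits.add(new Split(start, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        long total = 150L * 1024 * 1024; // 150 MB of input
        List<Split> first = getSplits(total, 10); // models a submit-time call with a real hint
        List<Split> second = getSplits(total, 1); // models a local-runner call with a hint of 1
        System.out.println(first.size() + " splits, identical: " + first.equals(second));
    }
}
```

Running main prints `3 splits, identical: true` (64 MB, 64 MB, and 22 MB). The trade-off is that the hint no longer influences parallelism. An input format that does honor the hint can produce a different split count in each of the two calls described above.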
