Posted to dev@nutch.apache.org by Gal Nitzan <gn...@usa.net> on 2005/12/30 00:05:34 UTC

Bug in DeleteDuplicates.java ?

This function throws IOException. Why?

        public long getPos() throws IOException {
          return (doc*INDEX_LENGTH)/maxDoc;
        }

It should be throwing ArithmeticException.

What happens when maxDoc is zero?


Gal





Re: Bug in DeleteDuplicates.java ?

Posted by Doug Cutting <cu...@nutch.org>.
Andrzej Bialecki wrote:
> Gal Nitzan wrote:
> 
>> this function throws IOException. Why?
>>
>>         public long getPos() throws IOException {
>>            return (doc*INDEX_LENGTH)/maxDoc;
>>          }
>>
>> It should be throwing ArithmeticException
>>  
>>
> 
> The IOException is required by the API of RecordReader.
> 
>> What happens when maxDoc is zero?
>>  
>>
> 
> Ka-boom! ;-) You're right, this should be wrapped in an IOException and 
> rethrown.

No, it should really just be fixed to not cause an ArithmeticException.
This is called to report progress.  In this case the input "file" for 
the map is a Lucene index whose documents we iterate through.  To 
simplify the construction of input splits (without opening each index) a 
constant "length" is used for each "file".  So we have to scale the 
document numbers to give progress in this range.

The problem is that progress may be reported even when there are no 
documents in the index.  So the call is valid and no exception should be 
thrown.

Doug
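
For illustration only, here is a minimal sketch of the kind of fix described
in the reply above: scale the document number into the constant "length"
range, and treat an empty index as a valid case instead of dividing by zero.
Only the names doc, maxDoc, INDEX_LENGTH and getPos() come from the thread.
The class name, the scale() helper, the value chosen for INDEX_LENGTH and the
decision to report position 0 for an empty index are all assumptions, not the
actual Nutch code.

```java
import java.io.IOException;

// Hypothetical stand-in for the reader inside DeleteDuplicates.
class GuardedProgressSketch {
  // Assumed value; the thread only says a constant "length" is used per index.
  static final long INDEX_LENGTH = Integer.MAX_VALUE;

  private final long maxDoc; // number of documents in the index
  private long doc;          // number of documents consumed so far

  GuardedProgressSketch(long maxDoc) {
    this.maxDoc = maxDoc;
  }

  // Moves to the next document, never past maxDoc.
  void advance() {
    if (doc < maxDoc) {
      doc++;
    }
  }

  // Scales doc into [0, INDEX_LENGTH]. An empty index is a valid input,
  // so it reports position 0 (an assumption) rather than dividing by zero.
  static long scale(long doc, long maxDoc) {
    if (maxDoc == 0) {
      return 0;
    }
    return (doc * INDEX_LENGTH) / maxDoc;
  }

  // Keeps the checked exception in the signature, as the RecordReader API
  // discussed in the thread requires, but never throws for an empty index.
  public long getPos() throws IOException {
    return scale(doc, maxDoc);
  }

  public static void main(String[] args) throws IOException {
    GuardedProgressSketch empty = new GuardedProgressSketch(0);
    System.out.println("empty index position: " + empty.getPos());

    GuardedProgressSketch reader = new GuardedProgressSketch(4);
    reader.advance();
    reader.advance();
    System.out.println("position after 2 of 4 docs: " + reader.getPos());
  }
}
```

Returning INDEX_LENGTH for an empty index (meaning "already finished") would
be an equally defensible choice; the point is only that the call succeeds.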

Re: Bug in DeleteDuplicates.java ?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Gal Nitzan wrote:

>this function throws IOException. Why?
>
>         public long getPos() throws IOException {
>            return (doc*INDEX_LENGTH)/maxDoc;
>          }
>
>It should be throwing ArithmeticException 
>
>  
>

The IOException is required by the API of RecordReader.

>What happens when maxDoc is zero?
>  
>

Ka-boom! ;-) You're right, this should be wrapped in an IOException and 
rethrown.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
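
For comparison, the wrap-and-rethrow idea from this reply could look roughly
like the sketch below. In Java, integer division by zero throws the unchecked
ArithmeticException, so the "throws IOException" clause does not cover it
unless it is caught and wrapped explicitly. PositionReporter is a hypothetical
one-method interface standing in for the RecordReader API, and the class name,
helper methods and INDEX_LENGTH value are likewise assumptions.

```java
import java.io.IOException;

class WrappedProgressSketch {
  // Assumed value; the thread only says a constant "length" is used per index.
  static final long INDEX_LENGTH = Integer.MAX_VALUE;

  // Hypothetical stand-in for the part of RecordReader discussed here.
  interface PositionReporter {
    long getPos() throws IOException;
  }

  // Builds a reporter whose getPos() converts the unchecked
  // ArithmeticException into the checked IOException the interface allows.
  static PositionReporter reporter(final long doc, final long maxDoc) {
    return new PositionReporter() {
      @Override
      public long getPos() throws IOException {
        try {
          return (doc * INDEX_LENGTH) / maxDoc;
        } catch (ArithmeticException e) {
          throw new IOException("cannot compute position, maxDoc=" + maxDoc, e);
        }
      }
    };
  }

  // Returns true if getPos() fails with an IOException caused by an
  // ArithmeticException, false if it returns normally.
  static boolean failsWithWrappedException(long doc, long maxDoc) {
    try {
      reporter(doc, maxDoc).getPos();
      return false;
    } catch (IOException e) {
      return e.getCause() instanceof ArithmeticException;
    }
  }

  public static void main(String[] args) {
    System.out.println("empty index fails: " + failsWithWrappedException(0, 0));
    System.out.println("non-empty index fails: " + failsWithWrappedException(1, 2));
  }
}
```

With this approach a caller still sees a failure for an empty index, only
with a different exception type.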