Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2008/08/04 22:12:44 UTC

CheckIndex tool

Hey Mike,

I'm looking at https://issues.apache.org/jira/browse/SOLR-566 and  
was thinking about adding some more programmatic access to the  
CheckIndex tool, and wanted to see if you had any thoughts.  Basically,  
I am going to capture info into a simple data structure that can  
then be introspected and serialized into a RequestHandler, but that  
might also be more generally useful in certain cases where  
things go bad.  I was debating keeping the inline out.printlns, but  
I wonder whether they should just be moved to main() so that the  
command-line tool still works as is, but doesn't clog the logs for  
those who want programmatic access.
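
For illustration, a minimal sketch of what such a "simple data structure" might look like.  The class and field names (IndexReport, SegmentReport, docCount, error) are hypothetical and are not taken from CheckIndex or from any patch; the point is only that a caller can inspect the result instead of parsing printed output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-segment result; not the actual CheckIndex API.
class SegmentReport {
    final String name;
    final int docCount;
    final String error; // null when no problem was found in this segment

    SegmentReport(String name, int docCount, String error) {
        this.name = name;
        this.docCount = docCount;
        this.error = error;
    }

    boolean isClean() {
        return error == null;
    }
}

// Hypothetical whole-index result that a caller can introspect or serialize.
class IndexReport {
    final List<SegmentReport> segments = new ArrayList<>();

    // Number of segments that recorded an error.
    int badSegmentCount() {
        int bad = 0;
        for (SegmentReport s : segments) {
            if (!s.isClean()) {
                bad++;
            }
        }
        return bad;
    }

    boolean isClean() {
        return badSegmentCount() == 0;
    }
}
```

A request handler (or any other caller) could then walk report.segments and emit whatever format it needs, while a command-line main() prints the same fields.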

I'll post a patch soon, but wanted to see if you had any preliminary  
insight.

-Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: CheckIndex tool

Posted by Michael McCandless <lu...@mikemccandless.com>.
Actually, those exceptions are thrown by the code detecting the  
mismatch, and then caught by CheckIndex and handled as meaning that  
segment is corrupt.  This is consistent, e.g., with how Lucene throws  
CorruptIndexException deep down if it hits an inconsistency.

I think it's fine if you want to not use exceptions for the "local"  
mismatches, and instead record the error in a data structure and then  
stop processing that one segment.  But for the "deep down" exceptions  
you still have to keep the catch in CheckIndex to record those.
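
A rough sketch of the two paths described above, assuming hypothetical names throughout (SegmentSource, FakeSegment, SegmentChecker are illustrative only, and IOException merely stands in for whatever is thrown from deeper code).  A "local" mismatch is recorded directly and the segment is skipped; an exception from deeper down is still caught and recorded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of one segment; not a Lucene interface.
interface SegmentSource {
    String name();
    int expectedDocCount();
    int readDocCount() throws IOException; // may fail "deep down"
}

// Hypothetical test double used to drive the checker below.
class FakeSegment implements SegmentSource {
    private final String name;
    private final int expected;
    private final int actual;
    private final boolean failDeep;

    FakeSegment(String name, int expected, int actual, boolean failDeep) {
        this.name = name;
        this.expected = expected;
        this.actual = actual;
        this.failDeep = failDeep;
    }

    public String name() { return name; }
    public int expectedDocCount() { return expected; }

    public int readDocCount() throws IOException {
        if (failDeep) {
            throw new IOException("inconsistency hit while reading " + name);
        }
        return actual;
    }
}

class SegmentChecker {
    final List<String> errors = new ArrayList<>();

    void checkAll(List<? extends SegmentSource> segments) {
        for (SegmentSource seg : segments) {
            try {
                int actual = seg.readDocCount();
                if (actual != seg.expectedDocCount()) {
                    // "Local" mismatch: record it and stop processing this segment.
                    errors.add(seg.name() + ": expected " + seg.expectedDocCount()
                            + " docs but found " + actual);
                    continue;
                }
                // Further per-segment checks would go here.
            } catch (IOException | RuntimeException e) {
                // "Deep down" failure: the catch is still needed to record it.
                errors.add(seg.name() + ": " + e.getMessage());
            }
        }
    }
}
```

Either way the checker moves on to the next segment, and the caller sees both kinds of problem in the same errors list.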

Mike

On Aug 5, 2008, at 9:30 AM, Grant Ingersoll wrote:

> I'll look into these.  The other part I am not sure about is the  
> throwing of exceptions for mismatches.  I know they mean CheckIndex  
> can't go forward, but they aren't really errors in CheckIndex, so  
> much as errors in the index, which CheckIndex is just reporting.   
> So, I'm inclined to capture that and present it (and return  
> immediately) instead of throwing an exception.  Is that reasonable?
>
> -Grant
>
>
> On Aug 4, 2008, at 5:01 PM, Michael McCandless wrote:
>
>>
>> This sounds good!  I like the idea of checking the index when Solr  
>> has to force release the write.lock.
>>
>> The one caveat is, when checking a large index (which can take  
>> quite some time), it'd be nice to have the equivalent of the  
>> inline'd out.print/ln calls happen in realtime so that you can see  
>> (on the command line output) that progress is being made, which  
>> segment is being checked, etc.?
>>
>> Maybe change it to an optional "infoStream" (like IndexWriter), and  
>> then the current inlined prints become calls to message() which  
>> checks if infoStream is non-null?
>>
>> Mike
>>
>> Grant Ingersoll wrote:
>>
>>> Hey Mike,
>>>
>>> I'm thinking about https://issues.apache.org/jira/browse/SOLR-566  
>>> and was thinking about adding some more programmatic access to the  
>>> CheckIndex tool and wanted to see if you had any thoughts.   
>>> Basically, I am going to capture info into a simple data  
>>> structure that can then be introspected and serialized into a  
>>> RequestHandler, but also something that might be more generally  
>>> useful in certain cases where things go bad.  I was debating  
>>> keeping the inline out.printlns, but not sure if they shouldn't  
>>> just be moved to the main such that the cmd line stuff still works  
>>> as is, but it doesn't clog the logs for those that want  
>>> programmatic access.
>>>
>>> I'll post a patch soon, but wanted to see if you had any  
>>> preliminary insight.
>>>
>>> -Grant

Re: CheckIndex tool

Posted by Grant Ingersoll <gs...@apache.org>.
I'll look into these.  The other part I am not sure about is the  
throwing of exceptions for mismatches.  I know they mean CheckIndex  
can't go forward, but they aren't really errors in CheckIndex, so much  
as errors in the index, which CheckIndex is just reporting.  So, I'm  
inclined to capture that and present it (and return immediately)  
instead of throwing an exception.  Is that reasonable?

-Grant


On Aug 4, 2008, at 5:01 PM, Michael McCandless wrote:

>
> This sounds good!  I like the idea of checking the index when Solr  
> has to force release the write.lock.
>
> The one caveat is, when checking a large index (which can take quite  
> some time), it'd be nice to have the equivalent of the inline'd  
> out.print/ln calls happen in realtime so that you can see (on the  
> command line output) that progress is being made, which segment is  
> being checked, etc.?
>
> Maybe change it to an optional "infoStream" (like IndexWriter), and  
> then the current inlined prints become calls to message() which  
> checks if infoStream is non-null?
>
> Mike
>
> Grant Ingersoll wrote:
>
>> Hey Mike,
>>
>> I'm thinking about https://issues.apache.org/jira/browse/SOLR-566  
>> and was thinking about adding some more programmatic access to the  
>> CheckIndex tool and wanted to see if you had any thoughts.   
>> Basically, I am going to capture info into a simple data  
>> structure that can then be introspected and serialized into a  
>> RequestHandler, but also something that might be more generally  
>> useful in certain cases where things go bad.  I was debating  
>> keeping the inline out.printlns, but not sure if they shouldn't  
>> just be moved to the main such that the cmd line stuff still works  
>> as is, but it doesn't clog the logs for those that want  
>> programmatic access.
>>
>> I'll post a patch soon, but wanted to see if you had any  
>> preliminary insight.
>>
>> -Grant

Re: CheckIndex tool

Posted by Michael McCandless <lu...@mikemccandless.com>.
This sounds good!  I like the idea of checking the index when Solr has  
to force release the write.lock.

The one caveat is, when checking a large index (which can take quite  
some time), it'd be nice to have the equivalent of the inline'd  
out.print/ln calls happen in realtime so that you can see (on the  
command line output) that progress is being made, which segment is  
being checked, etc.?

Maybe change it to an optional "infoStream" (like IndexWriter), and  
then the current inlined prints become calls to message() which checks  
if infoStream is non-null?
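
As a sketch of that suggestion (the class name ProgressReporter and the method checkSegments are hypothetical; only the infoStream/message() shape comes from the paragraph above): the inline prints become message() calls, and nothing is printed unless a stream has been set.

```java
import java.io.PrintStream;

// Hypothetical illustration of an optional infoStream; not the CheckIndex source.
class ProgressReporter {
    private PrintStream infoStream; // null means "stay silent"

    void setInfoStream(PrintStream out) {
        this.infoStream = out;
    }

    // Replacement for the inline out.println calls.
    void message(String text) {
        if (infoStream != null) {
            infoStream.println(text);
        }
    }

    // Stand-in for the real checking loop; reports progress per segment.
    void checkSegments(String[] segmentNames) {
        for (String name : segmentNames) {
            message("checking segment " + name);
        }
        message("done");
    }
}
```

A command-line main() could call setInfoStream(System.out) to keep realtime progress output, while programmatic callers leave it unset so nothing reaches their logs.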

Mike

Grant Ingersoll wrote:

> Hey Mike,
>
> I'm thinking about https://issues.apache.org/jira/browse/SOLR-566  
> and was thinking about adding some more programmatic access to the  
> CheckIndex tool and wanted to see if you had any thoughts.   
> Basically, I am going to capture info into a simple data  
> structure that can then be introspected and serialized into a  
> RequestHandler, but also something that might be more generally  
> useful in certain cases where things go bad.  I was debating keeping  
> the inline out.printlns, but not sure if they shouldn't just be  
> moved to the main such that the cmd line stuff still works as is,  
> but it doesn't clog the logs for those that want programmatic access.
>
> I'll post a patch soon, but wanted to see if you had any preliminary  
> insight.
>
> -Grant