Posted to dev@lucene.apache.org by Peter Becker <pb...@dstc.edu.au> on 2003/07/09 08:16:39 UTC

Iterating through all documents indexed

And another one...

It seems to be a recurring question, but I can't figure out how to do a proper 
update of an index. The problem I have is iterating through all 
documents -- I can think of a few hacks for this but there seems to be 
no way to just get an iterator/enumeration of all documents. This bit of 
code seems to work:

            IndexReader reader = IndexReader.open(this.indexLocation);
            for(int i = 0; i < reader.maxDoc(); i++) {
                Document doc = reader.document(i);
                if(doc != null) {
                   // check if up-to-date, fix if required
                }
            }

but I am a bit suspicious about what happens when I start deleting 
documents from the index. Is this ok? Are there better ways?

  Peter
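
On the deletion concern raised above: maxDoc() keeps counting the slots of
deleted documents until segments are merged, so a loop over 0..maxDoc()-1
will reach deleted slots. As far as I know, document(i) in the 1.x releases
throws for a deleted slot rather than returning null, so the usual guard is
IndexReader.isDeleted(i). The sketch below shows only that loop shape. The
DocSource interface and the inMemory helper are hypothetical stand-ins so
the example runs without a Lucene jar; with a real IndexReader the same
three calls (maxDoc, isDeleted, document) would be used directly.

```java
import java.util.ArrayList;
import java.util.List;

public class IterateLiveDocs {
    /** Hypothetical stand-in for the three IndexReader calls used here. */
    interface DocSource {
        int maxDoc();
        boolean isDeleted(int n);
        String document(int n);
    }

    /** Visits every slot below maxDoc() and skips the ones flagged as deleted. */
    static List<String> liveDocuments(DocSource reader) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) {
                continue; // slot is still counted by maxDoc(), but there is nothing to read
            }
            live.add(reader.document(i));
        }
        return live;
    }

    /** In-memory source for the demo (assumption): a null entry marks a deleted slot. */
    static DocSource inMemory(String... docs) {
        return new DocSource() {
            public int maxDoc() { return docs.length; }
            public boolean isDeleted(int n) { return docs[n] == null; }
            public String document(int n) {
                if (docs[n] == null) {
                    throw new IllegalArgumentException("deleted document: " + n);
                }
                return docs[n];
            }
        };
    }

    public static void main(String[] args) {
        DocSource reader = inMemory("a.txt", null, "c.txt");
        System.out.println(liveDocuments(reader)); // prints [a.txt, c.txt]
    }
}
```

Checking isDeleted(i) before document(i) keeps the loop correct whether or
not the index has been optimized since the last delete.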


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Fuzzy queries are case sensitive; doesn't behave as documented

Posted by Cormac Twomey <co...@siderean.com>.
Otis Gospodnetic wrote:

>> I documented a test case for this. Fuzzy matches appear for me ahead
>> of some exact matches in some cases. As this is not as clear cut a bug
>> as the case sensitivity issue, I didn't post this as a bug.
>>
>> Any feedback you have would be much appreciated.
>
> I think you should add the second issue to Bugzilla, if for no other
> reason than so it does not get lost.
>
Ok, done. Note, I also posted a candidate patch to address the issue.

Regards,
--Cormac Twomey

Re: Fuzzy queries are case sensitive; doesn't behave as documented

Posted by Otis Gospodnetic <ot...@yahoo.com>.
--- Cormac Twomey <co...@siderean.com> wrote:
> I raised this issue a while back but it went unanswered so I'm trying
> again.
> 
> Anyhow, FuzzyTermEnum.java appears to have two problems -
> 
> 1) FuzzyTermEnum searches are case sensitive. Presumably this is not
> as designed? See bug #18014 for a candidate patch.

I think you are correct about this.  I followed up to that bug report
just now.

> 2) The "Query Syntax" page on the website states in the "Fuzzy
> Searches" section, that:
>
>      "Terms found by the fuzzy search will automatically get a boost
> factor of 0.2"

Hm, I am unable to find where this boost factor of 0.2 gets applied.
Have you been able to find it?  I was not the person who wrote the
query syntax page, so I am not sure where that statement comes from.

>     I've found this not to be the case. In my previous email (
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02819.html
> ) I documented a test case for this. Fuzzy matches appear for me ahead
> of some exact matches in some cases. As this is not as clear cut a bug
> as the case sensitivity issue, I didn't post this as a bug.
>
> Any feedback you have would be much appreciated.

I think you should add the second issue to Bugzilla, if for no other
reason than so it does not get lost.

Thanks,
Otis




Fuzzy queries are case sensitive; doesn't behave as documented

Posted by Cormac Twomey <co...@siderean.com>.
Folks,

I raised this issue a while back but it went unanswered so I'm trying again.

Anyhow, FuzzyTermEnum.java appears to have two problems -

1) FuzzyTermEnum searches are case sensitive. Presumably this is not as 
designed? See bug #18014 for a candidate patch.
2) The "Query Syntax" page on the website states in the "Fuzzy Searches" 
section, that:
 
     "Terms found by the fuzzy search will automatically get a boost 
factor of 0.2"

    I've found this not to be the case. In my previous email ( 
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02819.html 
) I documented a test case for this. Fuzzy matches appear for me ahead 
of some exact matches in some cases. As this is not as clear cut a bug 
as the case sensitivity issue, I didn't post this as a bug.


Any feedback you have would be much appreciated.

Thanks,
--Cormac Twomey
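
To make problem 1 concrete: an edit-distance comparison treats "L" and "l"
as different characters unless both strings are case-folded first. The
sketch below is a plain Levenshtein distance with an optional lowercasing
step. It is not FuzzyTermEnum's code and not the patch attached to bug
#18014; the class and method names are made up for illustration.

```java
import java.util.Locale;

public class CaseFoldedDistance {
    /** Plain Levenshtein edit distance using two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix into b[0..j) costs j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty prefix costs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Same distance, computed after lowercasing both inputs. */
    static int distanceIgnoringCase(String a, String b) {
        return distance(a.toLowerCase(Locale.ROOT), b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(distance("Lucene", "lucene"));             // 1: case counts as an edit
        System.out.println(distanceIgnoringCase("Lucene", "lucene")); // 0
        System.out.println(distanceIgnoringCase("Lucene", "lucent")); // 1
    }
}
```

Whether the folding belongs in the term enumeration or in how the query
term is analyzed before it gets there is a design choice this sketch does
not settle.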




Re: Iterating through all documents indexed

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Moved to lucene-user.
Peter, also see this:

http://jguru.com/faq/view.jsp?EID=703116
http://jguru.com/faq/view.jsp?EID=587213
http://jguru.com/faq/view.jsp?EID=1042002

Otis


--- Peter Becker <pb...@dstc.edu.au> wrote:
> And another one...
> 
> It seems to be a recurring question, but I can't figure out how to do a
> proper update of an index. The problem I have is iterating through all
> documents -- I can think of a few hacks for this but there seems to be
> no way to just get an iterator/enumeration of all documents. This bit
> of code seems to work:
> 
>             IndexReader reader = IndexReader.open(this.indexLocation);
>             for(int i = 0; i < reader.maxDoc(); i++) {
>                 Document doc = reader.document(i);
>                 if(doc != null) {
>                    // check if up-to-date, fix if required
>                 }
>             }
> 
> but I am a bit suspicious about what happens when I start deleting 
> documents from the index. Is this ok? Are there better ways?
> 
>   Peter



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org