Posted to dev@lucene.apache.org by Peter Becker <pb...@dstc.edu.au> on 2003/07/09 08:16:39 UTC
Iterating through all documents indexed
And another one...
It seems to be a recurring question, but I can't figure out how to do a
proper update of an index. The problem I have is iterating through all
documents: I can think of a few hacks for this, but there seems to be
no way to just get an iterator/enumeration of all documents. This bit of
code seems to work:
    IndexReader reader = IndexReader.open(this.indexLocation);
    for (int i = 0; i < reader.maxDoc(); i++) {
        Document doc = reader.document(i);
        if (doc != null) {
            // check if up-to-date, fix if required
        }
    }
but I am a bit suspicious about what happens when I start deleting
documents from the index. Is this ok? Are there better ways?
Peter
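A minimal sketch of the usual answer to "what happens when I delete", under stated assumptions. As far as I know, in Lucene 1.x deleting a document through an IndexReader only marks its number as deleted: maxDoc() stays the same, numDocs() drops, and the numbers of the other documents do not shift until segments are merged (for example by optimize()). Also, document(n) on a deleted number throws an IllegalArgumentException rather than returning null, so the loop should test isDeleted(i) before calling document(i) instead of checking for null. To keep the example runnable without a Lucene jar, the hypothetical ToyReader class below only mimics maxDoc(), numDocs(), isDeleted(int), document(int) and delete(int); it is not Lucene's IndexReader, and the "stale" test is a made-up stand-in for "check if up-to-date".

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class IterateAllDocs {

    // Hypothetical stand-in for an index reader (NOT Lucene's API): document
    // numbers are slots 0..maxDoc()-1, and deleting only marks a slot.
    static class ToyReader {
        private final List<String> docs = new ArrayList<>();
        private final BitSet deleted = new BitSet();

        ToyReader(String... contents) {
            for (String c : contents) docs.add(c);
        }

        int maxDoc() { return docs.size(); }                 // unchanged by deletes

        int numDocs() { return docs.size() - deleted.cardinality(); }

        boolean isDeleted(int n) { return deleted.get(n); }

        void delete(int n) { deleted.set(n); }

        String document(int n) {
            // Assumed behavior: a deleted slot is an error, not a null return.
            if (isDeleted(n)) {
                throw new IllegalArgumentException("attempt to access a deleted document");
            }
            return docs.get(n);
        }
    }

    // Visits every live document, deleting "stale" ones (hypothetical rule).
    static List<String> liveDocs(ToyReader reader) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) {
                continue;                     // skip slots deleted earlier
            }
            String doc = reader.document(i);
            if (doc.startsWith("stale")) {
                reader.delete(i);             // later document numbers do not shift
            } else {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader("a", "stale-b", "c", "d");
        reader.delete(3);                     // deleted before the loop starts
        System.out.println(liveDocs(reader));                          // [a, c]
        System.out.println(reader.maxDoc() + " " + reader.numDocs());  // 4 2
    }
}
```

The point of the design is that the loop bound is maxDoc(), not numDocs(): deleted slots still occupy numbers, so iterating up to numDocs() would silently miss live documents at the end.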
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: Fuzzy queries are case sensitive; doesn't behave as documented
Posted by Cormac Twomey <co...@siderean.com>.
Otis Gospodnetic wrote:
>> I documented a test case for this. Fuzzy matches appear for me ahead
>> of some exact matches in some cases. As this is not as clear-cut a
>> bug as the case sensitivity issue, I didn't post this as a bug.
>>
>> Any feedback you have would be much appreciated.
>
> I think you should add the second issue to Bugzilla, if for no other
> reason than so it does not get lost.
OK, done. Note that I also posted a candidate patch to address the issue.
Regards,
--Cormac Twomey
Re: Fuzzy queries are case sensitive; doesn't behave as documented
Posted by Otis Gospodnetic <ot...@yahoo.com>.
--- Cormac Twomey <co...@siderean.com> wrote:
> I raised this issue a while back but it went unanswered, so I'm trying
> again.
>
> Anyhow, FuzzyTermEnum.java appears to have two problems:
>
> 1) FuzzyTermEnum searches are case sensitive. Presumably this is not
> as designed? See bug #18014 for a candidate patch.
I think you are correct about this. I followed up to that bug report
just now.
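Why case matters for a fuzzy match is easy to see with a plain Levenshtein edit distance. The code below is a textbook two-row dynamic-programming version written for illustration, not the code in FuzzyTermEnum.java. Characters are compared exactly, so "Lucene" vs "lucene" costs one edit and an all-caps "LUCENE" costs six, which for a six-letter word is as far apart as two terms can be. The editDistanceIgnoreCase helper is hypothetical: lowercasing both sides is one possible fix, not necessarily what the patch attached to bug #18014 does.

```java
import java.util.Locale;

public class FuzzyCase {

    // Classic Levenshtein distance, keeping only two rows of the DP table.
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;   // distance from ""
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                // Exact char comparison: 'L' and 'l' count as different.
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;        // reuse arrays
        }
        return prev[t.length()];
    }

    // Hypothetical case-insensitive variant: fold both terms first.
    static int editDistanceIgnoreCase(String s, String t) {
        return editDistance(s.toLowerCase(Locale.ROOT), t.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(editDistance("Lucene", "lucene"));           // 1
        System.out.println(editDistance("LUCENE", "lucene"));           // 6
        System.out.println(editDistanceIgnoreCase("LUCENE", "lucene")); // 0
    }
}
```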
> 2) The "Query Syntax" page on the website states, in the "Fuzzy
> Searches" section, that:
>
> "Terms found by the fuzzy search will automatically get a boost
> factor of 0.2"
Hm, I am unable to find where this boost factor of 0.2 gets applied.
Have you been able to find it? I was not the person who wrote the
query syntax page, so I am not sure where that statement comes from.
> I've found this not to be the case. In my previous email
> (http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02819.html)
> I documented a test case for this. Fuzzy matches appear for me ahead
> of some exact matches in some cases. As this is not as clear-cut a
> bug as the case sensitivity issue, I didn't post this as a bug.
>
> Any feedback you have would be much appreciated.
I think you should add the second issue to Bugzilla, if for no other
reason than so it does not get lost.
Thanks,
Otis
__________________________________
Do you Yahoo!?
SBC Yahoo! DSL - Now only $29.95 per month!
http://sbc.yahoo.com
Fuzzy queries are case sensitive; doesn't behave as documented
Posted by Cormac Twomey <co...@siderean.com>.
Folks,
I raised this issue a while back but it went unanswered, so I'm trying again.
Anyhow, FuzzyTermEnum.java appears to have two problems:
1) FuzzyTermEnum searches are case sensitive. Presumably this is not as
designed? See bug #18014 for a candidate patch.
2) The "Query Syntax" page on the website states, in the "Fuzzy Searches"
section, that:
"Terms found by the fuzzy search will automatically get a boost
factor of 0.2"
I've found this not to be the case. In my previous email
(http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02819.html)
I documented a test case for this. Fuzzy matches appear for me ahead
of some exact matches in some cases. As this is not as clear-cut a bug
as the case sensitivity issue, I didn't post this as a bug.
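What the documented 0.2 boost would imply, if it were applied as a plain multiplier on a fuzzy hit's score: a fuzzy match could only outrank an exact match when its unboosted score is more than five times larger. The raw scores below are made-up numbers chosen to illustrate that threshold; the sketch does not model Lucene's actual scoring, and it makes no claim about where (or whether) FuzzyQuery applies such a boost.

```java
public class FuzzyBoost {

    // Boost the "Query Syntax" page says fuzzy terms receive.
    static final float FUZZY_BOOST = 0.2f;

    // Hypothetical final score: raw score times boost (1.0 for exact terms).
    static float score(float raw, boolean fuzzy) {
        return fuzzy ? raw * FUZZY_BOOST : raw;
    }

    // True if a fuzzy hit would rank ahead of an exact hit under this model.
    static boolean fuzzyOutranksExact(float fuzzyRaw, float exactRaw) {
        return score(fuzzyRaw, true) > score(exactRaw, false);
    }

    public static void main(String[] args) {
        float exactRaw = 1.0f;
        System.out.println(fuzzyOutranksExact(4.0f, exactRaw)); // false: 0.8 < 1.0
        System.out.println(fuzzyOutranksExact(6.0f, exactRaw)); // true:  1.2 > 1.0
    }
}
```

So under this model, seeing fuzzy matches ahead of exact ones is only consistent with the documented boost if the fuzzy documents score far higher on their own. In Lucene a caller can also attach a boost explicitly with Query.setBoost(float), which is one way to get the documented ranking regardless of what FuzzyQuery does internally.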
Any feedback you have would be much appreciated.
Thanks,
--Cormac Twomey
Re: Iterating through all documents indexed
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Moved to lucene-user.
Peter, also see these:
http://jguru.com/faq/view.jsp?EID=703116
http://jguru.com/faq/view.jsp?EID=587213
http://jguru.com/faq/view.jsp?EID=1042002
Otis
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org