Posted to java-user@lucene.apache.org by Vinicius Carvalho <vi...@gmail.com> on 2008/06/20 18:12:58 UTC
Termdocs question
Hello there! I'm trying to query for a specific document in an efficient way.
My index is structured so that an id field serves as a unique
key for the whole index. When updating or removing a document I was
searching for my id using a Searcher and a TermQuery. But from reading the list
it seems that this adds a bit of overhead, and that using reader.termDocs(term)
would be faster.
Here's a piece of code:
private void deleteFromIndex(String id){
    Term term = new Term("id",id);
    IndexReader reader = readerManager.getIndexReader();
    TermDocs termDocs = null;
    try {
        termDocs = reader.termDocs(term);
        while(termDocs.next()){
            int index = termDocs.doc();
            if(reader.document(index).get("id").equals(id)){
                reader.deleteDocument(index);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }finally{
        if(termDocs != null){
            try {
                termDocs.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
The problem is that the reader is not returning anything for the term. When I
switch to a query it works. My documents have all been indexed using
BrazilianAnalyzer; I don't know if that could be the reason.
Regards
--
"In a world without fences and walls, who needs Gates and Windows?"
Re: Termdocs question
Posted by Chris Hostetter <ho...@fucit.org>.
:
: > termDocs = reader.termDocs(term);
: > while(termDocs.next()){
: > int index = termDocs.doc();
: > if(reader.document(index).get("id").equals(id)){
: > reader.deleteDocument(index);
: > }
: > }
:
: Iterating documents and string comparing stored values is not very efficient.
: Use a query instead, something like this:
More specifically: there is no reason at all to look at the stored value.
Just ensure that the *indexed* value is the "unique id" (which TermDocs
will already ensure for you).
: BooleanQuery query = new BooleanQuery();
: query.add(new TermQuery(term), BooleanClause.Occur.MUST);
: query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);
Note: in the original code, "term" was new Term("id", id), so this is
unneeded ... the goal was to iterate the doc(s) matching a unique(?) id.
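
To illustrate why the stored-value check is redundant, here is a minimal sketch of a postings list in plain Java. It is a toy model, not Lucene's API: the class ToyPostings and its methods are made up for illustration. Every doc number listed under a term is there only because the document was indexed with that exact term, so walking the list is already the match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for an inverted index (hypothetical, not Lucene's API). */
final class ToyPostings {
    // term text -> numbers of the documents that were indexed with that term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDoc = 0;

    /** Indexes one document under a single exact id term; returns its doc number. */
    int add(String id) {
        int doc = nextDoc++;
        postings.computeIfAbsent(id, k -> new ArrayList<>()).add(doc);
        return doc;
    }

    /** Marks every document listed under the term as deleted; returns how many were newly marked. */
    int deleteByTerm(String id) {
        int count = 0;
        // No stored-field comparison needed: membership in the list is the match.
        for (int doc : postings.getOrDefault(id, new ArrayList<>())) {
            if (deleted.add(doc)) {
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int doc) {
        return deleted.contains(doc);
    }
}
```

If I remember the 2.x API correctly, IndexReader also offers deleteDocuments(Term), which performs this kind of delete-by-term in one call; worth checking against the javadoc for the version in use.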
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Termdocs question
Posted by Vinicius Carvalho <vi...@gmail.com>.
I'm sorry, the problem was with the way the id was being indexed. It was
marked as tokenized, so when I searched for its untokenized form I was not
getting the doc. Now everything works fine :)
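
For anyone hitting the same thing, a small plain-Java sketch of what went wrong. This is a toy model and not Lucene code: ToyIdIndex and its split-on-punctuation tokenizer are assumptions made up for illustration. A tokenized id is stored as several smaller terms, so a lookup with the whole id finds nothing, while an untokenized id is stored as one term and matches exactly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy index for an id field (hypothetical, not Lucene's API). */
final class ToyIdIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final boolean tokenized;
    private int nextDoc = 0;

    ToyIdIndex(boolean tokenized) {
        this.tokenized = tokenized;
    }

    /** Indexes one document by its id; returns the doc number. */
    int add(String id) {
        int doc = nextDoc++;
        // Assumed tokenizer: split on anything that is not a letter or digit.
        String[] terms = tokenized ? id.split("[^A-Za-z0-9]+") : new String[] { id };
        for (String term : terms) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(doc);
            }
        }
        return doc;
    }

    /** Exact term lookup: the text is used as given, with no analysis. */
    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }
}
```

With tokenized set to true, adding "order-42" produces the terms "order" and "42", so docsFor("order-42") comes back empty. With tokenized set to false the same lookup returns the document. In the Lucene 2.x releases I believe the untokenized setting is Field.Index.UN_TOKENIZED, but please verify that against your version.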
Regards
On Sat, Jun 21, 2008 at 2:08 PM, Karl Wettin <ka...@gmail.com> wrote:
>
> 20 jun 2008 kl. 18.12 skrev Vinicius Carvalho:
>
>
> Hello there! I trying to query for a specific document on a efficient way.
>>
>
> Hi Vinicius,
>
> termDocs = reader.termDocs(term);
>> while(termDocs.next()){
>> int index = termDocs.doc();
>> if(reader.document(index).get("id").equals(id)){
>> reader.deleteDocument(index);
>> }
>> }
>>
>
> Iterating documents and string comparing stored values is not very
> efficient. Use a query instead, something like this:
>
> BooleanQuery query = new BooleanQuery();
> query.add(new TermQuery(term), BooleanClause.Occur.MUST);
> query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);
> searcher.search(query, new HitCollector() {
> public void collect(int doc, float score) {
> reader.deleteDocument(doc);
> }
> });
>
>
> karl
--
"In a world without fences and walls, who needs Gates and Windows?"
Re: Termdocs question
Posted by Karl Wettin <ka...@gmail.com>.
20 jun 2008 kl. 18.12 skrev Vinicius Carvalho:
> Hello there! I trying to query for a specific document on a
> efficient way.
Hi Vinicius,
> termDocs = reader.termDocs(term);
> while(termDocs.next()){
> int index = termDocs.doc();
> if(reader.document(index).get("id").equals(id)){
> reader.deleteDocument(index);
> }
> }
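Iterating over documents and string-comparing stored values is not very
efficient. Use a query instead, something like this:
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(term), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);
// reader must be declared final to be used inside the anonymous class
searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        try {
            reader.deleteDocument(doc);
        } catch (IOException e) {
            // collect() declares no checked exceptions, so rethrow unchecked
            throw new RuntimeException(e);
        }
    }
});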
karl
Re: Termdocs question
Posted by Erick Erickson <er...@gmail.com>.
A couple of questions:
1> I assume by "not returning any docs" you mean that you
never get into your while loop. Is that true?
2> I'm a little suspicious of the field labeled "id" and whether
it's at all possible that this is getting confused with the
internal Lucene doc ID. This is a wild shot in the dark
influenced by the fact that I'm working with Groovy where
property access is equivalent to get.....
3> What is the form of your unique key? If it's all numeric I
don't see a problem, but if it has any non-numeric
characters in it, then your analyzer will probably
lower-case the term whereas your Term constructor
won't, which would also explain why searching
would work.
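
To make point 3 concrete, here is a rough sketch in plain Java. It is not Lucene code: ToyCaseDemo and its analyze method are invented stand-ins, and the only assumption about the analyzer is that it lower-cases. The raw lookup uses the key exactly as given, the way a hand-built Term would, while the analyzed lookup first runs the key through the same step that was applied at index time.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of an analyzed id field (hypothetical, not Lucene's API). */
final class ToyCaseDemo {
    // analyzed term text -> doc number
    private final Map<String, Integer> termToDoc = new HashMap<>();

    /** Assumed analyzer behaviour: lower-casing only. */
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    /** Index time: the id goes through the analyzer before it is stored as a term. */
    void index(String id, int doc) {
        termToDoc.put(analyze(id), doc);
    }

    /** Raw term lookup: text used exactly as given. Returns -1 if nothing matches. */
    int rawLookup(String id) {
        return termToDoc.getOrDefault(id, -1);
    }

    /** Lookup that analyzes the key first, as an analyzed search path would. */
    int analyzedLookup(String id) {
        return rawLookup(analyze(id));
    }
}
```

An all-numeric key behaves the same in both lookups because lower-casing leaves it unchanged; a key such as "Doc42" is found only by the analyzed lookup.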
If none of this is remotely helpful, could you post a few
examples of keys that work and those that don't?
Best
Erick
On Fri, Jun 20, 2008 at 12:12 PM, Vinicius Carvalho <
viniciusccarvalho@gmail.com> wrote:
> Hello there! I trying to query for a specific document on a efficient way.
> My index is structured in a way where I have an id field which is a unique
> key for the whole index. When I'm updating/removing a document I was
> searching for my id using a Searcher and a TermQuery. But reading the list
> it seems that its a bit of overhead, using a reader.termDocs(term) would be
> faster.
>
> Here's a piece of code:
>
> private void deleteFromIndex(String id){
> Term term = new Term("id",id);
> IndexReader reader = readerManager.getIndexReader();
> TermDocs termDocs = null;
> try {
> termDocs = reader.termDocs(term);
> while(termDocs.next()){
> int index = termDocs.doc();
> if(reader.document(index).get("id").equals(id)){
> reader.deleteDocument(index);
> }
> }
> } catch (IOException e) {
> e.printStackTrace();
> }finally{
> if(termDocs != null){
> try {
> termDocs.close();
> } catch (IOException e) {
> e.printStackTrace();
> }
> }
> }
> }
>
> problem is, reader is not returning any term. When I switch to query it
> works. My documents have all being indexed using BrazilianAnalyzer, don't
> know if that could be the reason.
>
> Regards
>
> --
> "In a world without fences and walls, who needs Gates and Windows?"
>