
Posted to java-user@lucene.apache.org by Vinicius Carvalho <vi...@gmail.com> on 2008/06/20 18:12:58 UTC

Termdocs question

Hello there! I'm trying to query for a specific document in an efficient way.
My index is structured so that I have an id field which is a unique
key for the whole index. When I'm updating/removing a document I was
searching for my id using a Searcher and a TermQuery. But from reading the list
it seems that this adds a bit of overhead, and that using
reader.termDocs(term) would be faster.

Here's a piece of code:

private void deleteFromIndex(String id){
        Term term = new Term("id",id);
        IndexReader reader = readerManager.getIndexReader();
        TermDocs termDocs = null;
        try {
            termDocs = reader.termDocs(term);
            while(termDocs.next()){
                int index = termDocs.doc();
                if(reader.document(index).get("id").equals(id)){
                    reader.deleteDocument(index);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }finally{
            if(termDocs != null){
                try {
                    termDocs.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

The problem is that the reader is not returning any documents for the term.
When I switch to a query it works. My documents have all been indexed using
BrazilianAnalyzer; I don't know if that could be the reason.

Regards

-- 
"In a world without fences and walls, who needs Gates and Windows?"

Re: Termdocs question

Posted by Chris Hostetter <ho...@fucit.org>.

: 
: >            termDocs = reader.termDocs(term);
: >            while(termDocs.next()){
: >                int index = termDocs.doc();
: >                if(reader.document(index).get("id").equals(id)){
: >                    reader.deleteDocument(index);
: >                }
: >            }
: 
: Iterating over documents and comparing stored values as strings is not very
: efficient. Use a query instead, something like this:

More specifically: there is no reason at all to look at the stored value --
just ensure that the *indexed* value is the "unique id" (which TermDocs
will already ensure for you).

: BooleanQuery query = new BooleanQuery();
: query.add(new TermQuery(term), BooleanClause.Occur.MUST);
: query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);

Note: in the original code, "term" was already new Term("id", id), so this is
unneeded ... the goal was to iterate the doc(s) matching a unique(?) id.
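
To illustrate the point about indexed versus stored values, here is a minimal
sketch of a term-to-documents lookup. It is a toy model built on plain Java
collections, not Lucene's API; the class ToyPostings and its methods are
hypothetical names chosen for the example. Every document number that comes
back from the lookup was, by construction, indexed under that term, so
re-reading the stored field to compare it again adds nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an inverted index for one field; not Lucene's API.
class ToyPostings {
    // term text -> numbers of the documents indexed under that term
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Record that document docId was indexed with the given term.
    void index(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    // Rough analogue of walking TermDocs: only matching documents are returned.
    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyPostings idx = new ToyPostings();
        idx.index(0, "a17");
        idx.index(1, "b42");
        idx.index(2, "c99");
        System.out.println(idx.docsFor("b42")); // prints [1]
        System.out.println(idx.docsFor("zzz")); // prints []
    }
}
```

An empty result, as in the second lookup, means the term as given is simply
not in the index, which is worth checking before suspecting the loop itself.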



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Termdocs question

Posted by Vinicius Carvalho <vi...@gmail.com>.

I'm sorry, the problem was with the way the id was being indexed. It was
marked as tokenized, so when I searched for its untokenized form I was not
getting the doc. Now everything works fine :)
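
The effect can be reproduced with a small toy model. This is a sketch in plain
Java, not Lucene's API: TokenizedIdDemo, analyze and buildIndex are
hypothetical names, and the "analyzer" here only splits on non-alphanumeric
characters, which is an assumption made for illustration (a real analyzer such
as BrazilianAnalyzer does more). In Lucene 2.x the corresponding field options
were Field.Index.TOKENIZED and Field.Index.UN_TOKENIZED.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: plain Java collections, not Lucene's API.
class TokenizedIdDemo {

    // Hypothetical stand-in for an analyzer: splits on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Builds term -> document numbers for the id field, either tokenized or one term per id.
    static Map<String, Set<Integer>> buildIndex(String[] ids, boolean tokenized) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int doc = 0; doc < ids.length; doc++) {
            List<String> terms = tokenized
                    ? analyze(ids[doc])
                    : Collections.singletonList(ids[doc]);
            for (String term : terms) {
                index.computeIfAbsent(term, k -> new HashSet<>()).add(doc);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] ids = {"doc-42", "doc-43"};
        // Tokenized: the index holds "doc", "42" and "43", so the whole id is not a term.
        System.out.println(buildIndex(ids, true).containsKey("doc-42"));  // prints false
        // Untokenized: each id is kept as a single term and matches exactly.
        System.out.println(buildIndex(ids, false).containsKey("doc-42")); // prints true
    }
}
```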

Regards

On Sat, Jun 21, 2008 at 2:08 PM, Karl Wettin <ka...@gmail.com> wrote:

> On 20 Jun 2008 at 18:12, Vinicius Carvalho wrote:
>
>> Hello there! I'm trying to query for a specific document in an efficient way.
>
> Hi Vinicius,
>
>>            termDocs = reader.termDocs(term);
>>            while(termDocs.next()){
>>                int index = termDocs.doc();
>>                if(reader.document(index).get("id").equals(id)){
>>                    reader.deleteDocument(index);
>>                }
>>            }
>
> Iterating over documents and comparing stored values as strings is not very
> efficient. Use a query instead, something like this:
>
> BooleanQuery query = new BooleanQuery();
> query.add(new TermQuery(term), BooleanClause.Occur.MUST);
> query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);
> searcher.search(query, new HitCollector() {
>    public void collect(int doc, float score) {
>      try {
>        reader.deleteDocument(doc);
>      } catch (IOException e) {
>        // collect() cannot throw checked exceptions, so rethrow unchecked
>        throw new RuntimeException(e);
>      }
>    }
> });
>
>
>       karl

Re: Termdocs question

Posted by Karl Wettin <ka...@gmail.com>.

On 20 Jun 2008 at 18:12, Vinicius Carvalho wrote:

> Hello there! I'm trying to query for a specific document in an
> efficient way.

Hi Vinicius,

>            termDocs = reader.termDocs(term);
>            while(termDocs.next()){
>                int index = termDocs.doc();
>                if(reader.document(index).get("id").equals(id)){
>                    reader.deleteDocument(index);
>                }
>            }

Iterating over documents and comparing stored values as strings is not very
efficient. Use a query instead, something like this:

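BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(term), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST);
searcher.search(query, new HitCollector() {
   public void collect(int doc, float score) {
     try {
       reader.deleteDocument(doc);
     } catch (IOException e) {
       // collect() cannot throw checked exceptions, so rethrow unchecked
       throw new RuntimeException(e);
     }
   }
});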


        karl


Re: Termdocs question

Posted by Erick Erickson <er...@gmail.com>.

A couple of questions:
1> I assume by "not returning any docs" you mean that you
    never get into your while loop. Is that true?
2> I'm a little suspicious of the field labeled "id" and whether
     it's at all possible that this is getting confused with the
     internal Lucene doc ID. This is a wild shot in the dark
     influenced by the fact that I'm working with Groovy where
     property access is equivalent to get.....
3> What is the form of your unique key? If it's all numeric I
     don't see a problem, but if it has any non-numeric
     characters in it, then your analyzer will probably
     lower-case the term whereas your term constructor
     won't. Which would also explain why searching
     would work.
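
The case mismatch described in point 3 can be sketched as follows. This is a
toy model in plain Java, not Lucene's API: CaseMismatchDemo and normalize are
hypothetical names, and the assumption is that the index-time analysis does
nothing more than lower-case the key.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: shows why a raw key can miss a lower-cased indexed term.
class CaseMismatchDemo {

    // Hypothetical stand-in for index-time analysis: lower-casing only.
    static String normalize(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Set<String> indexedTerms = new HashSet<>();
        indexedTerms.add(normalize("AB123x")); // index time: the analyzer runs

        String rawKey = "AB123x";
        // Raw term built without analysis: no match.
        System.out.println(indexedTerms.contains(rawKey));            // prints false
        // Same normalization applied before the lookup: match.
        System.out.println(indexedTerms.contains(normalize(rawKey))); // prints true
    }
}
```

An all-numeric key passes through normalize unchanged, which is why such keys
would not show the problem.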

If none of this is remotely helpful, could you post a few
examples of keys that work and those that don't?

Best
Erick
