Posted to java-user@lucene.apache.org by ne...@hotu.com on 2003/05/10 01:50:02 UTC

query matching all documents

Is there a query object which can match all documents?

As an example, suppose my search supports zero or more search criteria.  
Each criterion, if specified, is required to match the document.  If no 
criteria are specified, all documents should be returned.  Here's an 
example (NullQuery is an imaginary class which represents the query type 
I'm looking for):

BooleanQuery query = new BooleanQuery();
query.add(new NullQuery(), true, false);
if (criteria1exists) {
  query.add(..., true, false);
}
if (criteria2exists) {
  query.add(..., true, false);
}

Without the second line (the NullQuery clause), if no criteria are 
specified, no documents will be returned.  Apparently, a boolean query 
with no boolean elements matches zero documents.  In my opinion, it would 
have made more sense for it to match all documents, since each boolean 
element represents a constraint, and no constraints implies all 
documents.  But for clarity, maybe there should be an AllQuery and a 
NoneQuery which match all documents and no documents respectively.
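
The "no constraints implies all documents" argument can be shown with a
small plain-Java sketch. This is not Lucene code and says nothing about
how BooleanQuery is implemented: RequiredClauses, add(), matches() and
search() are hypothetical names made up for the illustration, and a
"document" is just a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model, NOT Lucene: a query is a list of required
// constraints over a document, combined with logical AND.
class RequiredClauses {
    private final List<Predicate<String>> clauses = new ArrayList<>();

    RequiredClauses add(Predicate<String> clause) {
        clauses.add(clause);
        return this;
    }

    // A document matches when no clause rejects it. With zero clauses,
    // allMatch() is true, so the empty conjunction matches everything.
    boolean matches(String doc) {
        return clauses.stream().allMatch(c -> c.test(doc));
    }

    // Returns the documents that satisfy every clause, in input order.
    static List<String> search(List<String> docs, RequiredClauses query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (query.matches(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("apple pie", "apple tart", "plum cake");
        // No criteria: every document is returned.
        System.out.println(search(docs, new RequiredClauses()));
        // One criterion: only the documents containing "apple".
        System.out.println(search(docs,
                new RequiredClauses().add(d -> d.contains("apple"))));
    }
}
```

In this model no special "match all" clause is needed, because an empty
AND is true by definition. Whether a real query engine should behave
that way is a design choice.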

But back to reality, is there a way to achieve this with the existing 
query classes?  I'd prefer not to add a dummy field to search by, as 
this is not efficient.

Thanks,
Jim


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: query matching all documents

Posted by Tatu Saloranta <ta...@hypermall.net>.

On Thursday 22 May 2003 07:32, Brisbart Franck wrote:
> You don't really need to take care of the deleted docs. When you try
> to get a deleted doc (reader.document(i) on a deleted doc), an
> IllegalArgumentException will be thrown with the message 'attempt to
> access a deleted document'. Just catch this exception.

Although technically speaking that would work, in general it's almost always
better to check for a condition that can occur instead of waiting for an
exception to occur. For example, if there is a chance that a reference may
well be null, it's better to check for null than to do try-catch for
NullPointerException; if no null is expected, no specific catch block is
needed (it may make sense to have a catch-all construct for robustness, but
not one for that specific exception).

Exceptions are to be used only for exceptional things, and when iterating
over a non-optimized index, finding a deleted doc is nothing too exceptional.
If the index is assumed to be optimized, and no deleted docs should be there,
then it may be reasonable to use try-catch "just in case".

There's a significant overhead for just creating the exception object (the
JVM has to walk the whole call stack to fill in the stack trace, and that's
seldom optimized by VMs, because they optimize the non-exception cases), and
when iterating over all docs, this overhead may make traversal significantly
slower if there are many deleted docs.
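
A minimal sketch of the two styles, side by side. ToyReader is a
hypothetical stand-in written for this illustration, not the Lucene
IndexReader class; only the method names isDeleted(), maxDoc() and
document() and the exception message are taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index reader; NOT the Lucene IndexReader class.
class ToyReader {
    private final String[] docs;
    private final boolean[] deleted;

    ToyReader(String... docs) {
        this.docs = docs;
        this.deleted = new boolean[docs.length];
    }

    void delete(int n) { deleted[n] = true; }
    int maxDoc() { return docs.length; }
    boolean isDeleted(int n) { return deleted[n]; }

    // Mirrors the behaviour described in the thread: reading a deleted
    // slot raises IllegalArgumentException.
    String document(int n) {
        if (deleted[n]) {
            throw new IllegalArgumentException("attempt to access a deleted document");
        }
        return docs[n];
    }

    // Check style: test the expected condition before reading.
    static List<String> listByCheck(ToyReader r) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < r.maxDoc(); i++) {
            if (!r.isDeleted(i)) {
                live.add(r.document(i));
            }
        }
        return live;
    }

    // Catch style: same result, but one exception object (with its
    // stack trace) is created for every deleted slot.
    static List<String> listByCatch(ToyReader r) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < r.maxDoc(); i++) {
            try {
                live.add(r.document(i));
            } catch (IllegalArgumentException e) {
                // deleted slot: skip it
            }
        }
        return live;
    }
}
```

Both methods return the same list; the difference is only in how a
deleted slot is detected, which is where the extra cost of the catch
style would come from.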

-+ Tatu +-


Re: query matching all documents

Posted by Guilherme Barile <gu...@prosoma.com.br>.

Sorry to bother, but how would it work?
I already have a wrapper class (as I'm creating an index with predefined
fields), and I'll implement the delete() function soon.
So, after deleting, must I optimize the index so that the doc numbers
will always be a contiguous sequence? Or would the safer way be to add
if(!reader.isDeleted(doc-number)) in my listing loop?

thanks

gui

On Thu, 2003-05-22 at 10:30, Morus Walter wrote:
> Guilherme Barile writes:
> > As I said, I'm still getting started (didn't implement deleting
> > documents yet). Any tips on checking this ?
> > 
> IndexReader has an isDeleted(doc-number) method...
> 
> If you make sure all indices are optimized after deleting, it should
> be ok to skip the test.
> 
> greetings
> 	Morus
> 

Re: query matching all documents

Posted by Morus Walter <mo...@tanto-xipolis.de>.

Guilherme Barile writes:
> As I said, I'm still getting started (didn't implement deleting
> documents yet). Any tips on checking this ?
> 
IndexReader has an isDeleted(doc-number) method...

If you make sure all indices are optimized after deleting, it should
be ok to skip the test.

greetings
	Morus


Re: query matching all documents

Posted by Guilherme Barile <gu...@prosoma.com.br>.

What are the performance concerns of doing an optimization after every
delete?

On Thu, 2003-05-22 at 11:36, Brisbart Franck wrote:
> You're right.
> When you delete a document, the document is only marked as 'deleted', and
> the document numbers stay the same until an optimize is done.
> 
> So, after deleting documents, if you want to list the remaining ones:
> - either you loop from 0 to maxDoc() and handle the deleted docs
> (with the same IndexReader)
> - or you do an 'optimize' and, with a brand new IndexReader, you loop
> from 0 to numDocs() (with no deleted docs left to handle).
> 
> franck
> 

Re: query matching all documents

Posted by Brisbart Franck <Fr...@kelkoo.net>.

You're right.
When you delete a document, the document is only marked as 'deleted', and
the document numbers stay the same until an optimize is done.

So, after deleting documents, if you want to list the remaining ones:
- either you loop from 0 to maxDoc() and handle the deleted docs
(with the same IndexReader)
- or you do an 'optimize' and, with a brand new IndexReader, you loop
from 0 to numDocs() (with no deleted docs left to handle).
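
The two options can be sketched with a toy model of document
numbering. ToyIndex is hypothetical and written only for this
illustration; it is not Lucene's implementation, and its optimize()
only imitates the renumbering effect described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of document numbering; NOT Lucene's implementation.
class ToyIndex {
    private final List<String> slots = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    void add(String name) { slots.add(name); deleted.add(false); }

    // Deleting only marks the slot; numbering is unchanged.
    void delete(int n) { deleted.set(n, true); }

    boolean isDeleted(int n) { return deleted.get(n); }
    String document(int n) { return slots.get(n); }

    // One more than the largest document number in use.
    int maxDoc() { return slots.size(); }

    // Number of documents that are not marked deleted.
    int numDocs() {
        int live = 0;
        for (boolean d : deleted) {
            if (!d) live++;
        }
        return live;
    }

    // Drops the marked slots, so surviving documents are renumbered.
    void optimize() {
        for (int i = slots.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                slots.remove(i);
                deleted.remove(i);
            }
        }
    }

    // Option 1: loop to maxDoc() and skip the deleted slots.
    List<String> listSkippingDeleted() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (!isDeleted(i)) out.add(i + " " + document(i));
        }
        return out;
    }
}
```

With five documents and slot 1 deleted, numDocs() is 4 and maxDoc() is
5 in this model. After optimize() both are 4 and the former slot 2 is
now slot 1, so a plain loop from 0 to numDocs() is safe again.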

franck

Guilherme Barile wrote:
> What I didn't figure out is, if I have some index like:
> [0] doc1.txt
> [1] doc2.doc
> [2] doc3.xls
> [3] doc4.nfo
> [4] doc4.pdf
> 
> and then I delete doc2.doc (document #1 in Lucene). Will the other
> document numbers change, or will there be a gap in my index?
> Let's list it again with doc2.doc deleted (supposing the gap will be
> there)
> 
> [0] doc1.txt
> [1] << DELETED >>
> [2] doc3.xls
> [3] doc4.nfo
> [4] doc4.pdf
> 
> This way (I think) numDocs() will return 4, but maxDoc() would return
> 5. Using numDocs() would make me lose a document, at least in the way I
> implemented it. Any tips?
> 
> gui
> 
> 



Re: query matching all documents

Posted by Guilherme Barile <gu...@prosoma.com.br>.

What I didn't figure out is, if I have some index like:
[0] doc1.txt
[1] doc2.doc
[2] doc3.xls
[3] doc4.nfo
[4] doc4.pdf

and then I delete doc2.doc (document #1 in Lucene). Will the other
document numbers change, or will there be a gap in my index?
Let's list it again with doc2.doc deleted (supposing the gap will be
there)

[0] doc1.txt
[1] << DELETED >>
[2] doc3.xls
[3] doc4.nfo
[4] doc4.pdf

This way (I think) numDocs() will return 4, but maxDoc() would return
5. Using numDocs() would make me lose a document, at least in the way I
implemented it. Any tips?
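
The loop-bound concern can be checked with plain arrays, assuming the
gap really is there. LoopBoundDemo and countVisitedLive() are
hypothetical names for this sketch, and no Lucene classes are involved.

```java
// Toy illustration of the loop-bound pitfall; plain arrays, no Lucene.
class LoopBoundDemo {
    // Visits slots [0, bound) and counts the ones not marked deleted.
    static int countVisitedLive(boolean[] deleted, int bound) {
        int seen = 0;
        for (int i = 0; i < bound; i++) {
            if (!deleted[i]) seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        // Slot 1 (doc2.doc) is marked deleted, as in the listing above.
        boolean[] deleted = {false, true, false, false, false};
        int maxDoc = deleted.length;                      // 5 slots
        int numDocs = countVisitedLive(deleted, maxDoc);  // 4 live docs
        System.out.println("bound = numDocs: " + countVisitedLive(deleted, numDocs));
        System.out.println("bound = maxDoc:  " + countVisitedLive(deleted, maxDoc));
    }
}
```

With the bound set to numDocs (4), only slots 0 to 3 are visited, so
the loop sees 3 of the 4 live documents and never reaches slot 4. With
the bound set to maxDoc (5) it sees all 4.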

gui


On Thu, 2003-05-22 at 10:32, Brisbart Franck wrote:
> You don't really need to take care of the deleted docs. When you try
> to get a deleted doc (reader.document(i) on a deleted doc), an
> IllegalArgumentException will be thrown with the message 'attempt to
> access a deleted document'. Just catch this exception.
> 
> Also, I suggest you use 'numDocs()' instead of 'maxDoc()' to get the
> real number of documents in the index.
> 
> Franck
> 

Re: query matching all documents

Posted by Brisbart Franck <Fr...@kelkoo.net>.

You don't really need to take care of the deleted docs. When you try
to get a deleted doc (reader.document(i) on a deleted doc), an
IllegalArgumentException will be thrown with the message 'attempt to
access a deleted document'. Just catch this exception.

Also, I suggest you use 'numDocs()' instead of 'maxDoc()' to get the
real number of documents in the index.

Franck

Guilherme Barile wrote:
> As I said, I'm still getting started (didn't implement deleting
> documents yet). Any tips on checking this ?
> 
> On Thu, 2003-05-22 at 03:31, Morus Walter wrote:
> 
>>Guilherme Barile writes:
>>
>>>If you're trying to get all documents, why not
>>>
>>>IndexReader reader = IndexReader.open(this.indexDir);
>>>Document doc;
>>>	
>>>for (int i = 0; i < reader.maxDoc(); i++) {
>>>	try {
>>>		doc = reader.document(i);
>>>		System.out.println(i + " " + doc.get("source"));
>>>	}
>>>	catch (Exception e) {
>>>		System.out.println("Error getting doc " + i);
>>>	}
>>>}
>>>
>>
>>I guess there should be some extra check to take care of deleted
>>documents, that aren't removed from the index yet.
>>
>>greetings
>>	Morus
>>

Re: query matching all documents

Posted by Guilherme Barile <gu...@prosoma.com.br>.

As I said, I'm still getting started (I haven't implemented deleting
documents yet). Any tips on checking this?

On Thu, 2003-05-22 at 03:31, Morus Walter wrote:
> Guilherme Barile writes:
> > If you're trying to get all documents, why not
> > 
> > IndexReader reader = IndexReader.open(this.indexDir);
> > Document doc;
> > 	
> > for (int i = 0; i < reader.maxDoc(); i++) {
> > 	try {
> > 		doc = reader.document(i);
> > 		System.out.println(i + " " + doc.get("source"));
> > 	}
> > 	catch (Exception e) {
> > 		System.out.println("Error getting doc " + i);
> > 	}
> > }
> > 
> I guess there should be some extra check to take care of deleted
> documents, that aren't removed from the index yet.
> 
> greetings
> 	Morus
> 

Re: query matching all documents

Posted by Morus Walter <mo...@tanto-xipolis.de>.

Guilherme Barile writes:
> If you're trying to get all documents, why not
> 
> IndexReader reader = IndexReader.open(this.indexDir);
> Document doc;
> 	
> for (int i = 0; i < reader.maxDoc(); i++) {
> 	try {
> 		doc = reader.document(i);
> 		System.out.println(i + " " + doc.get("source"));
> 	}
> 	catch (Exception e) {
> 		System.out.println("Error getting doc " + i);
> 	}
> }
> 
I guess there should be some extra check to take care of deleted
documents that haven't been removed from the index yet.

greetings
	Morus


Re: query matching all documents

Posted by Guilherme Barile <gu...@prosoma.com.br>.

If you're trying to get all documents, why not

IndexReader reader = IndexReader.open(this.indexDir);
Document doc;
	
for (int i = 0; i < reader.maxDoc(); i++) {
	try {
		doc = reader.document(i);
		System.out.println(i + " " + doc.get("source"));
	}
	catch (Exception e) {
		System.out.println("Error getting doc " + i);
	}
}

This is just a simple example (I'm a newbie, but I'm using this to list
all documents in a certain index).
You could add another loop inside the for loop to get all the fields of
a document (in this case I'm only getting the field named "source");
there are methods on the IndexReader object for that.

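The nested-loop idea can be sketched without Lucene by modelling each
document as an ordered map of field names to values. FieldDump and
dump() are hypothetical names, and the "title" field is invented for
the example; only "source" comes from the code above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy documents modelled as ordered field-name -> value maps; no Lucene.
class FieldDump {
    // Outer loop over documents, inner loop over each document's fields.
    static List<String> dump(List<Map<String, String>> docs) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            for (Map.Entry<String, String> field : docs.get(i).entrySet()) {
                lines.add(i + " " + field.getKey() + "=" + field.getValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("source", "doc1.txt");
        doc.put("title", "First document"); // hypothetical second field
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc);
        for (String line : dump(docs)) {
            System.out.println(line);
        }
    }
}
```
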
see ya

gui

On Wed, 2003-05-21 at 17:16, David Medinets wrote:
> ----- Original Message -----
> From: <ne...@hotu.com>
> > > Is there a query object which can match all documents?
> > > I'd prefer not to add a dummy field to search by, as
> > > this is not efficient.
> 
> Why is the dummy field not efficient? I use something like a keyword of 'ZZ'
> and a value of 'A' so that the disk space used is minimal.
> 
> David Medinets
> http://www.codebits.com
> Need some R&D done? Let me know I'm looking for interesting projects.
> 
> 
> 

Re: query matching all documents

Posted by David Medinets <me...@mtolive.com>.

----- Original Message -----
From: <ne...@hotu.com>
> > Is there a query object which can match all documents?
> > I'd prefer not to add a dummy field to search by, as
> > this is not efficient.

Why is the dummy field not efficient? I use something like a keyword of 'ZZ'
and a value of 'A' so that the disk space used is minimal.
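
A minimal sketch of why the dummy field works, using a toy inverted
index rather than Lucene itself. DummyFieldIndex, addDocument() and
search() are hypothetical names, and the "ZZ:A" string is only a
stand-in for the keyword 'ZZ' with value 'A' mentioned above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index (term -> sorted doc numbers); NOT Lucene internals.
class DummyFieldIndex {
    static final String ALL = "ZZ:A"; // dummy term stored on every document
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDoc = 0;

    // Indexes a document's terms and always adds the dummy term as well.
    int addDocument(String... terms) {
        int doc = nextDoc++;
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(doc);
        }
        postings.computeIfAbsent(ALL, k -> new TreeSet<>()).add(doc);
        return doc;
    }

    // Every term is required. Starting from the dummy term's postings
    // means an empty criteria list yields all documents.
    TreeSet<Integer> search(List<String> requiredTerms) {
        TreeSet<Integer> hits =
                new TreeSet<>(postings.getOrDefault(ALL, new TreeSet<>()));
        for (String t : requiredTerms) {
            hits.retainAll(postings.getOrDefault(t, new TreeSet<>()));
        }
        return hits;
    }
}
```

In this model the price of the dummy term is one extra posting per
document, plus the work of walking that list when no other criteria
narrow the search. How that compares with a real index depends on the
engine, so treat it as an illustration of the trade-off only.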

David Medinets
http://www.codebits.com
Need some R&D done? Let me know I'm looking for interesting projects.




Re: query matching all documents

Posted by Scott Ganyo <sc...@etapestry.com>.

Yes.  You must use a dummy field.

(Isn't this in the FAQ?)

newsham@hotu.com wrote:

> Since this question has gone unanswered, should I assume that such a 
> query is not possible with the existing Lucene version?
>


-- 
If the world should blow itself up, the last audible voice would be that of an expert saying it can't be done. - Peter Ustinov




Re: query matching all documents

Posted by ne...@hotu.com.

Since this question has gone unanswered, should I assume that such a 
query is not possible with the existing Lucene version?



