Posted to java-user@lucene.apache.org by Bhavin Pandya <bh...@rediff.co.in> on 2006/11/21 08:46:21 UTC

is there any way to find unique records ?

Hi,
In Lucene, is there any way to find only unique records from a single field?

Otherwise I have to iterate through Hits unnecessarily and pick out the unique ones myself.

Please help.

- Bhavin Pandya

Re: is there any way to find unique records ?

Posted by Bhavin Pandya <bh...@rediff.co.in>.
Hi Erick,

Thanks for your help.
I have successfully implemented this using a custom HitCollector.

- Bhavin Pandya

----- Original Message ----- 
From: "Erick Erickson" <er...@gmail.com>
To: <ja...@lucene.apache.org>; "Bhavin Pandya" <bh...@rediff.co.in>
Sent: Tuesday, November 21, 2006 8:58 PM
Subject: Re: is there any way to find unique records ?


> Ok, I think I get it now. You're right that you probably don't want to
> iterate the Hits object, since that has performance issues once you get
> beyond 100 docs or so. That said, I don't know how big your result sets
> are. If they are guaranteed to be small, this may not matter.
>
> I'm guessing you want to implement a custom HitCollector. That has its
> own cautions about calling, say, IndexReader.document(id) for each hit,
> so you probably want to use a TermDocs object; seek(), skipTo() and doc()
> are your friends. I'd still try the simple way of just calling
> IndexReader.document(id) first, just to see whether the performance is
> acceptable. Be sure you're looking at a truly representative data set,
> though <G>...
>
> Hope this helps
> Erick


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: is there any way to find unique records ?

Posted by Erick Erickson <er...@gmail.com>.
Ok, I think I get it now. You're right that you probably don't want to
iterate the Hits object, since that has performance issues once you get
beyond 100 docs or so. That said, I don't know how big your result sets are.
If they are guaranteed to be small, this may not matter.

I'm guessing you want to implement a custom HitCollector. That has its own
cautions about calling, say, IndexReader.document(id) for each hit, so you
probably want to use a TermDocs object; seek(), skipTo() and doc() are your
friends. I'd still try the simple way of just calling
IndexReader.document(id) first, just to see whether the performance is
acceptable. Be sure you're looking at a truly representative data set, though
<G>...
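
One way to picture the collector-plus-TermDocs idea is the plain-Java sketch below. It is a model under stated assumptions, not Lucene code: the Collector interface only mimics the collect(int doc, float score) callback of HitCollector, the posting lists are hypothetical int arrays standing in for what TermDocs would return for each "category" term, and all class and method names are made up for illustration.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java model of "collect the hits, then test each category term". */
class UniqueCategoriesForHits {

    /** Callback shaped like a hit collector: called once per matching doc. */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Records matching doc ids in a bit set and ignores the scores. */
    static final class BitSetCollector implements Collector {
        final BitSet hits = new BitSet();

        @Override
        public void collect(int doc, float score) {
            hits.set(doc);
        }
    }

    /**
     * For each category term, walks its posting list (the doc ids a TermDocs
     * would yield) and keeps the term if any of those docs is among the hits.
     */
    static Set<String> categoriesOf(BitSet hits, Map<String, int[]> postings) {
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (hits.get(doc)) {
                    found.add(entry.getKey());
                    break; // one matching doc is enough for this term
                }
            }
        }
        return found;
    }

    /** Hypothetical data: three category terms and a query matching docs 0, 2, 4. */
    static Set<String> demo() {
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("mobiles", new int[] {0, 2, 5});
        postings.put("accessories", new int[] {1, 4});
        postings.put("laptops", new int[] {3});

        BitSetCollector collector = new BitSetCollector();
        collector.collect(0, 1.0f);
        collector.collect(2, 0.7f);
        collector.collect(4, 0.4f);
        return categoriesOf(collector.hits, postings);
    }
}
```

The point of the design is that no stored document is ever loaded: the work is proportional to the posting lists of the category field, which is why it tends to scale better than calling IndexReader.document(id) per hit when result sets are large.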

Hope this helps
Erick


Re: is there any way to find unique records ?

Posted by Steven Rowe <sa...@syr.edu>.
Bhavin,

Mark Harwood gives a solution that looks almost exactly like what you want:

   http://www.mail-archive.com/java-user@lucene.apache.org/msg05154.html

Steve

Chris Hostetter wrote:
> Search the archives for "faceted searching" and "category counts" and you
> should find lots of discussions on this topic.

Re: is there any way to find unique records ?

Posted by Chris Hostetter <ho...@fucit.org>.
Search the archives for "faceted searching" and "category counts" and you
should find lots of discussions on this topic.

-Hoss




Re: is there any way to find unique records ?

Posted by Bhavin Pandya <bh...@rediff.co.in>.
Hi Erick,

> If you're asking for a list of all the unique values for a particular field,
> see TermDocs and/or TermEnum, which will allow you to look at, say, all the
> values stored for some field. A trick here is to seek to new Term("field", "").
> By putting nothing in the value, you effectively enumerate them all,
> something that I didn't find obvious.

I think your solution above is very close to what I am looking for,
but I need it in a slightly different way.
Here is what I am planning to do.

Suppose my index has four fields: "product-title", "product-desc",
"category" and "FLAG" (the FLAG field has the value "true" for every
doc in the index; it was added just for iteration purposes).

At search time:
query =  +(product-title:nokia) +(product-desc:nokia)
Hits hits = searcher.search(query);
I want to fetch the unique "category" values from the above Hits object,

but I don't want to iterate through the Hits object.

Now, as per your suggestion, I can do something like this:
TermDocs termDocs = reader.termDocs(new Term("FLAG", "true"));
But it will return an enumeration of all the documents in the index, whereas I
want an enumeration of only the documents that are relevant to "nokia".
How can I do that?

Thanks
- Bhavin Pandya


----- Original Message ----- 
From: "Erick Erickson" <er...@gmail.com>
To: <ja...@lucene.apache.org>; "Bhavin Pandya" <bh...@rediff.co.in>
Sent: Tuesday, November 21, 2006 7:01 PM
Subject: Re: is there any way to find unique records ?


> I don't think I understand what "only unique records from a single field"
> means. If it's a unique value in a field, there'll only be one document
> in the Hits object and there's no cost to iterating, so I doubt that's
> what you mean.
>
> If you're asking for a list of all the unique values for a particular field,
> see TermDocs and/or TermEnum, which will allow you to look at, say, all the
> values stored for some field. A trick here is to seek to new Term("field", "").
> By putting nothing in the value, you effectively enumerate them all,
> something that I didn't find obvious.
>
> If neither of these is close to the mark, perhaps you could provide more
> detail.
>
> Best
> Erick

Re: is there any way to find unique records ?

Posted by Erick Erickson <er...@gmail.com>.
I don't think I understand what "only unique records from a single field"
means. If it's a unique value in a field, there'll only be one document in
the Hits object and there's no cost to iterating, so I doubt that's what you
mean.

If you're asking for a list of all the unique values for a particular field,
see TermDocs and/or TermEnum, which will allow you to look at, say, all the
values stored for some field. A trick here is to seek to new Term("field", "").
By putting nothing in the value, you effectively enumerate them all,
something that I didn't find obvious.
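
Why the empty-value trick works can be seen in a small plain-Java model. This is only a sketch under assumptions: terms are kept sorted by field name and then by text, a TreeSet stands in for the index's term dictionary, and tailSet plays the role that IndexReader.terms(new Term(field, "")) plays in Lucene. The FieldTerm class and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java model of a sorted term dictionary; not Lucene's own classes. */
class UniqueFieldTerms {

    /** A (field, text) pair ordered by field first, then by text. */
    static final class FieldTerm implements Comparable<FieldTerm> {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(FieldTerm other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Seeks to (field, "") and walks forward until the field name changes. */
    static List<String> uniqueValues(TreeSet<FieldTerm> dictionary, String field) {
        List<String> values = new ArrayList<>();
        // The empty text sorts before every real value of the same field,
        // so the walk starts at the first term of that field.
        for (FieldTerm term : dictionary.tailSet(new FieldTerm(field, ""), true)) {
            if (!term.field.equals(field)) {
                break; // ran past the last term of this field
            }
            values.add(term.text);
        }
        return values;
    }

    /** Hypothetical dictionary using the field names from this thread. */
    static TreeSet<FieldTerm> sampleDictionary() {
        TreeSet<FieldTerm> dictionary = new TreeSet<>();
        dictionary.add(new FieldTerm("FLAG", "true"));
        dictionary.add(new FieldTerm("category", "mobiles"));
        dictionary.add(new FieldTerm("category", "accessories"));
        dictionary.add(new FieldTerm("product-title", "nokia"));
        return dictionary;
    }

    public static void main(String[] args) {
        System.out.println(uniqueValues(sampleDictionary(), "category"));
        // prints [accessories, mobiles]
    }
}
```

With a real index the loop has the same shape: read TermEnum.term(), stop when the field differs or the term is null, and advance with TermEnum.next().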

If neither of these is close to the mark, perhaps you could provide more
detail.

Best
Erick

On 11/21/06, Bhavin Pandya <bh...@rediff.co.in> wrote:
>
> Hi,
> In Lucene, is there any way to find only unique records from a single
> field?
>
> Otherwise I have to iterate through Hits unnecessarily and pick out the
> unique ones myself.
>
> Please help.
>
> - Bhavin Pandya
>

Re: is there any way to find unique records ?

Posted by Steven Rowe <sa...@syr.edu>.
Hi Bhavin,

Bhavin Pandya wrote:
> In Lucene, is there any way to find only unique records from a single
> field?
>
> Otherwise I have to iterate through Hits unnecessarily and pick out the
> unique ones myself.

Can you clarify what you mean by "unique"?

Do you want to return only documents that have a field value that is not
contained in other documents, that is, to exclude documents which share
the same particular field value with at least one other document?

Or, do you want to return just one document from each set of documents
that share the same field value (given your database-ish "record"
terminology, where standard Lucene usage is "document", I'm guessing
that "SELECT UNIQUE" is what you want)?  And if so, do you care which
one from the set is returned?  That is, do you have any other selection
criteria?
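
The two readings of "unique" give different results on the same data, which a short plain-Java sketch can make concrete. It assumes a hypothetical array holding one field value per document id and, for the second reading, arbitrarily keeps the first document of each group; neither the class nor the method names come from Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Two interpretations of "unique" over a per-document field value. */
class UniqueInterpretations {

    /** Reading 1: docs whose field value is shared with no other doc. */
    static List<Integer> docsWithUnsharedValue(String[] valueByDoc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : valueByDoc) {
            counts.merge(value, 1, Integer::sum);
        }
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            if (counts.get(valueByDoc[doc]) == 1) {
                docs.add(doc);
            }
        }
        return docs;
    }

    /** Reading 2: one doc (here the first seen) per distinct field value. */
    static List<Integer> firstDocPerValue(String[] valueByDoc) {
        Map<String, Integer> firstDoc = new LinkedHashMap<>();
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            firstDoc.putIfAbsent(valueByDoc[doc], doc);
        }
        return new ArrayList<>(firstDoc.values());
    }
}
```

For the values {"mobiles", "accessories", "mobiles", "laptops"}, the first reading returns docs 1 and 3, while the second returns docs 0, 1 and 3.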

Steve


