Posted to java-user@lucene.apache.org by Terry Steichen <te...@net-frame.com> on 2002/10/26 00:08:59 UTC

Bitset Filters

Peter,

Could you give, or point to, a couple of examples on how to use bitset
filters in the way you describe below?

Regards,

Terry

----- Original Message -----
From: "Peter Carlson" <ca...@bookandhammer.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Tuesday, October 22, 2002 11:26 PM
Subject: Re: Need Help URGENT


> I think the answer is yes.
>
> When creating a Lucene Document you can create a field which is the URL
> field. If you are not searching for words within the field, I would
> probably make it a keyword field type so you don't tokenize it into
> multiple Terms.
>
> Then you can create a multi-field search.
>
>
> url:www.apache.org AND lucene
>
> Where url is the field where the URL exists and the term you want to
> search for in your default field is Lucene.
>
> To answer what I think your second question is, I will restate the
> question.
>
> Can Lucene support subsearching?
> Well, yes and no. I will explain how to accomplish this; there is also
> some information in the FAQ about it.
>
> You can just add criteria to the search, like so:
>
> url:www.apache.org AND lucene AND indexing
>
> This will return the subset of information.
>
> If you are going to run the same search over and over again, you may
> also want to look at filters, which basically keep a bitset of a Lucene
> search's results so you don't actually have to run the search again, just
> intersect two bitsets.
>
> When you get the Hits back you can get the information from whatever
> field you want, including the URL field that you will create.
>
> I hope this helps and is on the mark. If not, the answer to "can you use
> Lucene to accomplish the task" is typically yes (the questions then
> become just how much work has to be done on top of Lucene, or whether
> Lucene is the right tool).
>
> --Peter
>
>
>
> On Tuesday, October 22, 2002, at 04:32 PM, nandkumar rayanker wrote:
>
> > Hi,
> >
> > Further to the request already made in my previous
> > mail, I would like to know:
> >
> > - Whether I can use Lucene to search a remote site
> > or not?
> >
> > Here is what I want to do:
> > - Install Lucene, then search and create search info for
> > a given URL.
> >
> > - Search the search info already created.
> >
> > Can I do this sort of thing using Lucene or not?
> >
> > thanks and regards
> > Nandkumar
> >
> > --- nandkumar rayanker <nr...@sbcglobal.net>
> > wrote:
> >> Hi,
> >>
> >> I need to develop a stand-alone Java search
> >> application,
> >> which takes a "SearchString" and "URL/URLS":
> >>
> >> "SearchString": string to be searched for on the web
> >>
> >> "URL/URLS": list of URLs where the string needs to be
> >> searched for.
> >> return: list of URL/URLS where "SearchString" is
> >> found.
> >>
> >> thanks & regards
> >> Nandkumar
> >>
> >> --
> >> To unsubscribe, e-mail:
> >> <ma...@jakarta.apache.org>
> >> For additional commands, e-mail:
> >> <ma...@jakarta.apache.org>


Re: Bitset Filters

Posted by Che Dong <ch...@hotmail.com>.
I wrote a StringFilter that filters on a field by exact match or prefix match:
http://www.chedong.com/tech/lucene_ext.tar.gz

Che, Dong
----- Original Message ----- 
From: "Terry Steichen" <te...@net-frame.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Saturday, October 26, 2002 6:08 AM
Subject: Bitset Filters


> Peter,
> 
> Could you give, or point to, a couple of examples on how to use bitset
> filters in the way you describe below?
> 
> Regards,
> 
> Terry

Re: Bitset Filters

Posted by Kelvin Tan <ke...@relevanz.com>.
Terry,

What you typically want to do is something along the lines of 

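import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// reader is an open IndexReader; field and fieldValue name the term to allow.
BitSet bits = new BitSet(reader.maxDoc());  // one bit per document, all clear
Term t = new Term(field, fieldValue);
TermDocs termDocs = reader.termDocs(t);     // enumerates documents containing t
try
{
    while (termDocs.next())
    {
        // Set the bit for each document that contains the term.
        int docNumber = termDocs.doc();
        bits.set(docNumber);
    }
}
finally
{
    // Always release the enumeration, even if iteration fails.
    if (termDocs != null) termDocs.close();
}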

This finds all documents containing the term t and sets their bits, which
allows those documents to be returned (note: every other document is
disallowed by default, because its bit stays clear).
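
To make the "intersection of two bitsets" idea concrete, here is a rough
sketch that uses only java.util.BitSet and no Lucene classes. The class name,
method names and document numbers are made up for illustration; in a real
filter the set bits would come from a loop like the one above, and maxDoc
would be reader.maxDoc().

```java
import java.util.BitSet;

// Illustrative only: plain java.util.BitSet, no Lucene dependency.
class BitSetFilterSketch {

    // Build a bitset with one slot per document number; a set bit means
    // "this document is allowed". maxDoc stands in for reader.maxDoc().
    static BitSet allowedDocs(int maxDoc, int[] docNumbers) {
        BitSet bits = new BitSet(maxDoc);
        for (int docNumber : docNumbers) {
            bits.set(docNumber);
        }
        return bits;
    }

    // Intersect two bitsets without modifying either input.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 10; // hypothetical index size
        // Documents the (cached) filter allows.
        BitSet filterBits = allowedDocs(maxDoc, new int[] {1, 3, 5, 7});
        // Documents matched by some other query.
        BitSet queryHits = allowedDocs(maxDoc, new int[] {3, 4, 5, 9});
        // Only documents present in both sets survive.
        BitSet filtered = intersect(queryHits, filterBits);
        System.out.println(filtered); // prints {3, 5}
    }
}
```

The point of keeping the filter's bitset around is that the and() step is
cheap compared with running the restricting search again each time.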

Regards,
Kelvin


On Mon, 28 Oct 2002 08:35:28 -0500, Terry Steichen wrote:
>The Javadocs don't say much.  But thanks anyway.
>
>Terry


Re: Bitset Filters

Posted by Terry Steichen <te...@net-frame.com>.
The Javadocs don't say much.  But thanks anyway.

Terry

----- Original Message -----
From: "Peter Carlson" <ca...@bookandhammer.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Monday, October 28, 2002 12:22 AM
Subject: Re: Bitset Filters


> Check out the Javadocs for the Filter class.
>
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Filter.html
>
> --Peter


Re: Bitset Filters

Posted by Peter Carlson <ca...@bookandhammer.com>.
Check out the Javadocs for the Filter class.

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Filter.html

--Peter

On Friday, October 25, 2002, at 03:08 PM, Terry Steichen wrote:

> Peter,
>
> Could you give, or point to, a couple of examples on how to use bitset
> filters in the way you describe below?
>
> Regards,
>
> Terry