Posted to solr-user@lucene.apache.org by Varun Gupta <va...@gmail.com> on 2010/10/26 15:07:38 UTC

How do I this in Solr?

Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search result
document are present in the search query"

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung andriod", "nokia andriod", "mobile with GPS"

If I search with the text "samsung andriod GPS", search results should only
contain "samsung", "GPS", "andriod" and "samsung andriod".

Is there a way to do this in Solr?
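
Stated as code, the criterion above is a subset test: a document qualifies when its set of words is contained in the set of query words. The following is only a minimal Python sketch of that test, not a Solr feature. It assumes plain whitespace tokenization with lowercasing (Solr's analyzers would behave differently), the function names are made up for illustration, and "android" is spelled consistently so that the single-word document matches.

```python
def tokens(text):
    """Lowercase and split on whitespace; a stand-in for real analysis."""
    return set(text.lower().split())


def only_query_words(docs, query):
    """Keep the documents whose every word also occurs in the query."""
    allowed = tokens(query)
    return [doc for doc in docs if tokens(doc) <= allowed]


docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]
matches = only_query_words(docs, "samsung android GPS")
print(matches)  # ['GPS', 'android', 'samsung', 'samsung android']
```

Note that this direction is the reverse of an ordinary AND query, which requires every query word to be in the document rather than every document word to be in the query.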

--
Thanks
Varun Gupta

Re: How do I this in Solr?

Posted by Varun Gupta <va...@gmail.com>.
Toke, the search query will contain 4-5 words on average (excluding
stopwords).

Mike, I don't care about the result count. Excluding the terms on the client
side may be a good idea. Is there any way to alter scoring so that the
docs containing only the searched-for terms are shown first? Can I use term
frequency to do this kind of thing?

--
Thanks
Varun Gupta

On Wed, Oct 27, 2010 at 7:13 PM, Mike Sokolov <so...@ifactory.com> wrote:

> Yes I missed that requirement (as Steven also pointed out in a private
> e-mail).  I now agree that the combinatorics are required.
>
> Another possibility to consider (if the queries are large, which actually
> seems unlikely) is to use the default behavior where all terms are optional,
> sort by relevance, and truncate the result list on the client side after
> some unwanted term is found.  I *think* the scoring should find only docs
> with the searched-for terms first, although if there are a lot of repeated
> terms maybe not? Also result counts will be screwy.
>
> -Mike

Re: How do I this in Solr?

Posted by Mike Sokolov <so...@ifactory.com>.
Yes I missed that requirement (as Steven also pointed out in a private 
e-mail).  I now agree that the combinatorics are required.

Another possibility to consider (if the queries are large, which 
actually seems unlikely) is to use the default behavior where all terms 
are optional, sort by relevance, and truncate the result list on the 
client side after some unwanted term is found.  I *think* the scoring 
should find only docs with the searched-for terms first, although if 
there are a lot of repeated terms maybe not? Also result counts will be 
screwy.
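
The client-side truncation described above could look roughly like the Python sketch below. It assumes the results arrive as plain strings already sorted by relevance and uses whitespace tokenization; the function name and the hand-made ranking are hypothetical, not output from Solr.

```python
from itertools import takewhile


def truncate_at_unwanted(ranked_docs, query):
    """Keep results up to, but not including, the first document that
    contains a word missing from the query."""
    allowed = set(query.lower().split())

    def is_clean(doc):
        return set(doc.lower().split()) <= allowed

    return list(takewhile(is_clean, ranked_docs))


# Hypothetical relevance order, written by hand for illustration.
ranked = ["samsung android", "GPS", "samsung", "mobile with GPS", "android"]
kept = truncate_at_unwanted(ranked, "samsung android GPS")
print(kept)  # ['samsung android', 'GPS', 'samsung']
```

In this made-up ranking the wanted document "android" sits below the cut and is lost, which is the risk if scoring does not place all clean documents first.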

-Mike

On 10/27/2010 09:34 AM, Toke Eskildsen wrote:
> That does not work either as it requires that all the terms in the query
> are present in the document. The original poster did not state this
> requirement. On the contrary, his examples were mostly single-word
> matches, implying an OR-search at the core.
>
> The query-explosion still seems like the only working idea. Maybe Varun
> could comment on the maximum number of terms that his queries will
> contain?
>
> Regards,
> Toke Eskildsen

Re: How do I this in Solr?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
That does not work either as it requires that all the terms in the query
are present in the document. The original poster did not state this
requirement. On the contrary, his examples were mostly single-word
matches, implying an OR-search at the core.

The query-explosion still seems like the only working idea. Maybe Varun
could comment on the maximum number of terms that his queries will
contain?
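
One way to picture the explosion, and why the number of terms matters, is the Python sketch below. It is only an assumed reading of the idea: it borrows the word_count field from Mike's message, assumes that field holds the number of distinct words in the document, assumes the query terms are distinct, and does no escaping of special characters. The function name is hypothetical.

```python
from itertools import combinations


def exploded_query(terms, count_field="word_count"):
    """Build one clause per non-empty subset of the query terms. Each clause
    requires all terms of the subset plus a word count equal to the subset
    size, so a matching document cannot contain any other word."""
    clauses = []
    for size in range(1, len(terms) + 1):
        for subset in combinations(terms, size):
            clause = " AND ".join(subset) + f" AND {count_field}:{size}"
            clauses.append(f"({clause})")
    return " OR ".join(clauses)


query = exploded_query(["samsung", "andriod", "GPS"])
print(query)
```

The clause count is 2^n - 1 for n terms: 7 for this example, and 15 to 31 for queries of 4 to 5 words.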

Regards,
Toke Eskildsen

On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote:
> Right - my point was to combine this with the previous approaches to 
> form a query like:
> 
> samsung AND android AND GPS AND word_count:3
> 
> in order to exclude documents containing additional words. This would 
> avoid the combinatoric explosion problem others had alluded to earlier.
> Of course this would fail because android is "mis-" spelled :)
> 
> -Mike

RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
(Resending my reply, since it unintentionally went to Mike alone rather than to the list:)

But Varun wants documents containing fewer than 3 words to match, so you can't just AND all the terms from the query:

> If I search with the text "samsung andriod GPS", search results
> should only contain "samsung", "GPS", "andriod" and "samsung andriod".

(When Varun says "search results should only contain ...", he means that those strings are the entire contents of matching documents.)

Steve

> -----Original Message-----
> From: Mike Sokolov [mailto:sokolov@ifactory.com]
> Sent: Wednesday, October 27, 2010 9:02 AM
> To: solr-user@lucene.apache.org
> Cc: Steven A Rowe
> Subject: Re: How do I this in Solr?
> 
> Right - my point was to combine this with the previous approaches to
> form a query like:
> 
> samsung AND android AND GPS AND word_count:3
> 
> in order to exclude documents containing additional words. This would
> avoid the combinatoric explosion problem others had alluded to earlier.
> Of course this would fail because android is "mis-" spelled :)
> 
> -Mike

Re: How do I this in Solr?

Posted by Mike Sokolov <so...@ifactory.com>.
Right - my point was to combine this with the previous approaches to 
form a query like:

samsung AND android AND GPS AND word_count:3

in order to exclude documents containing additional words. This would 
avoid the combinatoric explosion problem others had alluded to earlier.
Of course this would fail because android is "mis-" spelled :)

-Mike

On 10/27/2010 08:45 AM, Steven A Rowe wrote:
> I'm pretty sure the word-count strategy won't work.
>
>    
>> If I search with the text "samsung andriod GPS", search results
>> should only contain "samsung", "GPS", "andriod" and "samsung andriod".
>>      
> Using the word-count strategy, a document containing "samsung andriod PDQ" would be a hit, but Varun doesn't want it, because it contains a word that is not in the query.
>
> Steve

RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
I'm pretty sure the word-count strategy won't work.

> If I search with the text "samsung andriod GPS", search results
> should only contain "samsung", "GPS", "andriod" and "samsung andriod".

Using the word-count strategy, a document containing "samsung andriod PDQ" would be a hit, but Varun doesn't want it, because it contains a word that is not in the query.
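
The counterexample can be checked with a small Python sketch. It assumes the word-count strategy means "the document shares at least one term with the query and has the same number of words", with whitespace tokenization; both function names are hypothetical.

```python
def word_count_hit(doc, query):
    """Assumed word-count strategy: some shared term, equal word counts."""
    d, q = doc.lower().split(), query.lower().split()
    return bool(set(d) & set(q)) and len(d) == len(q)


def wanted(doc, query):
    """Varun's criterion: every word of the document is in the query."""
    return set(doc.lower().split()) <= set(query.lower().split())


query = "samsung andriod GPS"
print(word_count_hit("samsung andriod PDQ", query))  # True
print(wanted("samsung andriod PDQ", query))          # False
print(word_count_hit("samsung", query))              # False
print(wanted("samsung", query))                      # True
```

Under these assumptions the strategy fails in both directions: it accepts a document with a foreign word and rejects a shorter document that should match.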

Steve

> -----Original Message-----
> From: Michael Sokolov [mailto:sokolov@ifactory.com]
> Sent: Wednesday, October 27, 2010 7:44 AM
> To: solr-user@lucene.apache.org
> Subject: RE: How do I this in Solr?
> 
> You might try adding a field containing the word count and making sure
> that matches the query's word count?
> 
> This would require you to tokenize the query and document yourself,
> perhaps.
> 
> -Mike
> 
> > -----Original Message-----
> > From: Varun Gupta [mailto:varun.vgupta@gmail.com]
> > Sent: Tuesday, October 26, 2010 11:26 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How do I this in Solr?
> >
> > Thanks everybody for the inputs.
> >
> > Looks like Steven's solution is the closest one but will lead
> > to performance issues when the query string has many terms.
> >
> > I will try to implement the two filters suggested by Steven
> > and see how the performance matches up.
> >
> > --
> > Thanks
> > Varun Gupta
> >
> >
> > On Wed, Oct 27, 2010 at 8:04 AM, scott chu (???)
> > <sc...@udngroup.com> wrote:
> >
> > > I think you have to write a "yet exact match" handler yourself (I say
> > > "yet" because it's not quite the exact match we normally know). Steve's
> > > answer is quite near your request. You can do further work based on his
> > > solution.
> > >
> > > As the last step, I suggest you strip all blanks from the query string
> > > and from each query result, and return only those results that have the
> > > same string length as the query string.
> > >
> > > For example, given:
> > > *query string = "Samsung with GPS"
> > > *query results:
> > > result 1 = "Samsung has lots of mobile with GPS"
> > > result 2 = "with GPS Samsung"
> > > result 3 = "GPS mobile with vendors, such as Sony, Samsung"
> > >
> > > they become:
> > > *query string = "SamsungwithGPS" (length = 14)
> > > *query results:
> > > result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
> > > result 2 = "withGPSSamsung" (length = 14)
> > > result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
> > >
> > > so result 2 matches your request.
> > >
> > > In this way, you can avoid the work of handling case sensitivity and
> > > word-order rearrangement. Furthermore, you can do refined work, such as
> > > removing whitespace characters, etc.
> > >
> > > Scott @ Taiwan
> > >
> > >
> > > ----- Original Message ----- From: "Varun Gupta"
> > > <va...@gmail.com>
> > >
> > > To: <so...@lucene.apache.org>
> > > Sent: Tuesday, October 26, 2010 9:07 PM
> > >
> > > Subject: How do I this in Solr?
> > >
> > >
> > >  Hi,
> > >>
> > >> I have lot of small documents (each containing 1 to 15
> > words) indexed
> > >> in Solr. For the search query, I want the search results
> > to contain
> > >> only those documents that satisfy this criteria "All of
> > the words of
> > >> the search result document are present in the search query"
> > >>
> > >> For example:
> > >> If I have the following documents indexed: "nokia n95", "GPS",
> > >> "android", "samsung", "samsung andriod", "nokia andriod",
> > "mobile with GPS"
> > >>
> > >> If I search with the text "samsung andriod GPS", search results
> > >> should only conain "samsung", "GPS", "andriod" and
> > "samsung andriod".
> > >>
> > >> Is there a way to do this in Solr.
> > >>
> > >> --
> > >> Thanks
> > >> Varun Gupta
> > >>
> > >>
> > >
> > >
> > >
> > ----------------------------------------------------------------------
> > > ----------
> > >
> > >
> > >
> > > %<&b6G$J0T.'$$'d(l/f,r!C
> > > Checked by AVG - www.avg.com
> > > Version: 9.0.862 / Virus Database: 271.1.1/3220 - Release Date:
> > > 10/26/10 14:34:00
> > >
> > >
> >


Re: How do I this in Solr?

Posted by Varun Gupta <va...@gmail.com>.
I haven't been able to work on it because of some other commitments. The
MemoryIndex approach seems promising. The only thing I will have to check is
the memory requirement, as I have close to 2 million documents.

Will let you know if I can make it work.

Thanks a lot!

--
Varun Gupta

RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Varun,

On 10/26/2010 at 11:26 PM, Varun Gupta wrote:
> I will try to implement the two filters suggested by Steven and see how
> the performance matches up.

Have you made any progress?

I was thinking about your use case, and it occurred to me that you could get what you want by reversing the problem, using Lucene's MemoryIndex <http://lucene.apache.org/java/3_0_2/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>.  (As far as I can tell, this functionality -- i.e. standing queries a.k.a. routing a.k.a. filtering -- is not present in Solr.)

You can load your query (as a document) into a MemoryIndex, and then use each of your documents to query against it, something like (untested!):

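	import java.util.HashMap;
	import java.util.Map;
	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.analysis.WhitespaceAnalyzer;
	import org.apache.lucene.index.memory.MemoryIndex;
	import org.apache.lucene.queryParser.QueryParser;
	import org.apache.lucene.search.Query;
	import org.apache.lucene.util.Version;

	// Each indexed document becomes a query in which every term is required.
	// (Lucene 3.0.x API; parse() throws ParseException, handling omitted.)
	Map<String,Query> documents = new HashMap<String,Query>();
	Analyzer analyzer = new WhitespaceAnalyzer();
	QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
	parser.setDefaultOperator(QueryParser.Operator.AND);
	documents.put("ID001", parser.parse("nokia n95"));
	documents.put("ID002", parser.parse("GPS"));
	documents.put("ID003", parser.parse("android"));
	documents.put("ID004", parser.parse("samsung"));
	documents.put("ID005", parser.parse("samsung android"));
	documents.put("ID006", parser.parse("nokia android"));
	documents.put("ID007", parser.parse("mobile with GPS"));

	// The user's search text is the single "document" in the MemoryIndex.
	MemoryIndex index = new MemoryIndex();
	index.addField("content", "samsung android GPS", analyzer);

	for (Map.Entry<String,Query> entry : documents.entrySet()) {
	  Query query = entry.getValue();
	  // search() returns a score; > 0 means all required terms were found.
	  if (index.search(query) > 0.0f) {
	    String docId = entry.getKey();
	    // Do something with the hits here ...
	  }
	}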

In the above example, the documents "samsung", "GPS", "android" and "samsung android" would be hits, and the other documents would not be, just as you wanted.

MemoryIndex is designed to be very fast for this kind of usage, so even hundreds of thousands of documents should be feasible.
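
Stripped of Lucene, the "reverse the problem" idea amounts to a subset test: a document is a hit when every one of its terms appears in the query. The plain-Java sketch below illustrates only that logic; the class name, the whitespace tokenization, and the use of Varun's example query are assumptions for illustration, and it says nothing about how MemoryIndex performs at scale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReverseMatch {
    // Split on whitespace; a stand-in for the analyzer.
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // Return the IDs of documents whose terms are all present in the query.
    static List<String> hits(Map<String, String> documents, String query) {
        Set<String> queryTerms = terms(query);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : documents.entrySet()) {
            if (queryTerms.containsAll(terms(entry.getValue()))) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("ID001", "nokia n95");
        docs.put("ID002", "GPS");
        docs.put("ID003", "android");
        docs.put("ID004", "samsung");
        docs.put("ID005", "samsung android");
        docs.put("ID006", "nokia android");
        docs.put("ID007", "mobile with GPS");
        // prints [ID002, ID003, ID004, ID005]
        System.out.println(hits(docs, "samsung android GPS"));
    }
}
```

Like the Lucene version, this visits every document for every query, so the cost grows linearly with the size of the collection.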

Steve

RE: How do I this in Solr?

Posted by Michael Sokolov <so...@ifactory.com>.
You might try adding a field containing the word count and making sure that
matches the query's word count?

This would require you to tokenize the query and document yourself, perhaps.

-Mike 

Re: How do I this in Solr?

Posted by Varun Gupta <va...@gmail.com>.
Thanks everybody for the inputs.

Looks like Steven's solution is the closest one but will lead to performance
issues when the query string has many terms.

I will try to implement the two filters suggested by Steven and see how the
performance matches up.

--
Thanks
Varun Gupta


Re: How do I this in Solr?

Posted by "scott chu (朱炎詹)" <sc...@udngroup.com>.
I think you have to write a "yet exact match" handler yourself (I say "yet"
because it's not quite the exact match we normally mean). Steve's answer is
quite close to your request. You can do further work based on his solution.

As the last step, I'd suggest you strip all blanks from the query string and
from each query result, and return only those results whose stripped length
equals that of the stripped query string.

For example, given:
*query string = "Samsung with GPS"
*query results:
result 1 = "Samsung has lots of mobile with GPS"
result 2 = "with GPS Samsung"
result 3 = "GPS mobile with vendors, such as Sony, Samsung"

they become:
*query string = "SamsungwithGPS" (length = 14)
*query results:
result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
result 2 = "withGPSSamsung" (length = 14)
result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)

so result 2 matches your request.

This way, you avoid the work of handling case sensitivity and word
reordering. Furthermore, you can refine it, e.g. by removing other
whitespace characters, etc.
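
A minimal Java sketch of this length comparison follows; the class and method names are made up for illustration. Note one assumption-level caveat: equal stripped length is only a necessary condition, so it is meant as a final check on results that already matched the query terms, not as a test on its own.

```java
class StrippedLength {
    // Length of the string once all whitespace is removed.
    static int strippedLength(String s) {
        return s.replaceAll("\\s+", "").length();
    }

    // True when query and result have the same whitespace-stripped length.
    static boolean sameStrippedLength(String query, String result) {
        return strippedLength(query) == strippedLength(result);
    }

    public static void main(String[] args) {
        String query = "Samsung with GPS";
        // prints 14
        System.out.println(strippedLength(query));
        // prints false (stripped length is 29)
        System.out.println(sameStrippedLength(query, "Samsung has lots of mobile with GPS"));
        // prints true (stripped length is 14)
        System.out.println(sameStrippedLength(query, "with GPS Samsung"));
        // Also prints true, although the words differ: "SamsungfromUSA" has 14 characters
        System.out.println(sameStrippedLength(query, "Samsung from USA"));
    }
}
```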

Scott @ Taiwan



Re: How do I this in Solr?

Posted by Lance Norskog <go...@gmail.com>.
There is also a feature called a 'filter'. If you use certain words a 
lot, you can make filter queries with just those words.  Look for 
'filter' and 'fq=' on the wiki.

But really you can have hundreds of words in a query and not have a 
performance problem. Solr/Lucene is very fast. In benchmarking I have 
trouble sending enough requests to make several processors run at the 
same time.


RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
Dennis,

I wasn't trying to force your admission of my rectitude - I was just getting frustrated that the conversation was moving in spiral fashion, and was worried that you might have intentionally engineered that.

I'm glad to hear that you weren't flame baiting.

Steve


> -----Original Message-----
> From: Dennis Gearon [mailto:gearond@sbcglobal.net]
> Sent: Tuesday, October 26, 2010 3:35 PM
> To: solr-user@lucene.apache.org
> Subject: RE: How do I this in Solr?
> 
> I'm the LAST person anyone will ever need to worry about flame baiting.
> You did notice that I retracted what I said and supported your point of
> view?
> 
> Sorry if my cryptic comment sounded critical. I was wrong, you were right
> :-)
> Dennis Gearon
> 
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> EARTH has a Right To Life,
>   otherwise we all die.
> 
> 
> --- On Tue, 10/26/10, Steven A Rowe <sa...@syr.edu> wrote:
> 
> > From: Steven A Rowe <sa...@syr.edu>
> > Subject: RE: How do I this in Solr?
> > To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> > Date: Tuesday, October 26, 2010, 12:27 PM
> > Hi Dennis,
> >
> > You wrote:
> > > If Solr is like Google, once documents matching only the ANDed items
> > > in the query ran out, then those that had only two of the terms, then
> > > only 1 of the terms, and then those close to it would start showing up.
> > [...]
> > > Plus, if he wants terms that contain ONLY those words, and no others, an
> > > ANDed query would not do that, right? ANDed queries return results that
> > > must have ALL the terms listed, and could have lots of other words, right?
> >
> > This is *exactly* what I just said: ANDed queries (i.e., requiring all
> > query terms) will not satisfy Varun's requirements.
> >
> > Your participation in this thread looks an awful lot like flame-baiting:
> > Someone else asks a question, I answer with a possible solution, you give
> > a one-word "overkill" response, I say why it's not overkill.  You then ask
> > if anybody knows the answer to the original question, and then parrot my
> > response to your "overkill" statement.  Really????
> >
> > Get your shit together or shut up.  Please.
> >
> > Steve
> >
> > > -----Original Message-----
> > > From: Dennis Gearon [mailto:gearond@sbcglobal.net]
> > > Sent: Tuesday, October 26, 2010 3:14 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: How do I this in Solr?
> > >
> > >
> > >
> > > Dennis Gearon
> > >
> > > --- On Tue, 10/26/10, Steven A Rowe <sa...@syr.edu> wrote:
> > >
> > > > From: Steven A Rowe <sa...@syr.edu>
> > > > Subject: RE: How do I this in Solr?
> > > > To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> > > > Date: Tuesday, October 26, 2010, 12:10 PM
> > > > Dennis,
> > > >
> > > > Do you mean to say that you read my earlier post, and disagree
> > > > that it would solve the problem?  Or have you simply not read it?
> > > >
> > > > Steve
> > > >
> > > > > -----Original Message-----
> > > > > From: Dennis Gearon [mailto:gearond@sbcglobal.net]
> > > > > Sent: Tuesday, October 26, 2010 3:00 PM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: RE: How do I this in Solr?
> > > > >
> > > > > Good point. Since I might need such a query myself someday,
> > > > > how *IS* that done?
> > > > >
> > > > >
> > > > > Dennis Gearon
> > > > >
> > > > > --- On Tue, 10/26/10, Steven A Rowe <sa...@syr.edu> wrote:
> > > > >
> > > > > > From: Steven A Rowe <sa...@syr.edu>
> > > > > > Subject: RE: How do I this in Solr?
> > > > > > To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> > > > > > Date: Tuesday, October 26, 2010, 11:46 AM
> > > > > > Um, maybe I'm way off base, but when Varun said:
> > > > > >
> > > > > > > If I search with the text "samsung android GPS",
> > > > > > > search results should only contain "samsung", "GPS",
> > > > > > > "android" and "samsung android".
> > > > > >
> > > > > > I interpreted that to mean that hit documents should
> > > > > > contain terms from the query, and nothing else. Making
> > > > > > all terms required doesn't do this.
> > > > > >
> > > > > > Steve
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Matthew Hall [mailto:mhall@informatics.jax.org]
> > > > > > > Sent: Tuesday, October 26, 2010 2:30 PM
> > > > > > > To: solr-user@lucene.apache.org
> > > > > > > Subject: Re: How do I this in Solr?
> > > > > > >
> > > > > > > Um.. you could change your default clause to AND rather than OR.
> > > > > > >
> > > > > > > That should do the trick.
> > > > > > >
> > > > > > > Matt
> > > > > > >
> > > > > > > On 10/26/2010 2:26 PM, Dennis Gearon wrote:
> > > > > > > > Overkill?
> > > > > > > >
> > > > > > > > Dennis Gearon
> > > > > > > >> I can't think of a way to do it without writing new
> > > > > > > >> analysis filters.
> > > > > > > >>
> > > > > > > >> But I think you could do what you want with two filters
> > > > > > > >> (this is untested):
> > > > > > > >>
> > > > > > > >> 1. An index-time filter that outputs a single token
> > > > > > > >> consisting of all of the input tokens, sorted in a
> > > > > > > >> consistent way, e.g.:
> > > > > > > >>
> > > > > > > >>     "mobile with GPS" -> "GPS mobile with"
> > > > > > > >>     "samsung android" -> "android samsung"
> > > > > > > >>
> > > > > > > >> 2. A query-time filter that outputs one token per input
> > > > > > > >> term combination, sorted in the same consistent way as
> > > > > > > >> the index-time filter, e.g.:
> > > > > > > >>
> > > > > > > >>     "samsung android GPS"
> > > > > > > >>       ->
> > > > > > > >>       "samsung", "android", "GPS",
> > > > > > > >>       "android samsung", "GPS samsung", "android GPS",
> > > > > > > >>       "android GPS samsung"
> > > > > > > >>
> > > > > > > >> Steve

RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Matt,

I think your concern about performance is spot-on, though.

The combinatorial explosion would be at query time, not at index time - my solution has a single token indexed per document. My suggested query-time filter would generate the following number of output terms, where C(n,k) is the number of combinations of n things taken k at a time, n is the number of (distinct) input query terms, and k is the number of input query terms concatenated to form one output query term:

    C(n,1) + C(n,2) + ... + C(n,n-1) + C(n,n)

For small queries this would not be a problem:

	1 input query term -> 1 output query term
	2 input query terms -> 3 output query terms
	3 input query terms -> 7 output query terms
	4 input query terms -> 15 output query terms

But for larger queries, it could be fairly expensive:

	10 input query terms -> 1,023 output query terms
	...
	15 input query terms -> 32,767 output query terms

This is exactly (2^n - 1) output query terms, where n is the number of input terms.

32k query terms might be too slow to be functional.
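
To make the arithmetic above concrete, here is a rough Python sketch of the two filters (not actual Solr/Lucene filter code, and untested against a real index). The function names are made up for illustration, and it assumes whitespace tokenization, a case-insensitive sort order, and duplicate terms being dropped:

```python
from itertools import combinations
from math import comb


def sorted_terms(text):
    # Assumption: whitespace tokenization, duplicates dropped,
    # case-insensitive ordering (ties broken by the raw term).
    return sorted(set(text.split()), key=lambda t: (t.lower(), t))


def index_token(doc):
    # Index-time side: one token per document, terms in a consistent order.
    return " ".join(sorted_terms(doc))


def expand_query(query):
    # Query-time side: one output term per non-empty combination of input terms.
    terms = sorted_terms(query)
    return [" ".join(c)
            for k in range(1, len(terms) + 1)
            for c in combinations(terms, k)]


def expansion_count(n):
    # C(n,1) + C(n,2) + ... + C(n,n), which equals 2**n - 1.
    return sum(comb(n, k) for k in range(1, n + 1))


def matching_docs(docs, query):
    # A document is a hit when its single index token is one of the expansions.
    wanted = set(expand_query(query))
    return [d for d in docs if index_token(d) in wanted]
```

With Varun's example documents, matching_docs(docs, "samsung android GPS") keeps only "GPS", "android", "samsung" and "samsung android"; expansion_count(15) shows where the 32,767 figure comes from.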

Steve

> -----Original Message-----
> From: Matthew Hall [mailto:mhall@informatics.jax.org]
> Sent: Tuesday, October 26, 2010 3:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How do I this in Solr?
> 
> Bah.. nope this would miss documents that only match a subset of the
> given terms.
> 
> I'm going to have to go with Steven's approach as the right choice here.
> 
> Matt
> 
> On 10/26/2010 3:44 PM, Matthew Hall wrote:
> > Indeed, I'd missed the second part of his requirements; my AND
> > solution is sadly insufficient for this task.
> >
> > The combinatorial part of your solution worries me a bit though, Steven,
> > because his documents that are on the larger side of his corpus would
> > likely slow down query performance a bit while the filter calculates
> > all of the possibilities for a given document.
> >
> > I'm wondering if a slightly hybrid approach would be valid:
> >
> > Have a filter that calculates the total number of terms for a given
> > document.  And then add a clause into your query at runtime that would
> > match what the filter would come up with:
> >
> > So:
> >
> > text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount:3
> >
> > Something like that anyhow.
> >
> > Matt
> 
> 
> --
> Matthew Hall
> Software Engineer
> Mouse Genome Informatics
> mhall@informatics.jax.org
> (207) 288-6012
> 


Re: How do I this in Solr?

Posted by Matthew Hall <mh...@informatics.jax.org>.
Bah.. nope this would miss documents that only match a subset of the 
given terms.

I'm going to have to go with Steven's approach as the right choice here.
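
For anyone following along, a small Python sketch of why the AND-plus-termCount idea misses subset matches. This is plain set logic, not Solr code; the function names are hypothetical, and lowercased whitespace tokens are assumed:

```python
def tokens(text):
    # Assumption: lowercased, whitespace-separated terms.
    return set(text.lower().split())


def and_with_term_count(doc, query):
    # The hybrid idea: every query term must be present AND the document's
    # term count must equal the number of query terms.
    d, q = tokens(doc), tokens(query)
    return q <= d and len(d) == len(q)


def only_query_terms(doc, query):
    # What Varun asked for: every word of the document appears in the query.
    d = tokens(doc)
    return bool(d) and d <= tokens(query)


docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]
query = "samsung android GPS"

hybrid_hits = [d for d in docs if and_with_term_count(d, query)]
wanted_hits = [d for d in docs if only_query_terms(d, query)]
```

Here hybrid_hits comes out empty, because no document contains all three query terms, while wanted_hits holds the four documents Varun expects.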

Matt

On 10/26/2010 3:44 PM, Matthew Hall wrote:
> Indeed, I'd missed the second part of his requirements; my AND
> solution is sadly insufficient for this task.
>
> The combinatorial part of your solution worries me a bit though, Steven,
> because his documents that are on the larger side of his corpus would 
> likely slow down query performance a bit while the filter calculates 
> all of the possibilities for a given document.
>
> I'm wondering if a slightly hybrid approach would be valid:
>
> Have a filter that calculates the total number of terms for a given 
> document.  And then add a clause into your query at runtime that would 
> match what the filter would come up with:
>
> So:
>
> text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount:3
>
> Something like that anyhow.
>
> Matt


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012



Re: How do I this in Solr?

Posted by Matthew Hall <mh...@informatics.jax.org>.
Indeed, I'd missed the second part of his requirements; my AND solution
is sadly insufficient for this task.

The combinatorial part of your solution worries me a bit though, Steven,
because his documents that are on the larger side of his corpus would 
likely slow down query performance a bit while the filter calculates all 
of the possibilities for a given document.

I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:

So:

text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount:3

Something like that anyhow.

Matt

On 10/26/2010 3:35 PM, Dennis Gearon wrote:
> I'm the LAST person anyone will ever need to worry about flame baiting. You did notice that I retracted what I said and supported your point of view?
>
> Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)
> Dennis Gearon
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
>    otherwise we all die.



RE: How do I this in Solr?

Posted by Dennis Gearon <ge...@sbcglobal.net>.
I'm the LAST person anyone will ever need to worry about flame baiting. You did notice that I retracted what I said and supported your point of view?

Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)
Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe <sa...@syr.edu> wrote:

> From: Steven A Rowe <sa...@syr.edu>
> Subject: RE: How do I this in Solr?
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Date: Tuesday, October 26, 2010, 12:27 PM
> Hi Dennis,
> 
> You wrote:
> > If Solr is like Google, once documents matching only
> the ANDed items
> > in the query ran out, then those that had only two of
> the terms, then
> > only 1 of the terms, and then those close to it would
> start showing up.
> [...]
> > Plus, if he wants terms that contain ONLY those words,
> and no others, an
> > ANDed query would not do that, right? ANDed queries
> return results that
> > must have ALL the terms listed, and could have lots of
> other words, right?
> 
> This is *exactly* what I just said: ANDed queries (i.e.,
> requiring all query terms) will not satisfy Varun's
> requirements.
> 
> Your participation in this thread looks an awful lot like
> flame-baiting: Someone else asks a question, I answer with a
> possible solution, you give a one-word "overkill" response,
> I say why it's not overkill.  You then ask if anybody
> knows the answer to the original question, and then parrot
> my response to your "overkill" statement.  Really????
> 
> Get your shit together or shut up.  Please.
> 
> Steve
> 
> > -----Original Message-----
> > From: Dennis Gearon [mailto:gearond@sbcglobal.net]
> > Sent: Tuesday, October 26, 2010 3:14 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: How do I this in Solr?
> > 
> > 
> > 
> > Dennis Gearon
> > 
> > Signature Warning
> > ----------------
> > It is always a good idea to learn from your own
> mistakes. It is usually a
> > better idea to learn from others’ mistakes, so you
> do not have to make
> > them yourself. from
> > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

> > 
> > EARTH has a Right To Life,
> >   otherwise we all die.
> > 
> > 
> > --- On Tue, 10/26/10, Steven A Rowe <sa...@syr.edu>
> wrote:
> > 
> > > From: Steven A Rowe <sa...@syr.edu>
> > > Subject: RE: How do I this in Solr?
> > > To: "solr-user@lucene.apache.org"
> <so...@lucene.apache.org>
> > > Date: Tuesday, October 26, 2010, 12:10 PM
> > > Dennis,
> > >

RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Dennis,

You wrote:
> If Solr is like Google, once documents matching only the ANDed items
> in the query ran out, then those that had only two of the terms, then
> only 1 of the terms, and then those close to it would start showing up.
[...]
> Plus, if he wants terms that contain ONLY those words, and no others, an
> ANDed query would not do that, right? ANDed queries return results that
> must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e., requiring all query terms) will not satisfy Varun's requirements.

Your participation in this thread looks an awful lot like flame-baiting: Someone else asks a question, I answer with a possible solution, you give a one-word "overkill" response, I say why it's not overkill.  You then ask if anybody knows the answer to the original question, and then parrot my response to your "overkill" statement.  Really????

Get your shit together or shut up.  Please.

Steve


RE: How do I this in Solr?

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Plus, if he wants terms that contain ONLY those words, and no others, an ANDed query would not do that, right? ANDed queries return results that must have ALL the terms listed, and could have lots of other words, right?


Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.



RE: How do I this in Solr?

Posted by Dennis Gearon <ge...@sbcglobal.net>.
If Solr is like Google, once documents matching only the ANDed items in the query ran out, then those that had only two of the terms, then only 1 of the terms, and then those close to it would start showing up.

Is this correct?

If so, it wouldn't match his requirements.

Dennis Gearon




RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
Dennis,

Do you mean to say that you read my earlier post, and disagree that it would solve the problem?  Or have you simply not read it?

Steve


RE: How do I this in Solr?

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Good point. Since I might need such a query myself someday, how *IS* that done?


Dennis Gearon




RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
Um, maybe I'm way off base, but when Varun said:

> If I search with the text "samsung andriod GPS",
> search results should only conain "samsung", "GPS",
> "andriod" and "samsung andriod".

I interpreted that to mean that hit documents should contain terms from the query, and nothing else.  Making all terms required doesn't do this.

Steve
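
To make the distinction concrete, here is a minimal Python sketch (plain set logic, not Solr code; the function names are made up for illustration, and "android" is spelled consistently). Requiring all query terms asks whether the query terms are a subset of the document's terms; Varun's requirement is the reverse, asking whether the document's terms are a subset of the query terms.

```python
def matches_all_required(query, doc):
    """AND semantics: every query term must appear in the document."""
    return set(query.split()) <= set(doc.split())


def matches_only_query_terms(query, doc):
    """Varun's requirement: every document term must appear in the query."""
    return set(doc.split()) <= set(query.split())


query = "samsung android GPS"
for doc in ["samsung android", "samsung android GPS phone"]:
    # Print the document followed by the result of each test.
    print(doc, matches_all_required(query, doc), matches_only_query_terms(query, doc))
```

The document "samsung android" fails the AND test because it lacks "GPS", yet it is exactly the kind of hit Varun wants. "samsung android GPS phone" passes the AND test but contains a word that is not in the query, so it should be excluded.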



Re: How do I this in Solr?

Posted by Matthew Hall <mh...@informatics.jax.org>.
Um.. you could change your default operator to AND rather than OR.

That should do the trick.

Matt



RE: How do I this in Solr?

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Overkill?

Dennis Gearon
> 
> I can't think of a way to do it without writing new analysis filters.
> 
> But I think you could do what you want with two filters (this is untested):
> 
> 1. An index-time filter that outputs a single token consisting of all of
> the input tokens, sorted in a consistent way, e.g.:
> 
>    "mobile with GPS" -> "GPS mobile with"
>    "samsung android" -> "android samsung"
> 
> 2. A query-time filter that outputs one token per input term combination,
> sorted in the same consistent way as the index-time filter, e.g.:
> 
>    "samsung android GPS"
>      -> "samsung","android","GPS",
>         "android samsung","GPS samsung","android GPS",
>         "android GPS samsung"
> 
> Steve

RE: How do I this in Solr?

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Varun,

I can't think of a way to do it without writing new analysis filters.

But I think you could do what you want with two filters (this is untested):

1. An index-time filter that outputs a single token consisting of all of the input tokens, sorted in a consistent way, e.g.:

   "mobile with GPS" -> "GPS mobile with"
   "samsung android" -> "android samsung"

2. A query-time filter that outputs one token per input term combination, sorted in the same consistent way as the index-time filter, e.g.:

   "samsung android GPS"
	 ->	"samsung","android","GPS",
		"android samsung","GPS samsung","android GPS",
		"android GPS samsung"

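A rough sketch of the logic behind the two filters, written as plain Python rather than as real Lucene/Solr analysis filters. The function names `index_token` and `query_tokens` are made up for illustration, whitespace tokenization is assumed, and the case-insensitive sort is just one possible "consistent way" that happens to reproduce the orderings shown above:

```python
from itertools import combinations


def _sort_key(token):
    # Case-insensitive ordering, with the raw token as a tie-breaker so the
    # result is deterministic even if "GPS" and "gps" both appear.
    return (token.lower(), token)


def index_token(text):
    """Index time: collapse a document into ONE token of its sorted, distinct words."""
    return " ".join(sorted(set(text.split()), key=_sort_key))


def query_tokens(text):
    """Query time: emit one token per non-empty combination of the query words."""
    terms = sorted(set(text.split()), key=_sort_key)
    return [
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)  # keeps the sorted order
    ]


if __name__ == "__main__":
    # Documents from the original question (spelling normalized to "android").
    docs = ["nokia n95", "GPS", "android", "samsung",
            "samsung android", "nokia android", "mobile with GPS"]
    wanted = set(query_tokens("samsung android GPS"))
    # A document matches when its single index token equals any query token.
    print([d for d in docs if index_token(d) in wanted])
```

Note that a query with n distinct terms produces 2^n - 1 tokens, so this approach only stays manageable for short queries.
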
Steve

> -----Original Message-----
> From: Varun Gupta [mailto:varun.vgupta@gmail.com]
> Sent: Tuesday, October 26, 2010 9:08 AM
> To: solr-user@lucene.apache.org
> Subject: How do I this in Solr?
> 
> Hi,
> 
> I have a lot of small documents (each containing 1 to 15 words) indexed in
> Solr. For the search query, I want the search results to contain only
> those documents that satisfy this criterion: "All of the words of the
> search result document are present in the search query"
> 
> For example:
> If I have the following documents indexed: "nokia n95", "GPS", "android",
> "samsung", "samsung android", "nokia android", "mobile with GPS"
> 
> If I search with the text "samsung android GPS", search results should
> only contain "samsung", "GPS", "android" and "samsung android".
> 
> Is there a way to do this in Solr?
> 
> --
> Thanks
> Varun Gupta

Re: How do I this in Solr?

Posted by Ken Stanley <do...@gmail.com>.
On Tue, Oct 26, 2010 at 9:15 AM, Savvas-Andreas Moysidis <
savvas.andreas.moysidis@googlemail.com> wrote:

> If I get your question right, you probably want to use the AND binary
> operator as in "samsung AND android AND GPS" or "+samsung +android +GPS"
>
>
N.B. For these queries you can also pass the q.op parameter in the request
to change the default operator to AND for that request only; this has the
same effect without having to build the query yourself. I.e., you can just
request
"http://host:port/solr/select?q=samsung+android+gps&q.op=AND"
(along with any other params you need).
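
For illustration, a small Python sketch that builds such a request URL. The base URL `http://localhost:8983/solr` and the helper name `build_select_url` are placeholders and not anything from this thread; only the `q` and `q.op` parameters come from the message above:

```python
from urllib.parse import urlencode


def build_select_url(base_url, terms, default_operator="AND"):
    """Build a /select URL whose q.op overrides the default operator for this request."""
    params = {
        "q": " ".join(terms),       # plain terms, no explicit AND / + needed
        "q.op": default_operator,   # applies to this request only
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and port; substitute your own Solr base URL.
    print(build_select_url("http://localhost:8983/solr",
                           ["samsung", "android", "gps"]))
```

Any other request parameters can be added to the `params` dict in the same way.
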

Re: How do I this in Solr?

Posted by Savvas-Andreas Moysidis <sa...@googlemail.com>.
If I get your question right, you probably want to use the AND binary
operator as in "samsung AND android AND GPS" or "+samsung +android +GPS"
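
As a sketch, both query forms can be generated from a list of terms on the client side. The helper names `and_query` and `required_query` are hypothetical, and no escaping of query-parser special characters is attempted:

```python
def and_query(terms):
    """Join terms with the explicit AND operator."""
    return " AND ".join(terms)


def required_query(terms):
    """Mark every term as required using the + prefix."""
    return " ".join(f"+{term}" for term in terms)


if __name__ == "__main__":
    terms = ["samsung", "android", "GPS"]
    print(and_query(terms))       # samsung AND android AND GPS
    print(required_query(terms))  # +samsung +android +GPS
```
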

On 26 October 2010 14:07, Varun Gupta <va...@gmail.com> wrote:

> Hi,
>
> I have a lot of small documents (each containing 1 to 15 words) indexed in
> Solr. For the search query, I want the search results to contain only those
> documents that satisfy this criterion: "All of the words of the search result
> document are present in the search query"
>
> For example:
> If I have the following documents indexed: "nokia n95", "GPS", "android",
> "samsung", "samsung android", "nokia android", "mobile with GPS"
>
> If I search with the text "samsung android GPS", search results should only
> contain "samsung", "GPS", "android" and "samsung android".
>
> Is there a way to do this in Solr?
>
> --
> Thanks
> Varun Gupta
>