Posted to solr-user@lucene.apache.org by Nicolas Paris <ni...@gmail.com> on 2018/04/22 18:04:33 UTC

query bag of word with negation

Hello

Is there a plain-text query syntax that says
"give me all documents that match":

wonderful pizza NOT peperoni

with all of those terms inside a word bag of distance 5?
For example:

pizza are wonderful -> would match
I made a wonderful pasta and pizza -> would match
Peperoni pizza are so wonderful -> would not match

I tested:
"wonderful pizza - peperoni"~5
without success

Thanks
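
To pin down the semantics being asked for, here is a minimal Python sketch of the matching rule. It only illustrates the intent and says nothing about how Solr behaves. The function names (tokenize, bag_match) are hypothetical, and the reading that the negated word disqualifies a hit only when it falls inside the same 5-word bag is an assumption drawn from the three examples.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


def bag_match(text, required, excluded, distance=5):
    """Return True if every required term occurs within `distance` token
    positions of the others and no excluded term fits in that same bag.

    Assumption: "distance" is the difference between the first and last
    token positions of the bag.
    """
    if not required:
        return False

    # Map each token to the list of positions where it occurs.
    positions = {}
    for index, token in enumerate(tokenize(text)):
        positions.setdefault(token, []).append(index)

    required_positions = [positions.get(term.lower(), []) for term in required]
    if not all(required_positions):
        return False  # at least one required term is missing entirely

    excluded_positions = [
        pos for term in excluded for pos in positions.get(term.lower(), [])
    ]

    # Try every combination of occurrences of the required terms.
    for combo in product(*required_positions):
        lo, hi = min(combo), max(combo)
        if hi - lo > distance:
            continue  # required terms are too far apart
        # Reject the bag if an excluded term would still fit inside it.
        if any(max(hi, p) - min(lo, p) <= distance for p in excluded_positions):
            continue
        return True
    return False


if __name__ == "__main__":
    for doc in (
        "pizza are wonderful",
        "I made a wonderful pasta and pizza",
        "Peperoni pizza are so wonderful",
    ):
        print(doc, "->", bag_match(doc, ["wonderful", "pizza"], ["peperoni"]))
```

Under these assumptions the first two sentences match and the third does not. A document that mentions peperoni far away from "wonderful pizza" would still match, which is the part a document-level NOT cannot express.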

Re: query bag of word with negation

Posted by Nicolas Paris <ni...@gmail.com>.
> 1. Query terms containing other than just letters or digits may be placed
> within double quotes so that those other characters do not separate a term
> into many terms. A dot (period) and white space are neither letter nor
> digit. Examples: "Now is the time for all good men" (spaces, quotes impose
> ordering too), "goods.doc" (a dot).
>
> 2. Mode button "or" (the default) means match one or more terms, perhaps
> scattered about. Mode button "and" means must match all terms, scattered or
> not.
>
> 3. A one word query term may be prefixed by title: or url: to search on
> those fields. A space must follow the colon, and the search term is case
> sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a
> formal internal title field, thus prefix title: may not work.
>
> 4. Compound queries can be built by joining terms with "and", "or", "-"
> and grouping items with ( ). Not is expressed as a minus sign prefixing a
> term. A bare space means use the Mode (or, and). Example: Nancy and Mary
> and -Jane and -(Robert Daniel) which means both the first two and not Jane
> and neither of the two guys.
>
> 5. A query of asterisk/star (*) means match everything. Examples: * for
> everything (zero or more characters). Fussy, show all without term .pdf:
> * and -".pdf". For normal queries the program uses the edismax interface.
> A few, such as url: foobar, reference the Lucene interface. This is
> specified by the qagent= parameter, of edismax or empty respectively, in a
> search request. Thus regular facilities can do most of this work.
>
> What this example does not address is your distance 5 criterion. However,
> the NOT facility may do the trick for you, though a minus sign is taken as
> a literal minus sign or word separator if located within a quoted string.

Indeed, sadly, the words can then be anywhere in the document (no notion of
distance).

> Thanks, Joe D.

Thanks for the 5 details anyway.

Re: query bag of word with negation

Posted by Joe Doupnik <jr...@netlab1.net>.
On 22/04/2018 19:26, Joe Doupnik wrote:
----------
     Golly, that was well and truly munged by the receiver. Let me try 
again -
> A partial answer to your question is contained in this Help screen
> text from my Solr query program:
>
> Some hints about using this facility:
>
> 1. Query terms containing other than just letters or digits may be
> placed within double quotes so that those other characters do not
> separate a term into many terms. A dot (period) and white space are
> neither letter nor digit. Examples: "Now is the time for all good men"
> (spaces, quotes impose ordering too), "goods.doc" (a dot).
>
> 2. Mode button "or" (the default) means match one or more terms,
> perhaps scattered about. Mode button "and" means must match all terms,
> scattered or not.
>
> 3. A one word query term may be prefixed by title: or url: to search
> on those fields. A space must follow the colon, and the search term is
> case sensitive. Examples: url: .ppt or title: Goodies. Many docs do not
> have a formal internal title field, thus prefix title: may not work.
>
> 4. Compound queries can be built by joining terms with "and", "or", "-"
> and grouping items with ( ). Not is expressed as a minus sign prefixing
> a term. A bare space means use the Mode (or, and). Example: Nancy and
> Mary and -Jane and -(Robert Daniel) which means both the first two and
> not Jane and neither of the two guys.
>
> 5. A query of asterisk/star (*) means match everything. Examples: * for
> everything (zero or more characters). Fussy, show all without term
> .pdf: * and -".pdf". For normal queries the program uses the edismax
> interface. A few, such as url: foobar, reference the Lucene interface.
> This is specified by the qagent= parameter, of edismax or empty
> respectively, in a search request. Thus regular facilities can do most
> of this work.
>
> What this example does not address is your distance 5 criterion.
> However, the NOT facility may do the trick for you, though a minus sign
> is taken as a literal minus sign or word separator if located within a
> quoted string.
     Hopefully that will be more readable.
     Thanks,
     Joe D.
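
To illustrate the point about the minus sign, here is a rough Python sketch that keeps the negation outside the quoted proximity phrase, giving a query of the form "wonderful pizza"~5 -peperoni. The ~N proximity suffix and the leading minus are standard Lucene query syntax, and defType=edismax is Solr's request parameter for choosing the edismax parser. The host, the core name "mycore", and the helper names build_query and build_select_url are assumptions made up for the example. Note that the negation applies to the whole searched field, so this does not restrict the NOT to the 5-word window.

```python
from urllib.parse import urlencode


def build_query(required_terms, excluded_terms, slop=5):
    """Build a proximity phrase with document-level negations.

    The minus signs are placed outside the quotes, because inside a quoted
    string a minus would not act as the NOT operator.
    """
    phrase = '"{}"~{}'.format(" ".join(required_terms), slop)
    negations = " ".join("-" + term for term in excluded_terms)
    return (phrase + " " + negations).strip()


def build_select_url(base_url, required_terms, excluded_terms, slop=5):
    """Return a select URL for the query (base_url is a hypothetical endpoint)."""
    params = {
        "q": build_query(required_terms, excluded_terms, slop),
        "defType": "edismax",  # ask Solr to use the edismax parser
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr core; adjust to the real installation.
    print(build_query(["wonderful", "pizza"], ["peperoni"]))
    print(build_select_url(
        "http://localhost:8983/solr/mycore/select",
        ["wonderful", "pizza"],
        ["peperoni"],
    ))
```

With this form, a document containing "peperoni" anywhere in the searched field is dropped, even when it is far away from "wonderful pizza".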

Re: query bag of word with negation

Posted by Joe Doupnik <jr...@netlab1.net>.
On 22/04/2018 19:04, Nicolas Paris wrote:
> Hello
>
> Is there a plain-text query syntax that says
> "give me all documents that match":
>
> wonderful pizza NOT peperoni
>
> with all of those terms inside a word bag of distance 5?
> For example:
>
> pizza are wonderful -> would match
> I made a wonderful pasta and pizza -> would match
> Peperoni pizza are so wonderful -> would not match
>
> I tested:
> "wonderful pizza - peperoni"~5
> without success
>
> Thanks
>
---------------
     A partial answer to your question is contained in this Help screen 
text from my Solr query program:

Some hints about using this facility: 1. Query terms containing other 
than just letters or digits may be placed within double quotes so that 
  those other characters do not separate a term into many terms. A dot 
(period) and white space are neither  letter nor digit. Examples: "Now 
is the time for all good men" (spaces, quotes impose ordering too), 
"goods.doc" (a dot). 2. Mode button "or" (the default) means match one 
or more terms, perhaps scattered about. Mode button "and" means must 
match all terms, scattered or not. 3. A one word query term may be 
prefixed by title: or url: to search on those fields. A space must 
follow the colon, and the search term is case sensitive. Examples: url: 
.ppt or title: Goodies. Many docs do not have a formal internal title 
field, thus prefix title: may not work. 4. Compound queries can be built 
by joining terms with and or - and group items with ( ). Not is 
expressed as a minus sign prefixing a term. A bare space means use the 
Mode (or, and). Example: Nancy and Mary and -Jane and -(Robert Daniel) 
which means both the first two and not Jane and neither of the two guys. 
5. A query of asterisk/star (*) means match everything. Examples: * for 
everything (zero or more characters). Fussy, show all without term .pdf 
* and -".pdf" For normal queries the program uses the edismax interface. 
A few, such as url: foobar, reference the Lucene interface. This is 
specified by the qagent= parameter, of edismax or empty respectively, in 
a search request. Thus regular facilities can do most of this work. What 
this example does not address is your distance 5 criterion. However, the 
NOT facility may do the trick for you, though a minus sign is taken as a 
literal minus sign or word separator if located within a quoted string. 
Thanks, Joe D.