Posted to solr-user@lucene.apache.org by Nicolas Paris <ni...@gmail.com> on 2018/04/22 18:04:33 UTC
query bag of word with negation
Hello
I wonder if there is a plain text query syntax to say:
give me all documents that match:
wonderful pizza NOT peperoni
with all of those inside a bag of words of distance 5
then
pizza are wonderful -> would match
I made a wonderful pasta and pizza -> would match
Peperoni pizza are so wonderful -> would not match
I tested:
"wonderful pizza - peperoni"~5
without success
Thanks
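The three example sentences above can be checked with a small sketch of the requested semantics. This is only one possible reading of "a 5 distance word bag", written in plain Python; it is not how Solr or Lucene compute slop, and the function name bag_match and its parameters are hypothetical.

```python
from itertools import product


def bag_match(text, required, excluded, slop=5):
    """Return True if every required word occurs within `slop` token
    positions of the others, and no excluded word fits in that same bag.

    Assumption: tokens are whitespace-separated and compared lowercased.
    """
    tokens = text.lower().split()

    def positions(word):
        return [i for i, tok in enumerate(tokens) if tok == word.lower()]

    required_positions = [positions(w) for w in required]
    if any(not p for p in required_positions):
        return False  # a required word is missing entirely

    excluded_positions = [i for w in excluded for i in positions(w)]

    # Try every combination of one position per required word.
    for combo in product(*required_positions):
        lo, hi = min(combo), max(combo)
        if hi - lo > slop:
            continue  # required words are too far apart
        # Reject this bag if an excluded word would still fit within the slop.
        if any(max(hi, p) - min(lo, p) <= slop for p in excluded_positions):
            continue
        return True
    return False


examples = [
    "pizza are wonderful",
    "I made a wonderful pasta and pizza",
    "Peperoni pizza are so wonderful",
]
results = [bag_match(s, ["wonderful", "pizza"], ["peperoni"]) for s in examples]
```

Under this reading the first two sentences match and the third does not, because peperoni sits within 5 positions of the other two words.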
Re: query bag of word with negation
Posted by Nicolas Paris <ni...@gmail.com>.
> 1. Query terms containing other than just letters or digits may be placed
> within double quotes so that those other characters do not separate a term
> into many terms. A dot (period) and white space are neither letter nor
> digit. Examples: "Now is the time for all good men" (spaces, quotes impose
> ordering too), "goods.doc" (a dot).
>
> 2. Mode button "or" (the default) means match one or more terms, perhaps
> scattered about. Mode button "and" means must match all terms, scattered or
> not.
>
> 3. A one word query term may be prefixed by title: or url: to search on
> those fields. A space must follow the colon, and the search term is case
> sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a
> formal internal title field, thus prefix title: may not work.
>
> 4. Compound queries can be built by joining terms with and, or, - and by
> grouping items with ( ). Not is expressed as a minus sign prefixing a term.
> A bare space means use the Mode (or, and). Example: Nancy and Mary and
> -Jane and -(Robert Daniel) which means both the first two and not Jane and
> neither of the two guys.
>
> 5. A query of asterisk/star (*) means match everything. Examples: * for
> everything (zero or more characters). Fussy, show all without term .pdf:
> * and -".pdf"
>
> For normal queries the program uses the edismax interface. A few, such as
> url: foobar, reference the Lucene interface. This is specified by the
> qagent= parameter, of edismax or empty respectively, in a search request.
> Thus regular facilities can do most of this work.
>
> What this example does not address is your distance 5 criterion. However,
> the NOT facility may do the trick for you, though a minus sign is taken as
> a literal minus sign or word separator if located within a quoted string.

Indeed, sadly, the words can be anywhere in the document (no notion of
distance).

> Thanks, Joe D.

Thanks for the 5 details anyway.
Re: query bag of word with negation
Posted by Joe Doupnik <jr...@netlab1.net>.
On 22/04/2018 19:26, Joe Doupnik wrote:
> ---------------
> A partial answer to your question is contained in this Help screen
> text from my Solr query program:
>
> Some hints about using this facility: 1. Query terms containing other
> than just letters or digits may be placed within double quotes so that
> those other characters do not separate a term into many terms. A dot
> (period) and white space are neither letter nor digit. Examples: "Now
> is the time for all good men" (spaces, quotes impose ordering too),
> "goods.doc" (a dot). 2. Mode button "or" (the default) means match one
> or more terms, perhaps scattered about. Mode button "and" means must
> match all terms, scattered or not. 3. A one word query term may be
> prefixed by title: or url: to search on those fields. A space must
> follow the colon, and the search term is case sensitive. Examples:
> url: .ppt or title: Goodies. Many docs do not have a formal internal
> title field, thus prefix title: may not work. 4. Compound queries can
> be built by joining terms with and or - and group items with ( ). Not
> is expressed as a minus sign prefixing a term. A bare space means use
> the Mode (or, and). Example: Nancy and Mary and -Jane and -(Robert
> Daniel) which means both the first two and not Jane and neither of the
> two guys. 5. A query of asterisk/star (*) means match everything.
> Examples: * for everything (zero or more characters). Fussy, show all
> without term .pdf * and -".pdf" For normal queries the program uses
> the edismax interface. A few, such as url: foobar, reference the
> Lucene interface. This is specified by the qagent= parameter, of
> edismax or empty respectively, in a search request. Thus regular
> facilities can do most of this work. What this example does not
> address is your distance 5 critera. However, the NOT facility may do
> the trick for you, though a minus sign is taken as a literal minus
> sign or word separator if located within a quoted string. Thanks, Joe D.
>
>
----------
Golly, that was well and truly munged by the receiver. Let me try
again -
> A partial answer to your question is contained in this Help screen
> text from my Solr query program:
> Some hints about using this facility:
>
> 1. Query terms containing other than just letters or digits may be placed
>    within double quotes so that those other characters do not separate a
>    term into many terms. A dot (period) and white space are neither letter
>    nor digit. Examples: "Now is the time for all good men" (spaces, quotes
>    impose ordering too), "goods.doc" (a dot).
> 2. Mode button "or" (the default) means match one or more terms, perhaps
>    scattered about. Mode button "and" means must match all terms, scattered
>    or not.
> 3. A one word query term may be prefixed by title: or url: to search on
>    those fields. A space must follow the colon, and the search term is case
>    sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a
>    formal internal title field, thus prefix title: may not work.
> 4. Compound queries can be built by joining terms with and, or, - and by
>    grouping items with ( ). Not is expressed as a minus sign prefixing a
>    term. A bare space means use the Mode (or, and). Example: Nancy and Mary
>    and -Jane and -(Robert Daniel) which means both the first two and not
>    Jane and neither of the two guys.
> 5. A query of asterisk/star (*) means match everything. Examples: * for
>    everything (zero or more characters). Fussy, show all without term .pdf:
>    * and -".pdf"
>
> For normal queries the program uses the edismax interface. A few, such as
> url: foobar, reference the Lucene interface. This is specified by the
> qagent= parameter, of edismax or empty respectively, in a search request.
>
> Thus regular facilities can do most of this work. What this example does not
> address is your distance 5 criterion. However, the NOT facility may do the
> trick for you, though a minus sign is taken as a literal minus sign or word
> separator if located within a quoted string.
Hopefully that will be more readable.
Thanks,
Joe D.
Re: query bag of word with negation
Posted by Joe Doupnik <jr...@netlab1.net>.
On 22/04/2018 19:04, Nicolas Paris wrote:
> Hello
>
> I wonder if there is a plain text query syntax to say:
> give me all document that match:
>
> wonderful pizza NOT peperoni
>
> all those in a 5 distance word bag
> then
>
> pizza are wonderful -> would match
> I made a wonderful pasta and pizza -> would match
> Peperoni pizza are so wonderful -> would not match
>
> I tested:
> "wonderful pizza - peperoni"~5
> without success
>
> Thanks
>
---------------
A partial answer to your question is contained in this Help screen
text from my Solr query program:
Some hints about using this facility:

1. Query terms containing other than just letters or digits may be placed
   within double quotes so that those other characters do not separate a
   term into many terms. A dot (period) and white space are neither letter
   nor digit. Examples: "Now is the time for all good men" (spaces, quotes
   impose ordering too), "goods.doc" (a dot).
2. Mode button "or" (the default) means match one or more terms, perhaps
   scattered about. Mode button "and" means must match all terms, scattered
   or not.
3. A one word query term may be prefixed by title: or url: to search on
   those fields. A space must follow the colon, and the search term is case
   sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a
   formal internal title field, thus prefix title: may not work.
4. Compound queries can be built by joining terms with and, or, - and by
   grouping items with ( ). Not is expressed as a minus sign prefixing a
   term. A bare space means use the Mode (or, and). Example: Nancy and Mary
   and -Jane and -(Robert Daniel) which means both the first two and not
   Jane and neither of the two guys.
5. A query of asterisk/star (*) means match everything. Examples: * for
   everything (zero or more characters). Fussy, show all without term .pdf:
   * and -".pdf"

For normal queries the program uses the edismax interface. A few, such as
url: foobar, reference the Lucene interface. This is specified by the
qagent= parameter, of edismax or empty respectively, in a search request.

Thus regular facilities can do most of this work. What this example does not
address is your distance 5 criterion. However, the NOT facility may do the
trick for you, though a minus sign is taken as a literal minus sign or word
separator if located within a quoted string.
Thanks, Joe D.
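As a rough illustration of the point about the minus sign, the sketch below builds request parameters with the negated term placed outside the quoted proximity phrase. It assumes the standard Solr defType request parameter and the usual "..."~N proximity syntax; the helper name build_params is hypothetical, and no host, collection, or default field is implied. Note that the exclusion here applies to the whole document, not to the 5-word window.

```python
from urllib.parse import urlencode


def build_params(phrase_terms, excluded_terms, slop=5):
    """Build Solr-style request parameters (hypothetical helper).

    The proximity phrase is quoted and suffixed with ~slop; each excluded
    term is prefixed with a minus sign OUTSIDE the quotes, since a minus
    inside the quoted string would not act as negation.
    """
    phrase = '"' + " ".join(phrase_terms) + '"~' + str(slop)
    negations = " ".join("-" + term for term in excluded_terms)
    q = (phrase + " " + negations).strip()
    return {"q": q, "defType": "edismax"}


params = build_params(["wonderful", "pizza"], ["peperoni"])
query_string = urlencode(params)  # ready to append to a select URL
```

With these inputs the q parameter is "wonderful pizza"~5 -peperoni, in contrast to the tested form "wonderful pizza - peperoni"~5, where the minus sign sits inside the quotes.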