Posted to solr-user@lucene.apache.org by Antonio Zippo <re...@yahoo.it> on 2008/11/24 12:12:41 UTC

AND query on multivalue text

hi all,

Is it possible to run an AND query on a multivalued text field?

I need to retrieve a record only if the words are contained in the same value.

for example
1st record:

<arr name="myText">
<str>The U.S. government has announced a massive rescue package for Citigroup, saying it would guarantee more than $300 billion in company assets</str>
<str>while injecting an additional $20 billion in capital into the embattled bank. </str>
</arr>

2nd record
<arr name="myText">
<str>bla bla bla guarantee bla bla bla</str>
<str>while injecting an additional $20 billion in capital into the embattled bank. </str>
</arr>


I need to search for something like
myText:billion AND guarantee

I need only the record where the words exist in the same value to be returned (in this case only the first record), because in the 2nd record the two words are in different values.

Is it possible?

thanks


      

Re: AND query on multivalue text

Posted by Antonio Zippo <re...@yahoo.it>.


>On Nov 24, 2008, at 8:52 AM, Erik Hatcher wrote:

> 
> On Nov 24, 2008, at 8:37 AM, David Santamauro wrote:
>>>> I need to search for something like
>>>> myText:billion AND guarantee
>>>> 
>>>> I need only the record where the words exist in the same value to be returned (in this case only the first record), because in the 2nd record the two words are in different values.
>>>> 
>>>> Is it possible?
>>> 
>>> It's not possible with a purely boolean query like this, but it is possible with a sloppy phrase query where the position increment gap (see example schema.xml) is greater than the slop factor.
>>> 
>>>     Erik
>>> 
>> 
>> 
>> I think what is needed here is the concept of SAME, i.e., myText:billion SAME guarantee. I know a few full-text engines that can handle this operator one way or another. And without it, I don't quite understand the usefulness of multiValue fields.
> 
> Yeah, multi-valued fields are a bit awkward to grasp fully in Lucene, especially in this context where it's a full-text field.  Basically, as far as indexing goes, there's no such thing as a "multi-valued" field.  An indexed field gets split into terms, and terms have positional information attached to them (thus a position increment gap can be used to put a big virtual gap between the last term of one field instance and the first term of the next one).  A multi-valued field gets stored (if it is set to be stored, that is) as separate strings, and is retrievable as the separate values.
> 
> Multi-valued fields are handy for facets where, say, a product can have multiple categories associated with it.  In this case it's a bit clearer.  It's the full-text multi-valued fields that seem a bit strange.
> 
>     Erik
> 

> 
> OK, it seems it is the multi-dimensional aspect that is missing
> 
> field[0]: A B C D
> field[1]:   B   D
> 
> ...and the concept of a field array would need to be introduced (probably at the Lucene level).
> 
> Do you know if there has been any serious thought given to this, i.e., the possibility of introducing a new SAME operator, or is this a corner case that isn't worth it?
> 
> thanks
> David
> 

thanks for all the replies

maybe this could be an interesting request for the developers

bye


      

Re: AND query on multivalue text

Posted by David Santamauro <da...@gmail.com>.
On Nov 24, 2008, at 8:52 AM, Erik Hatcher wrote:

>
> On Nov 24, 2008, at 8:37 AM, David Santamauro wrote:
>>>> I need to search for something like
>>>> myText:billion AND guarantee
>>>>
>>>> I need only the record where the words exist in the same value to
>>>> be returned (in this case only the first record), because in the
>>>> 2nd record the two words are in different values.
>>>>
>>>> Is it possible?
>>>
>>> It's not possible with a purely boolean query like this, but it is  
>>> possible with a sloppy phrase query where the position increment  
>>> gap (see example schema.xml) is greater than the slop factor.
>>>
>>> 	Erik
>>>
>>
>>
>> I think what is needed here is the concept of SAME, i.e.,
>> myText:billion SAME guarantee. I know a few full-text engines that
>> can handle this operator one way or another. And without it, I
>> don't quite understand the usefulness of multiValue fields.
>
> Yeah, multi-valued fields are a bit awkward to grasp fully in
> Lucene, especially in this context where it's a full-text field.
> Basically, as far as indexing goes, there's no such thing as a
> "multi-valued" field.  An indexed field gets split into terms, and
> terms have positional information attached to them (thus a position
> increment gap can be used to put a big virtual gap between the last
> term of one field instance and the first term of the next one).  A
> multi-valued field gets stored (if it is set to be stored, that is)
> as separate strings, and is retrievable as the separate values.
>
> Multi-valued fields are handy for facets where, say, a product can  
> have multiple categories associated with it.  In this case it's a  
> bit clearer.  It's the full-text multi-valued fields that seem a bit  
> strange.
>
> 	Erik
>


OK, it seems it is the multi-dimensional aspect that is missing

field[0]: A B C D
field[1]:   B   D

...and the concept of a field array would need to be introduced
(probably at the Lucene level).

Do you know if there has been any serious thought given to this, i.e.,
the possibility of introducing a new SAME operator, or is this a
corner case that isn't worth it?

thanks
David






Re: AND query on multivalue text

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 24, 2008, at 8:37 AM, David Santamauro wrote:
>>> I need to search for something like
>>> myText:billion AND guarantee
>>>
>>> I need only the record where the words exist in the same value to
>>> be returned (in this case only the first record), because in the
>>> 2nd record the two words are in different values.
>>>
>>> Is it possible?
>>
>> It's not possible with a purely boolean query like this, but it is  
>> possible with a sloppy phrase query where the position increment  
>> gap (see example schema.xml) is greater than the slop factor.
>>
>> 	Erik
>>
>
>
> I think what is needed here is the concept of SAME, i.e.,
> myText:billion SAME guarantee. I know a few full-text engines that
> can handle this operator one way or another. And without it, I don't
> quite understand the usefulness of multiValue fields.

Yeah, multi-valued fields are a bit awkward to grasp fully in Lucene,
especially in this context where it's a full-text field.  Basically,
as far as indexing goes, there's no such thing as a "multi-valued"
field.  An indexed field gets split into terms, and terms have
positional information attached to them (thus a position increment gap
can be used to put a big virtual gap between the last term of one
field instance and the first term of the next one).  A multi-valued
field gets stored (if it is set to be stored, that is) as separate
strings, and is retrievable as the separate values.

Multi-valued fields are handy for facets where, say, a product can  
have multiple categories associated with it.  In this case it's a bit  
clearer.  It's the full-text multi-valued fields that seem a bit  
strange.

	Erik
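
The positional model described above can be sketched in a few lines of Python. This is only an illustrative model, not Lucene's API: the function name index_positions, the whitespace tokenizer, and the gap of 100 are all assumptions made for the example. It shows that the values of a multi-valued field end up in one stream of term positions, with the gap added between consecutive values.

```python
# Illustrative model only; this is not how Lucene is called, just how
# term positions behave for a multi-valued field.
def index_positions(values, position_increment_gap=100):
    """Map each term to the (position, value_index) pairs where it occurs."""
    postings = {}
    position = 0
    for value_index, value in enumerate(values):
        if value_index > 0:
            # Virtual gap between the last term of one value and the
            # first term of the next one.
            position += position_increment_gap
        for term in value.lower().split():
            postings.setdefault(term, []).append((position, value_index))
            position += 1
    return postings


# The two-value field from David's example: "A B C D" and "B D".
postings = index_positions(["A B C D", "B D"], position_increment_gap=100)
for term in sorted(postings):
    print(term, postings[term])
```

In this model the index itself keeps no boundary between the two values; only the jump in positions hints at where one value ended and the next began.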


Re: AND query on multivalue text

Posted by David Santamauro <da...@gmail.com>.
Hello all, I'm new to the list but want to say great work! ... see  
comment below

On Nov 24, 2008, at 7:59 AM, Erik Hatcher wrote:

>
> On Nov 24, 2008, at 6:12 AM, Antonio Zippo wrote:
>> Is it possible to run an AND query on a multivalued text field?
>>
>> I need to search for something like
>> myText:billion AND guarantee
>>
>> I need only the record where the words exist in the same value to
>> be returned (in this case only the first record), because in the
>> 2nd record the two words are in different values.
>>
>> Is it possible?
>
> It's not possible with a purely boolean query like this, but it is  
> possible with a sloppy phrase query where the position increment gap  
> (see example schema.xml) is greater than the slop factor.
>
> 	Erik
>


I think what is needed here is the concept of SAME, i.e.,
myText:billion SAME guarantee. I know a few full-text engines that can
handle this operator one way or another. And without it, I don't quite
understand the usefulness of multiValue fields.

David






Re: AND query on multivalue text

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 24, 2008, at 6:12 AM, Antonio Zippo wrote:
> Is it possible to run an AND query on a multivalued text field?
>
> I need to retrieve a record only if the words are contained in the
> same value.
>
> for example
> 1st record:
>
> <arr name="myText">
> <str>The U.S. government has announced a massive rescue package for  
> Citigroup, saying it would guarantee more than $300 billion in  
> company assets</str>
> <str>while injecting an additional $20 billion in capital into the  
> embattled bank. </str>
> </arr>
>
> 2nd record
> <arr name="myText">
> <str>bla bla bla guarantee bla bla bla</str>
> <str>while injecting an additional $20 billion in capital into the  
> embattled bank. </str>
> </arr>
>
>
> I need to search for something like
> myText:billion AND guarantee
>
> I need only the record where the words exist in the same value to be
> returned (in this case only the first record), because in the 2nd
> record the two words are in different values.
>
> Is it possible?

It's not possible with a purely boolean query like this, but it is  
possible with a sloppy phrase query where the position increment gap  
(see example schema.xml) is greater than the slop factor.

	Erik
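
To make the suggestion above concrete, here is a rough Python simulation of why a gap larger than the slop keeps a proximity match inside a single value. It is a simplified sketch, not Lucene code: the helper names, the regex tokenizer, the slop of 50 and the gap of 100 are all assumptions, and slop is modeled as a plain maximum distance between positions (Lucene's real slop also accounts for term order). In Solr terms this would presumably correspond to a query such as myText:"billion guarantee"~50 on a field type declared with positionIncrementGap="100".

```python
import re


def term_positions(values, gap):
    """Assign one running position per term, adding `gap` between values."""
    positions = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for term in re.findall(r"\w+", value.lower()):
            positions.setdefault(term, []).append(pos)
            pos += 1
    return positions


def within_slop(values, term_a, term_b, slop, gap):
    """True if some occurrence of each term lies within `slop` positions."""
    positions = term_positions(values, gap)
    return any(
        abs(a - b) <= slop
        for a in positions.get(term_a, [])
        for b in positions.get(term_b, [])
    )


record1 = [
    "The U.S. government has announced a massive rescue package for "
    "Citigroup, saying it would guarantee more than $300 billion in "
    "company assets",
    "while injecting an additional $20 billion in capital into the "
    "embattled bank. ",
]
record2 = [
    "bla bla bla guarantee bla bla bla",
    "while injecting an additional $20 billion in capital into the "
    "embattled bank. ",
]

# With gap (100) > slop (50), only terms inside the same value can match.
match1 = within_slop(record1, "billion", "guarantee", slop=50, gap=100)
match2 = within_slop(record2, "billion", "guarantee", slop=50, gap=100)
# Without a gap, the second record would match across the value boundary.
match2_no_gap = within_slop(record2, "billion", "guarantee", slop=50, gap=0)
print(match1, match2, match2_no_gap)
```

One trade-off of this approach: the slop also limits how far apart the two words may be within one value, so for very long values the slop (and with it the gap) would need to be chosen larger.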