Posted to solr-user@lucene.apache.org by Antonio Zippo <re...@yahoo.it> on 2008/11/24 12:12:41 UTC
AND query on multivalue text
hi all,
is it possible to run an AND query on a multi-valued text field?
i need to retrieve a record only if all the words are contained in the same value
for example
1st record:
<arr name="myText">
<str>The U.S. government has announced a massive rescue package for Citigroup, saying it would guarantee more than $300 billion in company assets</str>
<str>while injecting an additional $20 billion in capital into the embattled bank. </str>
</arr>
2nd record
<arr name="myText">
<str>bla bla bla guarantee bla bla bla</str>
<str>while injecting an additional $20 billion in capital into the embattled bank. </str>
</arr>
i need to search for something like
myText:billion AND guarantee
only records where both words occur in the same value should be returned (in this case only the first record), because in the 2nd record the two words are in different values
is it possible?
thanks
Re: AND query on multivalue text
Posted by Antonio Zippo <re...@yahoo.it>.
>On Nov 24, 2008, at 8:52 AM, Erik Hatcher wrote:
>
> On Nov 24, 2008, at 8:37 AM, David Santamauro wrote:
>>>> i need to search something as
>>>> myText:billion AND guarantee
>>>>
>>>> i need to be extracted only the record where the words exists in the same value (in this case only the first record) because in the 2nd record the two words are in different values
>>>>
>>>> is it possible?
>>>
>>> It's not possible with a purely boolean query like this, but it is possible with a sloppy phrase query where the position increment gap (see example schema.xml) is greater than the slop factor.
>>>
>>> Erik
>>>
>>
>>
>> I think what is needed here is the concept of SAME, i.e., myText:billion SAME guarantee. I know a few full-text engines that can handle this operator one way or another. And without it, I don't quite understand the usefulness of multiValue fields.
>
> Yeah, multi-valued fields are a bit awkward to grasp fully in Lucene. Especially in this context where it's a full-text field. Basically as far as indexing goes, there's no such thing as a "multi-valued" field. An indexed field gets split into terms, and terms have positional information attached to them (thus a position increment gap can be used to put a big virtual gap between the last term of one field instance and the first term of the next one). A multi-valued field gets stored (if it is set to be stored, that is) as separate strings, and is retrievable as the separate values.
>
> Multi-valued fields are handy for facets where, say, a product can have multiple categories associated with it. In this case it's a bit clearer. It's the full-text multi-valued fields that seem a bit strange.
>
> Erik
>
>
> OK, it seems it is the multi-dimensional aspect that is missing
>
> field[0]: A B C D
> field[1]: B D
>
> ...and the concept of field array would need to be introduced (probably at the lucene level).
>
> Do you know if there has been any serious thought given to this, i.e., the possibility of introducing a new SAME operator, or is this a corner case not worth pursuing?
>
> thanks
> David
>
thanks for all the replies
maybe this could be an interesting request for the developers
bye
Re: AND query on multivalue text
Posted by David Santamauro <da...@gmail.com>.
On Nov 24, 2008, at 8:52 AM, Erik Hatcher wrote:
>
> On Nov 24, 2008, at 8:37 AM, David Santamauro wrote:
>>>> i need to search something as
>>>> myText:billion AND guarantee
>>>>
>>>> i need to be extracted only the record where the words exists in
>>>> the same value (in this case only the first record) because in
>>>> the 2nd record the two words are in different values
>>>>
>>>> is it possible?
>>>
>>> It's not possible with a purely boolean query like this, but it is
>>> possible with a sloppy phrase query where the position increment
>>> gap (see example schema.xml) is greater than the slop factor.
>>>
>>> Erik
>>>
>>
>>
>> I think what is needed here is the concept of SAME, i.e.,
>> myText:billion SAME guarantee. I know a few full-text engines that
>> can handle this operator one way or another. And without it, I
>> don't quite understand the usefulness of multiValue fields.
>
> Yeah, multi-valued fields are a bit awkward to grasp fully in
> Lucene. Especially in this context where it's a full-text field.
> Basically as far as indexing goes, there's no such thing as a "multi-
> valued" field. An indexed field gets split into terms, and terms
> have positional information attached to them (thus a position
> increment gap can be used to put a big virtual gap between the last
> term of one field instance and the first term of the next one). A
> multi-valued field gets stored (if it is set to be stored, that is)
> as separate strings, and is retrievable as the separate values.
>
> Multi-valued fields are handy for facets where, say, a product can
> have multiple categories associated with it. In this case it's a
> bit clearer. It's the full-text multi-valued fields that seem a bit
> strange.
>
> Erik
>
OK, it seems it is the multi-dimensional aspect that is missing
field[0]: A B C D
field[1]: B D
...and the concept of field array would need to be introduced
(probably at the lucene level).
Do you know if there has been any serious thought given to this, i.e.,
the possibility of introducing a new SAME operator, or is this a corner
case not worth pursuing?
thanks
David
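To make the difference between AND and the proposed SAME concrete, here is a minimal Python sketch. It is not Lucene or Solr code: the helpers tokens, and_match and same_match are hypothetical names invented for this illustration, and the tokenizer is a crude stand-in for a real analyzer. It only shows the semantics being asked for, i.e. that SAME requires all terms to occur inside one single value, while AND lets each term come from any value of the field.

```python
import re

def tokens(value):
    """Lowercase word tokens of one field value (crude stand-in for an analyzer)."""
    return set(re.findall(r"\w+", value.lower()))

def and_match(values, *terms):
    """Plain AND semantics: each term may come from any value of the field."""
    wanted = {t.lower() for t in terms}
    all_tokens = set().union(*(tokens(v) for v in values))
    return wanted <= all_tokens

def same_match(values, *terms):
    """Hypothetical SAME semantics: all terms must occur in one single value."""
    wanted = {t.lower() for t in terms}
    return any(wanted <= tokens(v) for v in values)

field = ["A B C D", "B D"]      # the two-value example from the message above
split_terms = ["A C", "B D"]    # A and B never share a value

print(same_match(field, "B", "D"))        # True
print(and_match(split_terms, "A", "B"))   # True
print(same_match(split_terms, "A", "B"))  # False
```

The last two lines show the case the thread is about: the document satisfies AND but would not satisfy SAME.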
Re: AND query on multivalue text
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 24, 2008, at 8:37 AM, David Santamauro wrote:
>>> i need to search something as
>>> myText:billion AND guarantee
>>>
>>> i need to be extracted only the record where the words exists in
>>> the same value (in this case only the first record) because in the
>>> 2nd record the two words are in different values
>>>
>>> is it possible?
>>
>> It's not possible with a purely boolean query like this, but it is
>> possible with a sloppy phrase query where the position increment
>> gap (see example schema.xml) is greater than the slop factor.
>>
>> Erik
>>
>
>
> I think what is needed here is the concept of SAME, i.e.,
> myText:billion SAME guarantee. I know a few full-text engines that
> can handle this operator one way or another. And without it, I don't
> quite understand the usefulness of multiValue fields.
Yeah, multi-valued fields are a bit awkward to grasp fully in Lucene.
Especially in this context where it's a full-text field. Basically as
far as indexing goes, there's no such thing as a "multi-valued"
field. An indexed field gets split into terms, and terms have
positional information attached to them (thus a position increment gap
can be used to put a big virtual gap between the last term of one
field instance and the first term of the next one). A multi-valued
field gets stored (if it is set to be stored, that is) as separate
strings, and is retrievable as the separate values.
Multi-valued fields are handy for facets where, say, a product can
have multiple categories associated with it. In this case it's a bit
clearer. It's the full-text multi-valued fields that seem a bit
strange.
Erik
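As a rough picture of the point above (stored values stay separate, indexed terms end up in one stream with positions), here is a small Python sketch. It is a simplified model, not Lucene's indexing code: term_stream is a hypothetical helper, tokens are split on whitespace only, and the gap of 100 is just an illustrative value.

```python
def term_stream(values, gap=0):
    """Flatten the values of one field into (position, term) pairs.

    Positions keep counting across values; `gap` extra positions are
    skipped between consecutive values, like a position increment gap.
    """
    stream = []
    pos = 0
    for value in values:
        for term in value.split():
            stream.append((pos, term))
            pos += 1
        pos += gap  # virtual gap before the first term of the next value
    return stream

stored = ["A B C D", "B D"]   # what a stored multi-valued field returns

print(term_stream(stored))
# [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'B'), (5, 'D')]
print(term_stream(stored, gap=100))
# [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (104, 'B'), (105, 'D')]
```

Without a gap, the last term of the first value and the first term of the second value are adjacent, so nothing in the positions marks where one value ends. With a gap, the boundary shows up as a jump in positions.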
Re: AND query on multivalue text
Posted by David Santamauro <da...@gmail.com>.
Hello all, I'm new to the list but want to say great work! ... see
comment below
On Nov 24, 2008, at 7:59 AM, Erik Hatcher wrote:
>
> On Nov 24, 2008, at 6:12 AM, Antonio Zippo wrote:
>> is it possible to have an AND query on a multivalue text?
>>
>> i need to search something as
>> myText:billion AND guarantee
>>
>> i need to be extracted only the record where the words exists in
>> the same value (in this case only the first record) because in the
>> 2nd record the two words are in different values
>>
>> is it possible?
>
> It's not possible with a purely boolean query like this, but it is
> possible with a sloppy phrase query where the position increment gap
> (see example schema.xml) is greater than the slop factor.
>
> Erik
>
I think what is needed here is the concept of SAME, i.e.,
myText:billion SAME guarantee. I know a few full-text engines that can
handle this operator one way or another. And without it, I don't quite
understand the usefulness of multiValue fields.
David
Re: AND query on multivalue text
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 24, 2008, at 6:12 AM, Antonio Zippo wrote:
> is it possible to have an AND query on a multivalue text?
>
> i need to extract the record only if the words are contained inside
> the same value
>
> for example
> 1st record:
>
> <arr name="myText">
> <str>The U.S. government has announced a massive rescue package for
> Citigroup, saying it would guarantee more than $300 billion in
> company assets</str>
> <str>while injecting an additional $20 billion in capital into the
> embattled bank. </str>
> </arr>
>
> 2nd record
> <arr name="myText">
> <str>bla bla bla guarantee bla bla bla</str>
> <str>while injecting an additional $20 billion in capital into the
> embattled bank. </str>
> </arr>
>
>
> i need to search something as
> myText:billion AND guarantee
>
> i need to be extracted only the record where the words exists in the
> same value (in this case only the first record) because in the 2nd
> record the two words are in different values
>
> is it possible?
It's not possible with a purely boolean query like this, but it is
possible with a sloppy phrase query where the position increment gap
(see example schema.xml) is greater than the slop factor.
Erik
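For readers who want to see why the gap has to exceed the slop, here is a minimal Python sketch using the two records from the question. In Solr the gap is set with the positionIncrementGap attribute on the field type in schema.xml, and the sloppy phrase query would look something like myText:"billion guarantee"~50. The numbers 100 and 50 are illustrative assumptions, as are the helper names index_positions and within_slop. The model is deliberately simplified: it treats slop as a maximum position distance, whereas Lucene's real sloppy phrase matching counts position moves and charges extra for terms in reversed order.

```python
import re

def index_positions(values, gap):
    """Map each term to its positions across all values of one field.

    `gap` extra positions are skipped between consecutive values,
    mimicking a position increment gap.
    """
    positions = {}
    pos = 0
    for value in values:
        for token in re.findall(r"\w+", value.lower()):
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap  # virtual gap before the next value
    return positions

def within_slop(values, term_a, term_b, slop, gap):
    """True if term_a and term_b occur at most `slop` positions apart."""
    positions = index_positions(values, gap)
    return any(
        abs(a - b) <= slop
        for a in positions.get(term_a, [])
        for b in positions.get(term_b, [])
    )

record1 = [
    "The U.S. government has announced a massive rescue package for "
    "Citigroup, saying it would guarantee more than $300 billion in "
    "company assets",
    "while injecting an additional $20 billion in capital into the "
    "embattled bank.",
]
record2 = [
    "bla bla bla guarantee bla bla bla",
    "while injecting an additional $20 billion in capital into the "
    "embattled bank.",
]

# Gap (100) larger than slop (50): only the first record matches.
print(within_slop(record1, "billion", "guarantee", slop=50, gap=100))  # True
print(within_slop(record2, "billion", "guarantee", slop=50, gap=100))  # False
# No gap: the second record matches across the value boundary.
print(within_slop(record2, "billion", "guarantee", slop=50, gap=0))    # True
```

The slop also has to be large enough to cover the longest value you want to match within, so in practice the gap is chosen comfortably above the slop you intend to use.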