Posted to solr-user@lucene.apache.org by Brian Lamb <br...@journalexperts.com> on 2011/03/29 22:57:22 UTC

Matching on a multi valued field

Hi all,

I have a field set up like this:

<field name="common_names" multiValued="true" type="text" indexed="true"
stored="true" required="false" />

And I have some records:

RECORD1
<arr name="common_names">
  <str>man's best friend</str>
  <str>pooch</str>
</arr>

RECORD2
<arr name="common_names">
  <str>man's worst enemy</str>
  <str>friend to no one</str>
</arr>

Now if I do a search such as:
http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND df=common_names}man's
friend

Both records are returned. However, I only want RECORD1 returned. I
understand why RECORD2 is returned but how can I structure my query so that
only RECORD1 is returned?

Thanks,

Brian Lamb

Re: Matching on a multi valued field

Posted by Erick Erickson <er...@gmail.com>.
Two things need to be done. First, define positionIncrementGap
(see http://wiki.apache.org/solr/SchemaXml) for the field.

Then use phrase searches with the slop less than what you've
defined for positionIncrementGap.

Of course you'll have to have a positionIncrementGap larger than the
number of tokens in any single entry in your multiValued field, and you'll
have to re-index.
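
As a rough sanity check of the sizing rule above, the sketch below counts tokens per value and confirms that a chosen gap and slop satisfy "longest value <= slop < gap". It assumes whitespace tokenization, which a real "text" analyzer will not match exactly, and check_gap_and_slop is a hypothetical helper, not a Solr API.

```python
def longest_value_tokens(values):
    """Return the largest token count among the values of one multi-valued field.

    Assumption: tokens are whitespace-separated; a real Solr analyzer may differ.
    """
    return max((len(v.split()) for v in values), default=0)


def check_gap_and_slop(records, gap, slop):
    """True if slop can span any single value but cannot bridge the gap."""
    longest = max((longest_value_tokens(r) for r in records), default=0)
    return longest <= slop < gap


# The two records from the original question.
records = [
    ["man's best friend", "pooch"],
    ["man's worst enemy", "friend to no one"],
]
ok = check_gap_and_slop(records, gap=100, slop=50)
too_small = check_gap_and_slop(records, gap=3, slop=2)
```

Here the longest value has four tokens, so a gap of 100 with a slop of 50 passes, while a gap of 3 does not.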

Best
Erick


Re: Matching on a multi valued field

Posted by Michael Sokolov <so...@ifactory.com>.
Could you try creating fields dynamically: common_names_1, 
common_names_2, etc.

Keep track of the max number of fields and generate queries listing all 
the fields?

Gross, but it handles all the cases mentioned in the thread (wildcards, 
phrases, etc).
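
A minimal sketch of the query generation described above, assuming each value was indexed into its own numbered field (common_names_1, common_names_2, ...) and that the application tracks the maximum field count. per_value_field_query is a hypothetical helper, and the terms are not escaped for the query parser.

```python
def per_value_field_query(base, max_fields, terms):
    """Build an OR over numbered fields, requiring all terms within each field.

    Assumption: terms contain no query-parser special characters (no escaping is done).
    """
    clause = " AND ".join(terms)
    parts = [f"{base}_{i}:({clause})" for i in range(1, max_fields + 1)]
    return " OR ".join(parts)


query = per_value_field_query("common_names", 2, ["man's", "friend"])
```

Because every clause is scoped to one field, all terms have to match within the same original value.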

-Mike



Re: Matching on a multi valued field

Posted by Juan Pablo Mora <ju...@informa.es>.
I have not found any solution to this. The only option is to denormalize your multivalued field into several docs, each with a single-valued field.
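
A minimal sketch of that denormalization, done in application code before indexing. The "id" and "parent_id" field names are hypothetical; use whatever your schema defines.

```python
def denormalize(record_id, field, values):
    """Split one multi-valued record into one single-valued doc per value.

    The `id` / `parent_id` field names are hypothetical, chosen for illustration.
    """
    return [
        {"id": f"{record_id}_{n}", "parent_id": record_id, field: value}
        for n, value in enumerate(values, start=1)
    ]


docs = denormalize("RECORD1", "common_names", ["man's best friend", "pooch"])
```

With one value per doc, an AND query can only match within a single value. The cost is that hits then have to be mapped back to the parent record.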

Try ComplexPhraseQueryParser (https://issues.apache.org/jira/browse/SOLR-1604) if you are using solr 1.4 version.


On 04/04/2011, at 21:21, Brian Lamb wrote:

I just noticed Juan's response and I find that I am encountering that very issue in a few cases. Boosting is a good way to put the more relevant results to the top but is it possible to only have the correct results returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb <br...@journalexperts.com> wrote:
Thank you all for your responses. The field had already been set up with positionIncrementGap=100 so I just needed to add in the slop.


On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora <ju...@informa.es> wrote:
>> A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.


That is true but you cannot do things like:

q="bar* foo*"~10 with default query search.

and if you use dismax you will have the same problems with multivalued fields. Imagine the situation:

Doc1:
       field A: ["foo bar","dooh"] 2 values

Doc2:
       field A: ["bar dooh", "whatever"] Another 2 values

the query:
       qt=dismax & qf= fieldA & q = ( bar dooh )

will return both Doc1 and Doc2. The only thing you can do in this situation is boost phrase query in Doc2 with parameter pf in order to get Doc2 in the first position of the results:

pf = fieldA^10000


Thanks,
JP.


On 29/03/2011, at 23:14, Markus Jelsma wrote:

> orly, all replies came in while sending =)
>
>> Hi,
>>
>> Your filter query is looking for a match of "man's friend" in a single
>> field. Regardless of analysis of the common_names field, all terms are
>> present in the common_names field of both documents. A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.
>>
>> That should work
>>
>> Cheers,

Re: Matching on a multi valued field

Posted by Jonathan Rochkind <ro...@jhu.edu>.
On 4/4/2011 3:21 PM, Brian Lamb wrote:
> I just noticed Juan's response and I find that I am encountering that very
> issue in a few cases. Boosting is a good way to put the more relevant
> results to the top but it is possible to only have the correct results
> returned?

Only what's already been said in the thread.  You can simulate a 
non-phrase non-wildcard search, forced to match all within the same 
value of a multi-valued, by using phrase queries with slop.  And it will 
only return hits that have all terms within the same value -- it's not a 
boosting solution.

But if you need wildcards, or you need to find an actual phrase in the 
same value as additional term(s) or phrase(s), no, you are out of luck 
in Solr.

That is exactly what Juan already said.

If someone can think of a clever way to write some Java to do this in a 
new query component, that would be useful.  I am not entirely sure how 
possible that is.  I guess you'd have to make sure that ALL matching 
tokens or phrases are within the positionIncrementGap of each other, not 
sure how feasible that is, I'm not too familiar with Solr/Lucene 
source.   But at any rate, there's no way to do it out of the box with 
Solr, no.


Re: Matching on a multi valued field

Posted by Brian Lamb <br...@journalexperts.com>.
I just noticed Juan's response and I find that I am encountering that very
issue in a few cases. Boosting is a good way to put the more relevant
results to the top but is it possible to only have the correct results
returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb
<br...@journalexperts.com> wrote:

> Thank you all for your responses. The field had already been set up with
> positionIncrementGap=100 so I just needed to add in the slop.
>
>
> On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora <ju...@informa.es>wrote:
>
>> >> A multiValued field
>> >> is actually a single field with all data separated with
>> positionIncrement.
>> >> Try setting that value high enough and use a PhraseQuery.
>>
>>
>> That is true but you cannot do things like:
>>
>> q="bar* foo*"~10 with default query search.
>>
>> and if you use dismax you will have the same problems with multivalued
>> fields. Imagine the situation:
>>
>> Doc1:
>>        field A: ["foo bar","dooh"] 2 values
>>
>> Doc2:
>>        field A: ["bar dooh", "whatever"] Another 2 values
>>
>> the query:
>>        qt=dismax & qf= fieldA & q = ( bar dooh )
>>
>> will return both Doc1 and Doc2. The only thing you can do in this
>> situation is boost phrase query in Doc2 with parameter pf in order to get
>> Doc2 in the first position of the results:
>>
>> pf = fieldA^10000
>>
>>
>> Thanks,
>> JP.
>>
>>
>> On 29/03/2011, at 23:14, Markus Jelsma wrote:
>>
>> > orly, all replies came in while sending =)
>> >
>> >> Hi,
>> >>
>> >> Your filter query is looking for a match of "man's friend" in a single
>> >> field. Regardless of analysis of the common_names field, all terms are
>> >> present in the common_names field of both documents. A multiValued
>> field
>> >> is actually a single field with all data separated with
>> positionIncrement.
>> >> Try setting that value high enough and use a PhraseQuery.
>> >>
>> >> That should work
>> >>
>> >> Cheers,
>>
>>
>

Re: Matching on a multi valued field

Posted by Brian Lamb <br...@journalexperts.com>.
Thank you all for your responses. The field had already been set up with
positionIncrementGap=100 so I just needed to add in the slop.

On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora <ju...@informa.es> wrote:

> >> A multiValued field
> >> is actually a single field with all data separated with
> positionIncrement.
> >> Try setting that value high enough and use a PhraseQuery.
>
>
> That is true but you cannot do things like:
>
> q="bar* foo*"~10 with default query search.
>
> and if you use dismax you will have the same problems with multivalued
> fields. Imagine the situation:
>
> Doc1:
>        field A: ["foo bar","dooh"] 2 values
>
> Doc2:
>        field A: ["bar dooh", "whatever"] Another 2 values
>
> the query:
>        qt=dismax & qf= fieldA & q = ( bar dooh )
>
> will return both Doc1 and Doc2. The only thing you can do in this situation
> is boost phrase query in Doc2 with parameter pf in order to get Doc2 in the
> first position of the results:
>
> pf = fieldA^10000
>
>
> Thanks,
> JP.
>
>
> On 29/03/2011, at 23:14, Markus Jelsma wrote:
>
> > orly, all replies came in while sending =)
> >
> >> Hi,
> >>
> >> Your filter query is looking for a match of "man's friend" in a single
> >> field. Regardless of analysis of the common_names field, all terms are
> >> present in the common_names field of both documents. A multiValued field
> >> is actually a single field with all data separated with
> positionIncrement.
> >> Try setting that value high enough and use a PhraseQuery.
> >>
> >> That should work
> >>
> >> Cheers,
>
>

Re: Matching on a multi valued field

Posted by Juan Pablo Mora <ju...@informa.es>.
>> A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.


That is true but you cannot do things like:

q="bar* foo*"~10 with default query search.

and if you use dismax you will have the same problems with multivalued fields. Imagine the situation:

Doc1:
	field A: ["foo bar","dooh"] 2 values
	
Doc2:
	field A: ["bar dooh", "whatever"] Another 2 values

the query:
	qt=dismax & qf= fieldA & q = ( bar dooh )

will return both Doc1 and Doc2. The only thing you can do in this situation is boost phrase query in Doc2 with parameter pf in order to get Doc2 in the first position of the results:

pf = fieldA^10000
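
To illustrate why both documents match, here is a toy model in Python. It is not how Solr evaluates dismax: it assumes whitespace tokenization and treats the field as one flat token list. The term-level check finds "bar" and "dooh" in both docs, while only Doc2 holds them as a phrase inside a single value, which is the case pf boosts.

```python
def tokens_of(values):
    """Flatten all values into one token list, roughly as a multiValued field is indexed.

    Assumption: whitespace tokenization, no further analysis.
    """
    return [tok for v in values for tok in v.split()]


def matches_all_terms(values, terms):
    """Term-level AND: every term occurs somewhere in the field, in any value."""
    present = set(tokens_of(values))
    return all(t in present for t in terms)


def matches_phrase_in_one_value(values, terms):
    """Exact phrase: the terms occur adjacent and in order inside a single value."""
    n = len(terms)
    for v in values:
        toks = v.split()
        if any(toks[i:i + n] == terms for i in range(len(toks) - n + 1)):
            return True
    return False


doc1 = ["foo bar", "dooh"]
doc2 = ["bar dooh", "whatever"]
terms = ["bar", "dooh"]
both_match_terms = matches_all_terms(doc1, terms) and matches_all_terms(doc2, terms)
```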
	

Thanks,
JP.


On 29/03/2011, at 23:14, Markus Jelsma wrote:

> orly, all replies came in while sending =)
> 
>> Hi,
>> 
>> Your filter query is looking for a match of "man's friend" in a single
>> field. Regardless of analysis of the common_names field, all terms are
>> present in the common_names field of both documents. A multiValued field
>> is actually a single field with all data separated with positionIncrement.
>> Try setting that value high enough and use a PhraseQuery.
>> 
>> That should work
>> 
>> Cheers,


Re: Matching on a multi valued field

Posted by Markus Jelsma <ma...@openindex.io>.
orly, all replies came in while sending =)

> Hi,
> 
> Your filter query is looking for a match of "man's friend" in a single
> field. Regardless of analysis of the common_names field, all terms are
> present in the common_names field of both documents. A multiValued field
> is actually a single field with all data separated with positionIncrement.
> Try setting that value high enough and use a PhraseQuery.
> 
> That should work
> 
> Cheers,

Re: Matching on a multi valued field

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

Your filter query is looking for a match of "man's friend" in a single field. 
Regardless of analysis of the common_names field, all terms are present in the 
common_names field of both documents. A multiValued field is actually a single 
field with all data separated with positionIncrement. Try setting that value 
high enough and use a PhraseQuery. 
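
The effect of the gap can be sketched with a toy position model in Python. This is a simplification, not Lucene's actual slop computation: it assumes whitespace tokenization, starts each new value "gap" positions after the previous one, and treats two terms as matching when their positions differ by at most the slop.

```python
def positions_with_gap(values, gap):
    """Assign a position to every token, adding `gap` extra positions between values.

    Assumptions: whitespace tokenization and a simplified gap rule
    (the next value starts at last position + 1 + gap).
    """
    positions = {}
    pos = -1
    for index, value in enumerate(values):
        if index > 0:
            pos += gap
        for tok in value.split():
            pos += 1
            positions.setdefault(tok, []).append(pos)
    return positions


def within_slop(values, term_a, term_b, gap, slop):
    """Simplified proximity test: some occurrence of each term lies within `slop` positions."""
    positions = positions_with_gap(values, gap)
    return any(
        abs(a - b) <= slop
        for a in positions.get(term_a, [])
        for b in positions.get(term_b, [])
    )


record1 = ["man's best friend", "pooch"]
record2 = ["man's worst enemy", "friend to no one"]
```

With gap=100 and slop=10, "man's" and "friend" are close enough only in RECORD1. With gap=0 the two values of RECORD2 run together and it matches as well.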

That should work

Cheers,


Re: Matching on a multi valued field

Posted by Renaud Delbru <re...@deri.org>.
Hi,

you could try the SIREn plugin [1] which supports multi-valued fields.

[1] http://siren.sindice.com
-- 
Renaud Delbru



Re: Matching on a multi valued field

Posted by Jonathan Rochkind <ro...@jhu.edu>.
As far as I know, there's no support in Solr for "all words must match 
in the same value of a multi-valued field".

I agree it would be useful in some cases.

As long as you don't need to do an _actual_ phrase search, you can kind 
of fake it by using a phrase query, with the query slop set so high that 
it will encompass the whole field value. Just make sure the 
positionIncrementGap in your schema.xml is higher than your phrase 
slop, to keep your phrase slop from leaking over into another value of 
the multi-valued field.

fq="man's friend"~10000
(but url encode the value)
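
One way to do the URL encoding, sketched in Python with the standard library. The base URL is the one from the original question. The slop of 50 is an assumption chosen to stay below a positionIncrementGap of 100, and build_filter_url is a hypothetical helper.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_filter_url(base_url, field, phrase, slop):
    """Return a Solr URL whose fq is a URL-encoded sloppy phrase query on `field`."""
    fq = f'{field}:"{phrase}"~{slop}'
    return f"{base_url}?{urlencode({'q': '*:*', 'fq': fq})}"


# Slop 50 assumes positionIncrementGap=100 on the field, so the slop stays below the gap.
url = build_filter_url(
    "http://localhost:8983/solr/search/", "common_names", "man's friend", 50
)

# Decoding the query string again recovers the original filter query.
decoded_fq = parse_qs(urlsplit(url).query)["fq"][0]
```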


Re: Matching on a multi valued field

Posted by Savvas-Andreas Moysidis <sa...@googlemail.com>.
my bad..just realised your problem.. :D

On 29 March 2011 22:07, Savvas-Andreas Moysidis <
savvas.andreas.moysidis@googlemail.com> wrote:

> I assume you are using the Standard Handler?
> In that case wouldn't something like:
> "q=common_names:(man's friend)&q.op=AND" work?

Re: Matching on a multi valued field

Posted by Savvas-Andreas Moysidis <sa...@googlemail.com>.
I assume you are using the Standard Handler?
In that case wouldn't something like:
"q=common_names:(man's friend)&q.op=AND" work?
