Posted to solr-user@lucene.apache.org by Alireza Salimi <al...@gmail.com> on 2012/11/22 19:57:59 UTC

Find the matched field in each matched document

Hi,

I apologize if I'm asking a duplicate question, but I haven't found a good
answer to my problem.
My question is: how can I find out which fields matched the search criteria
when I search over multiple fields?

Assume I have documents like this:
{"title": "Robert De Niro", "actors": []}
{"title": "ronin", "actors": ["robert de niro", "jean reno"]}
{"title": "casino", "actors": ["robert de niro", "Joe Pesci"]}

Here is the schema:

<field  name="actors"
            indexed="true"
            multiValued="true"
            stored="true"
            termPositions="true"
            termOffsets="true"
            termVectors="true"
            type="text_general" />

<field  name="title"
            indexed="true"
            multiValued="false"
            stored="true"
            type="text_general" />

Now, after searching for "robert de niro" in both "title" and "actors",
I will get some matches, but my question is: how can I find out
what "robert de niro" is? Is he an actor or a movie title?


Thanks in advance



-- 
Alireza Salimi
Java EE Developer

Re: Find the matched field in each matched document

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Agh...
I forgot to mention that pivot facets are also close to what you are looking for:
http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting
Good luck.


On Mon, Nov 26, 2012 at 6:21 PM, Alireza Salimi <al...@gmail.com>wrote:

> Hi Mikhail,
>
> Thanks for the reply. I have a feeling that what I'm looking for is something
> that someone else must have already implemented. Basically, it's a component
> that categorizes matched items by their type.
>
> For my requirement, even debugQuery should be fine, because I expect
> to return just the top document for each category, i.e. the best movie matching
> 'Robert de Niro' and the best actor matching 'Robert de Niro'.
>
> The problem is that even though I've been working with Solr for a year now,
> I've never had to dig into its code to figure out its internals, so any
> custom coding would be a bit of a headache.
>
> Thanks



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Find the matched field in each matched document

Posted by Alireza Salimi <al...@gmail.com>.
Hi Mikhail,

Thanks for the reply. I have a feeling that what I'm looking for is something
that someone else must have already implemented. Basically, it's a component
that categorizes matched items by their type.

For my requirement, even debugQuery should be fine, because I expect
to return just the top document for each category, i.e. the best movie matching
'Robert de Niro' and the best actor matching 'Robert de Niro'.

The problem is that even though I've been working with Solr for a year now,
I've never had to dig into its code to figure out its internals, so any
custom coding would be a bit of a headache.

Thanks


On Mon, Nov 26, 2012 at 3:14 AM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Alireza,
>
> Please be aware that debugQuery only covers the retrieved page of search
> results ('rows' documents starting at 'start'), not all numFound documents.
> It is also usually slow, although it carries all the information you need.
> On our platform we implemented similar functionality that works really fast,
> but it is not really lightweight or compatible. I spoke about it recently:
> http://goo.gl/7vgrB
> You can check https://issues.apache.org/jira/browse/LUCENE-1999 as a
> starting point for your own implementation.
>
> Regards



-- 
Alireza Salimi
Java EE Developer

Re: Find the matched field in each matched document

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Alireza,

Please be aware that debugQuery only covers the retrieved page of search
results ('rows' documents starting at 'start'), not all numFound documents.
It is also usually slow, although it carries all the information you need.
On our platform we implemented similar functionality that works really fast,
but it is not really lightweight or compatible. I spoke about it recently:
http://goo.gl/7vgrB
You can check https://issues.apache.org/jira/browse/LUCENE-1999 as a
starting point for your own implementation.
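
To make the 'rows'/'start' point concrete, here is a minimal sketch in
Python of a request that turns on debugQuery for one page of results. The
base URL, the core name and the helper name debug_query_url are
assumptions for illustration; q, start, rows, debugQuery and wt are the
usual Solr request parameters.

```python
from urllib.parse import urlencode

def debug_query_url(base_url, text, start=0, rows=10):
    """Build a /select URL that searches both fields with debugQuery on.

    Explain output will only be returned for the `rows` documents
    beginning at `start`, not for every matching document.
    """
    params = {
        # Same phrase searched in both fields from the schema in this thread.
        "q": 'title:"%s" OR actors:"%s"' % (text, text),
        "start": start,
        "rows": rows,
        "debugQuery": "true",
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)

# Hypothetical host and core name, for illustration only.
print(debug_query_url("http://localhost:8983/solr/collection1", "robert de niro"))
```

Paging through a large result set this way means paying the debug cost
again for every page, which is why it is only practical for small pages.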

Regards


On Fri, Nov 23, 2012 at 4:14 AM, Alireza Salimi <al...@gmail.com>wrote:

> Hi Jack,
>
> Thanks for the reply.
>
> I'm not sure about the debug component; I thought it slows down queries.
> Can you explain more about the custom search component?
>
> Thanks



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Find the matched field in each matched document

Posted by Alireza Salimi <al...@gmail.com>.
Hi Jack,

Thanks for the reply.

I'm not sure about the debug component; I thought it slows down queries.
Can you explain more about the custom search component?

Thanks


On Thu, Nov 22, 2012 at 7:02 PM, Jack Krupansky <ja...@basetechnology.com>wrote:

> No, not directly, but indirectly you can - add &debugQuery=true to your
> request and the "explain" section will detail which terms matched in which
> fields.
>
> You could probably also implement a custom search component which
> annotated each document with the matched field names. In that sense, Solr
> CAN do it.
>
> -- Jack Krupansky



-- 
Alireza Salimi
Java EE Developer

Re: Find the matched field in each matched document

Posted by Jack Krupansky <ja...@basetechnology.com>.
No, not directly, but indirectly you can - add &debugQuery=true to your 
request and the "explain" section will detail which terms matched in which 
fields.

You could probably also implement a custom search component which annotated 
each document with the matched field names. In that sense, Solr CAN do it.

-- Jack Krupansky
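
As a rough illustration of reading the "explain" section, the Python
sketch below pulls field names out of one document's explain text. It
assumes the text contains fragments of the form weight(field:term in
docid), which is how Lucene explain output commonly looks; the exact
format depends on the Solr version and query type, and the sample string
is hand-written rather than real Solr output.

```python
import re

# Assumed shape of explain fragments: "weight(<field>:<term> in <docid>)".
WEIGHT_RE = re.compile(r"weight\((\w+):")

def matched_fields(explain_text):
    """Return the set of field names mentioned in one document's explain text."""
    return set(WEIGHT_RE.findall(explain_text))

# Hand-written sample imitating an explain entry; not real Solr output.
sample_explain = (
    "0.5 = sum of:\n"
    "  0.3 = weight(actors:robert in 1) [DefaultSimilarity], result of:\n"
    "  0.2 = weight(actors:niro in 1) [DefaultSimilarity], result of:\n"
)

print(sorted(matched_fields(sample_explain)))
```

A document that matched in both fields would yield both names, so the
caller still has to decide what to do with that case.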

-----Original Message----- 
From: Alireza Salimi
Sent: Thursday, November 22, 2012 6:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Find the matched field in each matched document

Maybe I should say it in a different way:

Given documents like the ones above, I want to know what "Robert De Niro" is:
an actor or a movie title?

You can just tell me whether Solr can do it or not; that will be enough.

Thanks





Re: Find the matched field in each matched document

Posted by Alireza Salimi <al...@gmail.com>.
Maybe I should say it in a different way:

Given documents like the ones above, I want to know what "Robert De Niro" is:
an actor or a movie title?

You can just tell me whether Solr can do it or not; that will be enough.

Thanks



On Thu, Nov 22, 2012 at 1:57 PM, Alireza Salimi <al...@gmail.com>wrote:

> Hi,
>
> I apologize if I'm asking a duplicate question, but I haven't found a good
> answer to my problem.
> My question is: how can I find out which fields matched the search criteria
> when I search over multiple fields?
>
> Assume I have documents like this:
> {"title": "Robert De Niro", "actors": []}
> {"title": "ronin", "actors": ["robert de niro", "jean reno"]}
> {"title": "casino", "actors": ["robert de niro", "Joe Pesci"]}
>
> Here is the schema:
>
> <field  name="actors"
>             indexed="true"
>             multiValued="true"
>             stored="true"
>             termPositions="true"
>             termOffsets="true"
>             termVectors="true"
>             type="text_general" />
>
> <field  name="title"
>             indexed="true"
>             multiValued="false"
>             stored="true"
>             type="text_general" />
>
> Now, after searching for "robert de niro" in both "title" and "actors",
> I will get some matches, but my question is: how can I find out
> what "robert de niro" is? Is he an actor or a movie title?
>
>
> Thanks in advance
>
>
>
> --
> Alireza Salimi
> Java EE Developer
>
>
>


-- 
Alireza Salimi
Java EE Developer

Re: Find the matched field in each matched document

Posted by Alireza Salimi <al...@gmail.com>.
Hi Hoss,

Actually, that was the first solution that came to my mind,
but I naively wanted to be efficient with disk usage by
not creating separate docs just for the sake of categorization.

But the more I think about it, the more I realize that the best solution
is to have documents with two fields, type and title; the
categorization searches will then be done on 'title'.

Especially since the documents that hold all the information about
a movie have many more terms in them, the scoring also
would not be good when comparing the movie documents with
actor documents that have only one field.
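
A minimal Python sketch of that restructuring, assuming each entity
becomes its own small document with just 'type' and 'title'. The helper
name to_typed_docs and the case-insensitive de-duplication of actor names
are illustrative choices, not something settled in this thread.

```python
def to_typed_docs(movies):
    """Split movie records into one 'movie' doc per title and one
    'actor' doc per distinct actor name (compared case-insensitively)."""
    docs = []
    seen_actors = set()
    for movie in movies:
        docs.append({"type": "movie", "title": movie["title"]})
        for actor in movie.get("actors", []):
            key = actor.lower()
            if key not in seen_actors:
                seen_actors.add(key)
                docs.append({"type": "actor", "title": actor})
    return docs

# The three example documents from the original question.
movies = [
    {"title": "Robert De Niro", "actors": []},
    {"title": "ronin", "actors": ["robert de niro", "jean reno"]},
    {"title": "casino", "actors": ["robert de niro", "Joe Pesci"]},
]

for doc in to_typed_docs(movies):
    print(doc)
```

With documents shaped like this, a search on 'title' returns hits whose
'type' value says directly whether each one is a movie or an actor.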

Thanks






On Mon, Nov 26, 2012 at 8:24 PM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> : Assume I have documents like this:
> : {"title": "Robert De Niro", "actors": []}
> : {"title": "ronin", "actors": ["robert de niro", "jean reno"]}
> : {"title": "casino", "actors": ["robert de niro", "Joe Pesci"]}
>         ...
> : Now after search for "robert de niro" in both "title" and "Actors",
> : I will have some matches, but my question is: How can I find out
> : what "robert de niro" is? Is he "an actor" or a "movie title"?
>
> I would strongly suggest you rethink your problem.
>
> Asking Solr to identify which _field_ your query matched on is something
> that is, in general, undefinable, since a query might be something like
> "+title:casino actors:(robert rupert)".  Even if in your specific case you
> know that you will always be querying for the same string in all fields,
> you'll still run into ambiguous cases where a search might match on
> *both* the title and actors fields (e.g. imagine searching for "ray" and
> matching the movie "Ray", in which "Ray Charles" appears as himself in
> archive footage).
>
> Instead, you should really include in your index a field that *tells*
> you what kind of document each one is -- and then when you get a set of
> results back, you can look at that field and say "this is a movie", "this
> is an actor", etc.
>
> (You could probably deduce that from your current schema by looking at
> whether there are any stored values in the "actors" field, but there's no
> reason not to just add it as an explicit field -- at which point you can
> also facet on it, etc.)
>
>
> -Hoss
>



-- 
Alireza Salimi
Java EE Developer

Re: Find the matched field in each matched document

Posted by Chris Hostetter <ho...@fucit.org>.
: Assume I have documents like this:
: {"title": "Robert De Niro", "actors": []}
: {"title": "ronin", "actors": ["robert de niro", "jean reno"]}
: {"title": "casino", "actors": ["robert de niro", "Joe Pesci"]}
	...
: Now after search for "robert de niro" in both "title" and "Actors",
: I will have some matches, but my question is: How can I find out
: what "robert de niro" is? Is he "an actor" or a "movie title"?

I would strongly suggest you rethink your problem.

Asking Solr to identify which _field_ your query matched on is something
that is, in general, undefinable, since a query might be something like
"+title:casino actors:(robert rupert)".  Even if in your specific case you
know that you will always be querying for the same string in all fields,
you'll still run into ambiguous cases where a search might match on
*both* the title and actors fields (e.g. imagine searching for "ray" and
matching the movie "Ray", in which "Ray Charles" appears as himself in
archive footage).

Instead, you should really include in your index a field that *tells*
you what kind of document each one is -- and then when you get a set of
results back, you can look at that field and say "this is a movie", "this
is an actor", etc.

(You could probably deduce that from your current schema by looking at
whether there are any stored values in the "actors" field, but there's no
reason not to just add it as an explicit field -- at which point you can
also facet on it, etc.)
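
For illustration, a query against such an index could look like the
Python sketch below. It assumes a stored, non-tokenized field named
"type" has been added to the schema; that field name, the base URL and
the helper name typed_search_url are hypothetical. fl, facet and
facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode

def typed_search_url(base_url, text):
    """Build a /select URL that searches 'title' and facets on 'type'."""
    params = [
        ("q", 'title:"%s"' % text),
        ("fl", "type,title,score"),  # return the explicit type with each hit
        ("facet", "true"),
        ("facet.field", "type"),     # counts of matches per document type
        ("wt", "json"),
    ]
    return base_url + "/select?" + urlencode(params)

# Hypothetical host and core name, for illustration only.
print(typed_search_url("http://localhost:8983/solr/collection1", "ray"))
```

Each returned document then carries its own "type" value, and the facet
counts show how many movies and how many actors matched.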


-Hoss