Posted to solr-user@lucene.apache.org by "Alex J. G. Burzyński" <ma...@ajgb.net> on 2010/07/08 14:45:04 UTC

Filter multivalue fields from search result

Hi,

Is it possible to remove from the search results those values of
multivalued fields that don't match the search criteria?

My schema is defined as:

<!-- course_id -->
<field name="id" type="string" indexed="true" stored="true"
required="true" />
<!-- course_name -->
<field name="name" type="string" indexed="true" stored="true"/>
<!-- events.event_town -->
<field name="town" type="string" indexed="true" stored="true"
multiValued="true"/>
<!-- events.event_date -->
<field name="date" type="tdate" indexed="true" stored="true"
multiValued="true"/>

And example docs are:

+----+----------------------+------------+------------+
| id | name                 | town       | date       |
+----+----------------------+------------+------------+
| 1  | Microsoft Excel      | London     | 2010-08-20 |
|    |                      | Glasgow    | 2010-08-24 |
|    |                      | Leeds      | 2010-08-28 |
| 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
|    |                      | Reading    | 2010-08-25 |
|    |                      | London     | 2010-08-29 |
| 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
|    |                      | Leeds      | 2010-08-26 |
+----+----------------------+------------+------------+

so the query for q=name:Microsoft town:Leeds returns docs 1 & 3.

How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?

Or should I create a separate doc for each name-event pair?

Thanks,
Alex

Re: Filter multivalue fields from search result

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Alex,

feedback inline:

On Mon, 2010-07-12 at 12:03 +0200, "Alex J. G. Burzyński" wrote:
> Hi Chantal,
>
> The paging problem I've asked about is that, with course-event pairs,
> specifying rows limits the number of pairs returned, not the number of
> courses.
>
> +-------+----------------------+------------+------------+
> | id-id | name                 | town       | date       |
> +-------+----------------------+------------+------------+
> | 1-1   | Microsoft Excel      | London     | 2010-08-20 |
> | 1-2   | Microsoft Excel      | Glasgow    | 2010-08-24 |
> | 1-3   | Microsoft Excel      | Leeds      | 2010-08-28 |
> | 2-1   | Microsoft Word       | Aberdeen   | 2010-08-21 |
> | 2-2   | Microsoft Word       | Reading    | 2010-08-25 |
> | 2-3   | Microsoft Word       | London     | 2010-08-29 |
> | 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
> | 3-2   | Microsoft Powerpoint | Leeds      | 2010-08-26 |
> | 3-3   | Microsoft Powerpoint | Leeds      | 2010-08-30 |
> +-------+----------------------+------------+------------+
>
>
> And from the UI point of view I'm returning fewer courses than events -
> that's why I've asked about paging.
>
> The search for q=name:Microsoft town:Leeds with rows=2 should return:
> 1-3 & 3-2 & 3-3

If you want to list all available courses in a query and also display
how often and where they take place, then query for "name" (in your
table) and facet on "town" per name. This might require the use of the
facet.query parameter.

Otherwise use your query from above and group afterwards in the client
or your server backend. Of course, you should increase the rows value.
But I see your point with paging, so faceting might be a better option.
Or maybe field collapsing is what you need (there is a patch - search
for "solr field collapsing" and you should find a lot about it). (I
haven't tried that, however, and it's just a guess.)
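
To illustrate the "group afterwards in the client" idea, here is a minimal
Python sketch. It assumes the denormalized documents carry ids of the form
course-event (as in the quoted table) and that the hits were already fetched
with a large enough rows value. The function names are hypothetical and not
part of any Solr client library.

```python
def group_by_course(hits):
    """Group flat course-event hits by course id, keeping result order."""
    grouped = {}
    for doc in hits:
        # Assumption: the part before the first "-" is the course id.
        course_id = doc["id"].split("-", 1)[0]
        entry = grouped.setdefault(course_id, {"name": doc["name"], "events": []})
        entry["events"].append((doc["town"], doc["date"]))
    return grouped


def course_page(grouped, page, per_page):
    """Return the (course_id, course) items for a 1-based page number."""
    courses = list(grouped.items())
    first = (page - 1) * per_page
    return courses[first:first + per_page]


# The three Leeds hits from the quoted table.
hits = [
    {"id": "1-3", "name": "Microsoft Excel", "town": "Leeds", "date": "2010-08-28"},
    {"id": "3-2", "name": "Microsoft Powerpoint", "town": "Leeds", "date": "2010-08-26"},
    {"id": "3-3", "name": "Microsoft Powerpoint", "town": "Leeds", "date": "2010-08-30"},
]
grouped = group_by_course(hits)
first_page = course_page(grouped, page=1, per_page=2)
```

With per_page=2 the first page holds courses 1 and 3, and event 3-3 stays
attached to its course instead of spilling onto a second page.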

Chantal

>
> But 3-3 will obviously be on page 2.
>
> I hope that makes my question clearer.
>
> Thanks,
> Alex
>



Re: Filter multivalue fields from search result

Posted by "Alex J. G. Burzyński" <ma...@ajgb.net>.
Hi Chantal,

The paging problem I've asked about is that, with course-event pairs,
specifying rows limits the number of pairs returned, not the number of
courses.

+-------+----------------------+------------+------------+
| id-id | name                 | town       | date       |
+-------+----------------------+------------+------------+
| 1-1   | Microsoft Excel      | London     | 2010-08-20 |
| 1-2   | Microsoft Excel      | Glasgow    | 2010-08-24 |
| 1-3   | Microsoft Excel      | Leeds      | 2010-08-28 |
| 2-1   | Microsoft Word       | Aberdeen   | 2010-08-21 |
| 2-2   | Microsoft Word       | Reading    | 2010-08-25 |
| 2-3   | Microsoft Word       | London     | 2010-08-29 |
| 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
| 3-2   | Microsoft Powerpoint | Leeds      | 2010-08-26 |
| 3-3   | Microsoft Powerpoint | Leeds      | 2010-08-30 |
+-------+----------------------+------------+------------+


And from the UI point of view I'm returning fewer courses than events -
that's why I've asked about paging.

The search for q=name:Microsoft town:Leeds with rows=2 should return:
1-3 & 3-2 & 3-3

But 3-3 will obviously be on page 2.

I hope that makes my question clearer.

Thanks,
Alex


On 2010-07-12 10:26, Chantal Ackermann wrote:
> Hi Alex,
>
> I think you have to explain the complete use case. Paging is done by
> specifying the parameter "start" (and "rows" if you want to have more or
> less than 10 hits per page). For each page you need of course a new
> query, but the queries differ only in the parameter value "start" (first
> page start=0, second page start=10 etc. if rows=10). The other
> parameters remain the same.
>
> You should also have a look at facets. They might help you to get a list
> of the values of your multi valued fields that you can display in the
> UI, allowing the user to drill down the results further.
>
> Chantal

Re: Filter multivalue fields from search result

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Alex,

I think you have to explain the complete use case. Paging is done by
specifying the parameter "start" (and "rows" if you want to have more or
less than 10 hits per page). For each page you need of course a new
query, but the queries differ only in the parameter value "start" (first
page start=0, second page start=10 etc. if rows=10). The other
parameters remain the same.
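
As a small illustration of that arithmetic, the following Python sketch
derives "start" from a 1-based page number and builds the query string. The
helper names are hypothetical; only the "q", "start" and "rows" parameters
come from Solr itself.

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Return the start/rows parameters for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"start": (page - 1) * rows, "rows": rows}


def page_query(q, page, rows=10):
    """Build the query string; only 'start' changes from page to page."""
    return urlencode({"q": q, **page_params(page, rows)})


second_page = page_query("name:Microsoft town:Leeds", page=2)
```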

You should also have a look at facets. They might help you to get a list
of the values of your multivalued fields that you can display in the
UI, allowing the user to drill down into the results further.
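
A rough Python sketch of what such a facet request could look like follows.
It uses the standard "facet", "facet.field" and "facet.mincount" request
parameters. The helper names are hypothetical, the sample counts are made up
for the example documents rather than taken from a real server, and the
flat [value, count, ...] list layout is assumed to be what the JSON response
writer returns by default.

```python
from urllib.parse import urlencode


def facet_query(q, field, start=0, rows=10):
    """Build a query string that also asks for facet counts on one field."""
    return urlencode([
        ("q", q),
        ("start", start),
        ("rows", rows),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),
    ])


def facet_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


query_string = facet_query("town:Leeds", "town")
# Hypothetical facet list for the example documents, not real server output.
towns = facet_counts(["Leeds", 2, "Birmingham", 1, "Glasgow", 1, "London", 1])
```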

Chantal

On Mon, 2010-07-12 at 10:26 +0200, "Alex J. G. Burzyński" wrote:
> Hi,
>
> So if those are separate documents how should I handle paging? Two
> separate queries?
> First to return all matching courses-events pairs, and second one to get
> courses for given page?
>
> Is this common design described in details somewhere?
>
> Thanks,
> Alex



Re: Filter multivalue fields from search result

Posted by "Alex J. G. Burzyński" <ma...@ajgb.net>.
Hi,

So if those are separate documents, how should I handle paging? Two
separate queries?
The first to return all matching course-event pairs, and the second to
get the courses for a given page?

Is this common design described in detail somewhere?

Thanks,
Alex

On 2010-07-09 01:50, Lance Norskog wrote:
> Yes, denormalizing the index into separate (name,town) pairs is the
> common design for this problem.

Re: Filter multivalue fields from search result

Posted by Lance Norskog <go...@gmail.com>.
Yes, denormalizing the index into separate (name,town) pairs is the
common design for this problem.
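
A minimal Python sketch of that denormalization step, for illustration only:
it turns each course with several events into one flat document per
course-event pair. The "1-1" style ids follow the table Alex posts later in
the thread; the "course_id" field and the function name are assumptions,
not something from the original schema.

```python
# Source records: one course with several (town, date) events each.
courses = [
    {"id": "1", "name": "Microsoft Excel",
     "events": [("London", "2010-08-20"), ("Glasgow", "2010-08-24"),
                ("Leeds", "2010-08-28")]},
    {"id": "2", "name": "Microsoft Word",
     "events": [("Aberdeen", "2010-08-21"), ("Reading", "2010-08-25"),
                ("London", "2010-08-29")]},
    {"id": "3", "name": "Microsoft Powerpoint",
     "events": [("Birmingham", "2010-08-22"), ("Leeds", "2010-08-26")]},
]


def denormalize(courses):
    """Yield one single-valued document per (course, event) pair."""
    for course in courses:
        for n, (town, date) in enumerate(course["events"], start=1):
            yield {
                "id": f"{course['id']}-{n}",       # unique per pair
                "course_id": course["id"],         # hypothetical extra field
                "name": course["name"],
                "town": town,                      # no longer multivalued
                "date": f"{date}T00:00:00Z",       # ISO 8601 UTC for a date field
            }


docs = list(denormalize(courses))
```

Each resulting document then matches or fails the town criterion on its own,
so a hit for Leeds no longer drags London or Glasgow along with it.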


-- 
Lance Norskog
goksron@gmail.com