Posted to solr-user@lucene.apache.org by elisabeth benoit <el...@gmail.com> on 2011/07/05 09:12:05 UTC

faceting on field with two values

Hello,

I have two fields, TOWN and POSTALCODE, and I want to concatenate them into
one field to facet on.

My two fields are declared as follows:

<field name="TOWN" type="string" indexed="true" stored="true"/>
<field name="POSTALCODE" type="string" indexed="true" stored="true"/>

The concatenated field is declared as follows:

<field name="TOWN_POSTALCODE" type="string" indexed="true" stored="true"
multiValued="true"/>

and I do the copyField as follows:

   <copyField source="TOWN" dest="TOWN_POSTALCODE"/>
   <copyField source="POSTALCODE" dest="TOWN_POSTALCODE"/>


When I do faceting on TOWN_POSTALCODE field, I only get answers like

<lst name="TOWN_POSTALCODE">
<int name="62200">5</int>
<int name="62280">5</int>
<int name="boulogne sur mer">5</int>
<int name="saint martin boulogne">5</int>
...

Which means the faceting is done on either the TOWN part or the POSTALCODE
part of TOWN_POSTALCODE, not on the combination.

But I would like to have answers like

<lst name="TOWN_POSTALCODE">
<int name="boulogne sur mer 62200">5</int>
<int name="paris 75016">5</int>

Is this possible with Solr?

Thanks,
Elisabeth

Re: faceting on field with two values

Posted by elisabeth benoit <el...@gmail.com>.
Thanks for your advice and for your comments.

In fact, we don't use facets to offer a facet UI to the user, but to analyze
the user's request and then send a second request to Solr.

A lot of requests return a lot of results (often more than a thousand), so we
need to filter the user's request with the fq parameter, if possible.
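
A minimal sketch of that second step, assuming a facet value such as
"boulogne sur mer 62200" has already been picked from the first response.
The helper names and the sample query "restaurant" are hypothetical; only
the q and fq parameters and the field name come from this thread.

```python
from urllib.parse import urlencode


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def second_request_params(user_query, field, facet_value):
    """Build parameters for the follow-up request, restricted with fq."""
    return {
        "q": user_query,
        "fq": f"{field}:{quote_term(facet_value)}",
    }


# Hypothetical user query and a facet value chosen from the first response.
params = second_request_params(
    "restaurant", "TOWN_POSTALCODE", "boulogne sur mer 62200"
)
query_string = urlencode(params)
print(params["fq"])
print(query_string)
```

Quoting the value keeps the whole string as one term, which is what a
filter on an untokenized string field needs.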


Best,
Elisabeth



Re: faceting on field with two values

Posted by Chris Hostetter <ho...@fucit.org>.
: I have two fields TOWN and POSTALCODE and I want to concat those two in one
: field to do faceting

As others have pointed out, copyField doesn't do a "concat", it just
adds the field values from the source field to the dest field (so with
those two <copyField/> lines you will typically get two values for each
doc in the dest field)
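
To make the difference concrete, here is a small simulation in plain
Python (it does not talk to Solr). The sample documents and function
names are made up for illustration; it only mimics how a multiValued dest
field filled by two copyField rules facets, compared with a single
combined value per document.

```python
from collections import Counter

# Hypothetical sample documents; field names are taken from the thread.
docs = [
    {"TOWN": "boulogne sur mer", "POSTALCODE": "62200"},
    {"TOWN": "boulogne sur mer", "POSTALCODE": "62200"},
    {"TOWN": "paris", "POSTALCODE": "75016"},
]


def copy_field_values(doc):
    """Mimic two copyField rules: the dest field gets two separate values."""
    return [doc["TOWN"], doc["POSTALCODE"]]


def concat_value(doc):
    """What the original post asks for: one combined value per document."""
    return [f'{doc["TOWN"]} {doc["POSTALCODE"]}']


def facet_counts(docs, values_of):
    """Count documents per distinct value, like a facet on a string field."""
    counts = Counter()
    for doc in docs:
        counts.update(set(values_of(doc)))
    return dict(counts)


copied = facet_counts(docs, copy_field_values)
combined = facet_counts(docs, concat_value)
print(copied)
print(combined)
```

The first result has separate entries for towns and postal codes, like
the facet output in the original post; the second has one entry per
town/postal code pair.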

if you don't want to go the DIH route, and you don't want to change your
talend process, you could use a simple UpdateProcessor for this (update
processors are used to process add/delete requests no matter what
source they come from, before analysis happens) ... but i don't think we
have any off the shelf "Concat" update processors in solr at the moment

there is a patch for a Script based one which might be helpful..
https://issues.apache.org/jira/browse/SOLR-1725

All of that said, based on what you've described about your usecase i
would question from a UI standpoint whether this field would actually be a
good idea...

isn't there an extremely large number of postal codes even in a single 
city?

why not let people facet on just the town field first, and then only when
they click on one, offer them a facet on Postal code?

Otherwise your facet UI is going to have a tendency to look like this...

 Gender:
   * Male  (9000 results)
   * Female  (8000 results)
 Town/Postal:
   * paris, 75016  (560 results)
   * paris, 75015  (490 results)
   * paris, 75022  (487 results)
   * boulogne sur mer 62200 (468 results)
   * paris, 75018  (465 results)
   * (click to see more)
 Color:
   * Red (900 results)
   * Blue (800 results)

...and many of your users will never find the town they are looking for 
(let alone the post code)
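
The two-step alternative suggested above (town first, postal code only
after a town is chosen) could be sketched as two sets of request
parameters. The function names and the sample query are hypothetical, and
the town value is assumed to contain no double quotes; facet, facet.field
and fq are the standard request parameters.

```python
from urllib.parse import urlencode


def town_facet_params(user_query):
    """Step 1: facet on TOWN only."""
    return [("q", user_query), ("facet", "true"), ("facet.field", "TOWN")]


def postal_facet_params(user_query, town):
    """Step 2: restrict to the chosen town, then facet on POSTALCODE."""
    return [
        ("q", user_query),
        ("fq", f'TOWN:"{town}"'),
        ("facet", "true"),
        ("facet.field", "POSTALCODE"),
    ]


step1 = urlencode(town_facet_params("restaurant"))
step2 = urlencode(postal_facet_params("restaurant", "paris"))
print(step1)
print(step2)
```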


-Hoss

Re: faceting on field with two values

Posted by Marian Steinbach <ma...@sendung.de>.
On Tue, Jul 5, 2011 at 10:21, elisabeth benoit
<el...@gmail.com> wrote:
> ...
>
> so do you think the dih (which I just discovered) would be appropriate to do
> the whole process (read a database, read fields from xml contained in some
> of the database columns, add informations from csv file)???
>
> from what I just read about dih, it seems so, but I'm still very confused
> about this dih thing.

As far as I can tell, the DataImportHandler is very useful if you want
to get data (only) from a database directly to Solr, with only slight
manipulation, e.g. concatenations. For that, it's much more convenient
than the path via scripts to generate XML.

It sounds like you are doing more than that in your importers.

Re: faceting on field with two values

Posted by elisabeth benoit <el...@gmail.com>.
hmmm... that sounds interesting and brings me somewhere else.

we are actually reindexing data every night, but the whole process is done by
talend (reading and formatting data from a database), and this makes me
wonder whether we should use Solr to do this instead.

in this case (concatenating two fields), the change is quite heavy: we have to
change the talend process, pollute the xml files we use to index data with
redundant fields, then modify the Solr process.

so do you think the dih (which I just discovered) would be appropriate to do
the whole process (read a database, read fields from xml contained in some
of the database columns, add information from a csv file)?

from what I just read about dih, it seems so, but I'm still very confused
about this dih thing.

thanks again,
Elisabeth


Re: faceting on field with two values

Posted by roySolr <ro...@gmail.com>.
Are you using the DIH? You can use a transformer to concatenate the two fields.

--
View this message in context: http://lucene.472066.n3.nabble.com/faceting-on-field-with-two-values-tp3139870p3139934.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: faceting on field with two values

Posted by Bill Bell <bi...@gmail.com>.
The easiest way is to concat() the fields in SQL, and pass the result to
indexing as a single, already merged field.
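
A minimal sketch of that idea, using an in-memory SQLite database as a
stand-in for the real source database. The table name "places" and its
columns are hypothetical, and SQLite concatenates with || where other
databases offer a CONCAT() function.

```python
import sqlite3

# In-memory stand-in for the real database; table and columns are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (town TEXT, postalcode TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("boulogne sur mer", "62200"), ("paris", "75016")],
)

# Concatenate in SQL so the indexer receives the merged value directly.
rows = conn.execute(
    "SELECT town, postalcode, town || ' ' || postalcode AS town_postalcode "
    "FROM places ORDER BY town"
).fetchall()
conn.close()

# Shape each row as a document using the field names from the thread.
docs = [
    {"TOWN": town, "POSTALCODE": code, "TOWN_POSTALCODE": merged}
    for town, code, merged in rows
]
print(docs)
```

With one merged value per document, TOWN_POSTALCODE would no longer need
multiValued="true" or the two copyField rules.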

Thanks,
