Posted to solr-user@lucene.apache.org by Achim Domma <do...@procoders.net> on 2013/05/20 23:12:52 UTC

Store complex (i.e. label + id) meta data in SOLR document

I store documents whose meta data is composed of multiple values, usually an id with a label. A simple example would be the name of a city and the unique id of that city. The id is needed because different cities can have the same name, like Berlin in Germany and Berlin in the US. The name is obviously needed because I want to search for that string.

If I use facets, I would like to get back two facets having the label "Berlin". If I restrict my search (using some other meta data field) to documents from Germany, I would expect to get only one facet for the German Berlin. Obviously this does not work if I store id and label in two separate SOLR fields.

I would assume that this is not an uncommon requirement, but I was not able to find any useful information. My current approaches are:

 * Implement a complete custom field type in Java: Hard to estimate for me, because I'm currently just a SOLR user, not a SOLR developer.

 * Put id and label in a single string (like "123:Berlin" and "456:Berlin") and define a custom field type in schema.xml using a custom analyzer which splits the value. Sounds reasonable to me, but I'm not 100% sure if it will work with faceting.

 * I found some references to subfields, but only on older pages and I was not able to find useful documentation.
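
The second approach can be tried out without touching Solr. The sketch below is a minimal Python model, not Solr code: it assumes each document carries one composite "id:label" value in a hypothetical field called city, counts those values the way a facet over an untokenized string field would, and splits each value at the first ':' only for display. The field names, sample documents and helper names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample documents; "city" holds the composite "id:label" value.
DOCS = [
    {"title": "Doc A", "country": "DE", "city": "123:Berlin"},
    {"title": "Doc B", "country": "DE", "city": "123:Berlin"},
    {"title": "Doc C", "country": "US", "city": "456:Berlin"},
]

def split_value(value):
    """Split "123:Berlin" into ("123", "Berlin") at the first colon only."""
    entity_id, label = value.split(":", 1)
    return entity_id, label

def facet_counts(docs, field="city"):
    """Count distinct composite values, as a facet on an untokenized field would."""
    counts = Counter(doc[field] for doc in docs)
    return [(*split_value(value), n) for value, n in sorted(counts.items())]

# Unrestricted search: two separate "Berlin" entries, one per id.
all_facets = facet_counts(DOCS)
# Restricted to Germany: only the German Berlin remains.
german_facets = facet_counts([d for d in DOCS if d["country"] == "DE"])
```

Because the id is part of the counted value, the two cities never merge into one facet entry, and restricting the document set by another field leaves only the matching entry.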

Is there some well known way to solve this in SOLR?

kind regards,
Achim

Re: Store complex (i.e. label + id) meta data in SOLR document

Posted by Lan <du...@gmail.com>.
Have you looked at field collapsing?
http://wiki.apache.org/solr/FieldCollapsing

You would collapse on the id, and then, when Solr returns the results, you could
extract the facet label from the Solr document in each group.
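
As a rough sketch of that suggestion: group, group.field and fl below are Solr's result-grouping request parameters, while the field names city_id and city_label are hypothetical. The response dictionary is hand-written to mirror the default grouped JSON layout; it is illustrative data, not output from a real server.

```python
# Request parameters for result grouping; the field names are hypothetical.
GROUP_PARAMS = {
    "q": "*:*",
    "group": "true",
    "group.field": "city_id",
    "fl": "city_id,city_label",
    "wt": "json",
}

# Hand-written stand-in for a grouped response (illustrative data only).
SAMPLE_RESPONSE = {
    "grouped": {
        "city_id": {
            "matches": 3,
            "groups": [
                {"groupValue": "123",
                 "doclist": {"numFound": 2, "start": 0,
                             "docs": [{"city_id": "123", "city_label": "Berlin"}]}},
                {"groupValue": "456",
                 "doclist": {"numFound": 1, "start": 0,
                             "docs": [{"city_id": "456", "city_label": "Berlin"}]}},
            ],
        }
    }
}

def labels_per_group(response, group_field="city_id", label_field="city_label"):
    """Return (id, label, count) per group, reading the label from the group's first document."""
    result = []
    for group in response["grouped"][group_field]["groups"]:
        docs = group["doclist"]["docs"]
        label = docs[0][label_field] if docs else None
        result.append((group["groupValue"], label, group["doclist"]["numFound"]))
    return result

groups = labels_per_group(SAMPLE_RESPONSE)
```

One caveat worth checking before relying on this: the groups come back as part of the paged result list, so it may not behave like a facet that is computed over the whole result set.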




Re: Store complex (i.e. label + id) meta data in SOLR document

Posted by Achim Domma <do...@procoders.net>.
Sorry, I think my reference to restriction by country was more confusing than helpful. Let's say that the author of the document is one dimension I would like to use facets for. "author" would be one field in my document schema. Now let's take "Schmidt, M." as the author name, which is quite common in Germany. There are multiple authors with that name, and each of them has a unique id in our source data. When searching and calculating facets, I do not want to get back only one "Schmidt, M." but several, each carrying the unique id from our source data.

I think so far it would be easy to store "123:Schmidt, M.", "234:Schmidt, M.", ... But I would also like to be able to search for "Schmidt" or to get autocomplete for "Schm...". Hence my idea of a custom field type which stores "123:Schmidt", ... as its value, but processes the string (i.e. splits it at ':' and strips the first part) so that only "Schmidt" gets stored in the search index.

If I do it like this, I would expect text search and autocomplete to work with just "Schmidt, M.", while I should get back "123:Schmidt, M.", ... as facets. I think that would solve my problem.
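
A minimal sketch of that idea, in plain Python rather than as a Solr field type: the composite value is kept unchanged for faceting, and a second, derived value with the id prefix stripped is what gets searched. The field names author_facet and author_text and both helper functions are hypothetical; in Solr the stripping would have to happen in the analysis chain or before indexing, which this sketch does not show.

```python
import re

# Matches a leading id plus colon, e.g. "123:" in "123:Schmidt, M.".
ID_PREFIX = re.compile(r"^[^:]*:")

def to_fields(composite):
    """Derive a facet value and a searchable value from one composite string."""
    return {
        "author_facet": composite,                    # returned as-is in facets
        "author_text": ID_PREFIX.sub("", composite),  # id stripped, used for search
    }

def autocomplete(prefix, docs):
    """Return facet values whose searchable text starts with the typed prefix."""
    prefix = prefix.lower()
    return sorted(d["author_facet"] for d in docs
                  if d["author_text"].lower().startswith(prefix))

docs = [to_fields(v) for v in ("123:Schmidt, M.", "234:Schmidt, M.", "345:Meyer, A.")]
suggestions = autocomplete("Schm", docs)
```

Typing "Schm" matches on the stripped text only, yet what comes back are the composite values, so both authors named "Schmidt, M." stay distinguishable by id.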

My first question would be: does this make sense at all, or am I misunderstanding something? And the second question would be: is there a better, simpler solution?

Does this make it clearer what I want to do?

kind regards,
Achim




On 20.05.2013 at 23:27, Jack Krupansky wrote:

> Tell us a little more, with examples, of how you really want to search and facet this information.
> 
> One technique is to store the same information in multiple ways, for different uses, combining the name in different ways, such as "Berlin", "Berlin:DE", "Berlin, NJ", "Berlin:Germany", "Berlin GERMANY", etc.
> 
> Ultimately, the idea for facets is not that they uniquely identify an entity, but that a combination of facet selections lets you drill down into the data, such that each facet selection narrows one dimension.
> 
> -- Jack Krupansky


Re: Store complex (i.e. label + id) meta data in SOLR document

Posted by Jack Krupansky <ja...@basetechnology.com>.
Tell us a little more, with examples, of how you really want to search and 
facet this information.

One technique is to store the same information in multiple ways, for 
different uses, combining the name in different ways, such as "Berlin", 
"Berlin:DE", "Berlin, NJ", "Berlin:Germany", "Berlin GERMANY", etc.

Ultimately, the idea for facets is not that they uniquely identify an 
entity, but that a combination of facet selections lets you drill down into 
the data, such that each facet selection narrows one dimension.

-- Jack Krupansky
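
The drill-down described above could look roughly like this on the request side. facet, facet.field and fq are standard Solr request parameters; the field names country and city and the helper function are hypothetical. Each selected facet value becomes its own fq filter, so every selection narrows exactly one dimension.

```python
def drilldown_params(query, facet_fields, selections):
    """Build a list of (name, value) request parameters for a faceted search.

    selections maps a field name to the facet value the user picked.
    """
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    for field, value in sorted(selections.items()):
        # Quote the value so spaces and colons in it are treated literally.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    return params

params = drilldown_params("*:*", ["country", "city"],
                          {"country": "Germany", "city": "Berlin"})
```

A list of pairs is used instead of a dictionary because facet.field and fq may each appear several times in one request.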
