Posted to solr-user@lucene.apache.org by Stephane Bailliez <sb...@gmail.com> on 2008/12/01 17:37:58 UTC

Dealing with field values as key/value pairs

Hi all,


I'm looking for ideas on how best to handle a situation where I 
need to store key/value pairs in the index for consumption 
by the client.


A typical example would be a document with multiple genres where, 
for simplicity, I'd like to send both the 'id' and the 'human-readable 
label' (this might not be the best example, since one would 
immediately say 'what about localization', but in that case assume it's 
an entity such as a company name or a person name).

So say I have

field1 = { 'key1':'this is value1', 'key2':'this is value2' }


I was thinking the easiest (not the prettiest) solution would be to 
store each pair as a single string, 'key:this is the value', and then 
have the client parse that format based on the 
'<key>:<value>' pattern.
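
A minimal sketch of that encoding in Python, assuming the key never contains ':' (the value may). The helper names `encode_pair` and `decode_pair` are hypothetical and are not part of Solr; the encoded strings would go into a multi-valued string field.

```python
from typing import Tuple


def encode_pair(key: str, value: str) -> str:
    """Encode one key/value pair as 'key:value'."""
    if ":" in key:
        # Assumption: keys are identifiers without ':' so decoding stays unambiguous.
        raise ValueError("key must not contain ':'")
    return f"{key}:{value}"


def decode_pair(encoded: str) -> Tuple[str, str]:
    """Split on the first ':' only, so the value may itself contain ':'."""
    key, sep, value = encoded.partition(":")
    if not sep:
        raise ValueError(f"not a key:value pair: {encoded!r}")
    return key, value


field1 = {"key1": "this is value1", "key2": "this is value2"}

# What would be sent to the index, one string per pair.
stored = [encode_pair(k, v) for k, v in field1.items()]

# What the client would do with the values it gets back.
decoded = dict(decode_pair(s) for s in stored)
```

Splitting on the first separator only keeps labels such as "Title: Subtitle" intact, at the cost of forbidding ':' in keys.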

Another alternative I was considering is a custom field type that 
would expose the field value to the response writer as a key/value map, 
but I'm not sure it can really be done; I haven't investigated 
that one deeply.

Any feedback would be welcome. The solution might even be simpler and 
cleaner than what I'm describing above, but my brain has been mushy for 
the last couple of weeks.

-- stephane


Re: Dealing with field values as key/value pairs

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Thu, Dec 11, 2008 at 4:41 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : This is really cool. Ummmm... How does it integrate with the Data Import
> : Handler?
>
> my DIH knowledge is extremely limited, but i'm guessing approach #1 is
> trivial (there is an easy way to concat DB values to build up solr field
> values right?);
Yes, TemplateTransformer can help you here.

> approach #2 would probably be possible using multiple root
> entities (assuming multiple root entities means what i think it means)

Yes, multiple root entities can do the trick (with a separate doctype).





-- 
--Noble Paul

RE: Dealing with field values as key/value pairs

Posted by Chris Hostetter <ho...@fucit.org>.
: This is really cool. Ummmm... How does it integrate with the Data Import
: Handler?

my DIH knowledge is extremely limited, but i'm guessing approach #1 is 
trivial (there is an easy way to concat DB values to build up solr field 
values right?); approach #2 would probably be possible using multiple root 
entities (assuming multiple root entities means what i think it means)

-Hoss


RE: Dealing with field values as key/value pairs

Posted by Lance Norskog <go...@gmail.com>.
This is really cool. Ummmm... How does it integrate with the Data Import
Handler?

Lance

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Friday, December 05, 2008 8:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Dealing with field values as key/value pairs


: So i'm basically looking for design pattern/best practice for that scenario
: based on people's experience.

I've taken two approaches in the past...

1) encode the "id" and the "label" in the field value; facet on it; require
clients to know how to decode.  This works really well for simple things
where the id=>label mappings don't ever change, and are easy to encode
(e.g. "01234:Chris Hostetter").  This is a horrible approach when id=>label
mappings do change with any frequency.

2) have a separate type of "metadata" document, one per "thing" that you are
faceting on, containing fields for the id and the label (and probably a
doc_type field so you can tell it apart from your main docs).  Then, once
you've done your main query and gotten the results back faceted on id, you
can query for those ids to get the corresponding labels.  This works really
well if the labels ever change (just reindex the corresponding metadata
document) and has the added bonus that you can store additional metadata in
each of those docs.  In many use cases for presenting an initial "browse"
interface, you can sometimes get away with a cheap search for all metadata
docs (or all metadata docs meeting certain criteria) instead of an expensive
facet query across all of your main documents.



-Hoss



Re: Dealing with field values as key/value pairs

Posted by Chris Hostetter <ho...@fucit.org>.
: So i'm basically looking for design pattern/best practice for that scenario
: based on people's experience.

I've taken two approaches in the past...

1) encode the "id" and the "label" in the field value; facet on it; 
require clients to know how to decode.  This works really well for simple 
things where the id=>label mappings don't ever change, and are 
easy to encode (e.g. "01234:Chris Hostetter").  This is a horrible approach 
when id=>label mappings do change with any frequency.

2) have a separate type of "metadata" document, one per "thing" that you 
are faceting on, containing fields for the id and the label (and probably a 
doc_type field so you can tell it apart from your main docs).  Then, once 
you've done your main query and gotten the results back faceted on id, 
you can query for those ids to get the corresponding labels.  This works 
really well if the labels ever change (just reindex the corresponding 
metadata document) and has the added bonus that you can store additional 
metadata in each of those docs.  In many use cases for presenting an 
initial "browse" interface, you can sometimes get away with a cheap 
search for all metadata docs (or all metadata docs meeting certain 
criteria) instead of an expensive facet query across all of your main 
documents.
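
As a rough illustration of approach 2, the two-step lookup could be sketched in Python as below. The field names (`author_id`, `doc_type`, `label`) and the `author_meta` doc type are assumptions made up for this example; `q`, `fl`, `rows`, `facet` and `facet.field` are standard Solr request parameters. The sketch only builds the request parameters and joins the results, it does not talk to a server, and it assumes ids are plain tokens that need no query escaping.

```python
from urllib.parse import urlencode


def facet_query_params(user_query, facet_field="author_id"):
    """Step 1: the main query, faceted on the id field (hypothetical name)."""
    return {"q": user_query, "facet": "true", "facet.field": facet_field}


def label_query_params(ids, doc_type="author_meta"):
    """Step 2: fetch the metadata docs for the ids that came back as facets."""
    id_clause = " OR ".join(sorted(ids))
    return {
        "q": f"doc_type:{doc_type} AND id:({id_clause})",
        "fl": "id,label",
        "rows": len(ids),
    }


def attach_labels(facet_counts, metadata_docs):
    """Join facet counts with labels; fall back to the raw id if no label exists."""
    labels = {doc["id"]: doc["label"] for doc in metadata_docs}
    return [(i, labels.get(i, i), count) for i, count in facet_counts.items()]


# Facet counts as they might come back from step 1 (made-up numbers).
facet_counts = {"01234": 12, "05678": 3}

step2 = label_query_params(facet_counts)
query_string = urlencode(step2)  # ready to append to a select URL

# Metadata docs as they might come back from step 2 (made-up data).
metadata_docs = [
    {"id": "01234", "label": "Chris Hostetter"},
    {"id": "05678", "label": "Another Author"},
]
rows = attach_labels(facet_counts, metadata_docs)
```

Because the labels live in their own documents, renaming an author means reindexing one metadata document rather than every main document that references it.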



-Hoss


Re: Dealing with field values as key/value pairs

Posted by Stephane Bailliez <sb...@gmail.com>.
Yeah, sorry, I was not clear in my question. Storage would end up being done 
the same way, of course.

I guess I'm more looking for feedback about what people have used as a 
strategy to handle this type of situation. This goes for faceting as well.

Assume I facet by author and there are 2 authors with the same 
name. Faceting on the name does not work correctly, because both 
authors collapse into a single facet value.

So, stating the obvious here: the facet value is best expressed with 
an identifier that uniquely identifies your author. But then you lose the 
'name' and you need a way to get it back.
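
A small Python illustration of the collision, using made-up documents and hypothetical field names (`author_id`, `author_name`); the counters stand in for what a facet on each field would return.

```python
from collections import Counter

# Three documents written by two different authors who share a name.
docs = [
    {"author_id": "01234", "author_name": "John Smith"},
    {"author_id": "05678", "author_name": "John Smith"},
    {"author_id": "01234", "author_name": "John Smith"},
]

# Faceting on the name merges both authors into one bucket.
by_name = Counter(d["author_name"] for d in docs)

# Faceting on the id keeps them apart, but the label is gone.
by_id = Counter(d["author_id"] for d in docs)
```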

But if you also want to return the author's name in your Solr response 
in a 'standalone' way (i.e. without relying on another source of data, 
such as the db where that mapping is stored), then you need to store 
this data in a convenient form in the index to be able to access it 
later.

So I'm basically looking for a design pattern/best practice for that 
scenario, based on people's experience.


I was also thinking about storing each value in a dynamic field such 
as 'metadata_<field>_<identifier>'. Then, assuming I have a facet field 
'facet_<field>' that stores the identifiers, a search component could 
provide the mapping as an 'extra' section of the response (similar to 
the debug, facets, etc. sections).

i.e. something like:

mapping: {
   '<field1>': { '<identifier1>': '<value1>', '<identifier2>': '<value2>' },
   '<field2>': { '<identifierx>': '<valuex>', '<identifiery>': '<valuey>' }
}

Does that make sense?
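
To make the idea concrete, here is a client-side Python sketch of how such a mapping could be assembled from one document's stored fields. It is not a Solr search component (that would be written in Java against Solr's own API); the function name `build_mapping` and the sample field names are hypothetical, and only the 'facet_<field>' / 'metadata_<field>_<identifier>' naming comes from the description above.

```python
def build_mapping(doc):
    """Collect {field: {identifier: value}} from one document's stored fields."""
    mapping = {}
    for name, identifiers in doc.items():
        if not name.startswith("facet_"):
            continue
        field = name[len("facet_"):]
        # Look up the dynamic field holding the label for each identifier.
        mapping[field] = {
            ident: doc[f"metadata_{field}_{ident}"]
            for ident in identifiers
            if f"metadata_{field}_{ident}" in doc
        }
    return mapping


# Made-up stored fields for a single document.
doc = {
    "id": "doc-1",
    "facet_author": ["01234", "05678"],
    "metadata_author_01234": "Chris Hostetter",
    "metadata_author_05678": "Another Author",
    "facet_genre": ["g7"],
    "metadata_genre_g7": "Science Fiction",
}

mapping = build_mapping(doc)
```

Driving the lookup from the 'facet_<field>' values avoids having to parse field names back apart, which would be ambiguous if a field name or identifier contained an underscore.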

-- stephane



Re: Dealing with field values as key/value pairs

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
In the end, Lucene stores everything as strings.

Even if you store your data with a map FieldType, Solr may not be able
to treat it like a map.
So it is fine to store the map as one single string.
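
One way to put the whole map into a single string, sketched in Python with JSON as the serialization. JSON is an assumption chosen for this example, not something Solr requires; the index would see only an opaque string, so the individual pairs could not be faceted on this way.

```python
import json

field1 = {"key1": "this is value1", "key2": "this is value2"}

# Serialize the whole map into one string before indexing it as a stored field.
stored_value = json.dumps(field1, sort_keys=True)

# The client turns the stored string back into a map after reading the response.
restored = json.loads(stored_value)
```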




-- 
--Noble Paul