Posted to solr-user@lucene.apache.org by Saurabh Sethi <sa...@sendgrid.com> on 2017/06/26 17:11:37 UTC

Dynamic fields vs parent child

We have two requirements:

1. Indexing and storing event id and its timestamp.
2. Indexing and storing custom field name and value. The fields can be of
any type, but for now let's say they are of types string, date and number.

The events and custom fields for any Solr document can easily number in
the hundreds.

We are looking at two different approaches to handle these scenarios:

1. *Dynamic fields* - Have the field names start with a type-specific
pattern: for strings the pattern could be str_*, and for events it could be
eventid_*
2. *Parent/child fields* - This seems like overkill for our use case
since it's meant more for hierarchical data. Also, the parent and all its
children need to be reindexed on update, which defeats the purpose: we would
be reindexing multiple docs instead of the single doc we would have with
dynamic fields. But it allows us to store the custom field name along with
its value, unlike dynamic fields, where we would have to map each
user-supplied custom field to some other name based on its type.
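
As a rough illustration of the mapping step that the dynamic-fields option
requires, here is a minimal Python sketch. The num_ and date_ prefixes, the
function names, and the reading of eventid_<id> as a field holding the event's
timestamp are all assumptions made for the example; the post itself only names
the str_* and eventid_* patterns. Field names are used as supplied, with no
sanitizing.

```python
from datetime import datetime, timezone

# Hypothetical type prefixes; the post only names str_* and eventid_*.
PREFIXES = {str: "str_", int: "num_", float: "num_", datetime: "date_"}


def solr_date(dt):
    """Format an aware datetime as UTC ISO-8601 with a trailing Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_dynamic_field(name, value):
    """Map a user-supplied (name, value) to a typed (field_name, field_value)."""
    prefix = PREFIXES.get(type(value))  # exact type lookup, so bool is rejected
    if prefix is None:
        raise TypeError(f"unsupported type for {name!r}: {type(value).__name__}")
    if isinstance(value, datetime):
        value = solr_date(value)
    return prefix + name, value


def build_doc(doc_id, custom_fields, events):
    """Build one flat document: typed custom fields plus one field per event."""
    doc = {"id": doc_id}
    for name, value in custom_fields.items():
        field, field_value = to_dynamic_field(name, value)
        doc[field] = field_value
    for event_id, timestamp in events.items():
        # Assumed layout: eventid_<id> stores that event's timestamp.
        doc["eventid_" + event_id] = solr_date(timestamp)
    return doc


doc = build_doc(
    "user-1",
    {"plan": "pro", "seats": 5},
    {"open42": datetime(2017, 6, 26, 17, 11, 37, tzinfo=timezone.utc)},
)
print(doc)
```

The point of the sketch is the trade-off described above: the user's own field
name survives only as a suffix, and the application has to remember which
prefix each name was filed under.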

Has anyone handled similar scenarios with Solr? If so, which approach would
you recommend based on your experience?

We are using Solr 6.6.

Thanks,
Saurabh

Re: Dynamic fields vs parent child

Posted by Rick Leir <rl...@leirtech.com>.
Saurabh

Maybe you need two fields. The first field is named "keyName" and the 
second is "keyValue". Give that a try, though searching with AND may be 
a challenge.
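
One way to see why AND can be a challenge here, assuming both fields are
multi-valued (the message does not say so): each clause is evaluated against
its own field, so nothing ties a matching name to its matching value. The
Python below only mimics that matching logic on a plain dict; it does not call
Solr, and the function name is made up for the example.

```python
def matches(doc, key, value):
    """Mimic `keyName:<key> AND keyValue:<value>` on two multi-valued fields."""
    # Each clause looks at its own field independently of the other.
    return key in doc["keyName"] and value in doc["keyValue"]


# One document carrying the pairs color=red and size=large.
doc = {"id": "1", "keyName": ["color", "size"], "keyValue": ["red", "large"]}

print(matches(doc, "color", "red"))  # True, a pair that was really indexed
print(matches(doc, "size", "red"))   # True, although size=red was never indexed
```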

Otherwise, use one field named "whatever" containing "key-value", 
assuming '-' never appears in keys or values. Search for an exact match.
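
A small sketch of that single-field idea, with the separator assumption checked
up front. The helper names are hypothetical, "whatever" is just the placeholder
field name from above, and the query string is only built here, not sent
anywhere.

```python
SEP = "-"


def kv_token(key, value):
    """Join key and value into one token, refusing input that contains SEP."""
    for part in (key, value):
        if SEP in part:
            raise ValueError(f"{part!r} contains the separator {SEP!r}")
    return f"{key}{SEP}{value}"


def exact_match_query(field, key, value):
    """Build a quoted term query for one key-value token."""
    # Escape backslashes and double quotes so the token stays one quoted term.
    token = kv_token(key, value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{token}"'


print(kv_token("color", "red"))                       # color-red
print(exact_match_query("whatever", "color", "red"))  # whatever:"color-red"
```

Because key and value travel together in one token, a query for size-red can
no longer match a document that only has color-red and size-large.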

cheers -- Rick


On 2017-06-27 03:56 PM, Susheel Kumar wrote:
> Do you have any close count of how many max dynamic fields you may have
> (1k, 2k or 3k etc.). In one of our index we have a total around 2K dynamic
> fields across all documents.
>
> My suggestion would be to try out dynamic fields for the use case you are
> describing and do some real performance test.
>
> Thanks,
> Susheel
>
> On Tue, Jun 27, 2017 at 3:01 PM, Saurabh Sethi <sa...@sendgrid.com>
> wrote:
>
>> We have key-value pairs that need to be searchable. We are looking for best
>> approach, both in terms of indexing (fast as well as space efficient) as
>> well as retrieval (fast search).
>>
>> Right now, the two approaches that we have are: Nested docs or dynamic
>> fields (myField_*_time:some date)
>>
>> The number of dynamic fields would definitely be > 1k.
>>
>> We wanted to get an idea which of these approaches work best or if there a
>> third approach which is better than nested and dynamic fields.
>>
>> On Tue, Jun 27, 2017 at 5:39 AM, Susheel Kumar <su...@gmail.com>
>> wrote:
>>
>>> Can you describe your use case in terms of what business functionality you
>>> are looking to achieve.
>>>
>>> Thanks,
>>> Susheel
>>>
>>> On Mon, Jun 26, 2017 at 4:26 PM, Saurabh Sethi <saurabh.sethi@sendgrid.com>
>>> wrote:
>>>
>>>> Number of dynamic fields will be in thousands (millions of users +
>>>> thousands of events shared between subsets of users).
>>>>
>>>> We also thought about indexing in one field with value being
>>>> fieldname_fieldvalue. Since we support range queries for dates and numbers,
>>>> it won't work out of box.
>>>>
>>>> On Mon, Jun 26, 2017 at 1:05 PM, Erick Erickson <erickerickson@gmail.com>
>>>> wrote:
>>>>
>>>>> How many distinct fields do you expect across _all_ documents? That
>>>>> is, if doc1 has 10 dynamic fields and doc2 has 10 dynamic fields, will
>>>>> there be exactly 10 fields total or more than 10 when you consider
>>>>> both documents?
>>>>>
>>>>> 100s of fields total across all documents is a tractable problem.
>>>>> thousands of dynamic fields total is going to be a problem.
>>>>>
>>>>> One technique that people do use is to index one field with a prefix
>>>>> rather than N dynamic fields. So you have something like
>>>>> dyn1_val1
>>>>> dyn1_val2
>>>>> dyn4_val67
>>>>>
>>>>> Only really works with string fields of course.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>> On Mon, Jun 26, 2017 at 10:11 AM, Saurabh Sethi
>>>>> <sa...@sendgrid.com> wrote:
>>>>>> We have two requirements:
>>>>>>
>>>>>> 1. Indexing and storing event id and its timestamp.
>>>>>> 2. Indexing and storing custom field name and value. The fields can be of
>>>>>> any type, but for now lets say they are of types string, date and number.
>>>>>>
>>>>>> The events and custom fields for any solr document can easily be in
>>>>>> hundreds.
>>>>>>
>>>>>> We are looking at two different approaches to handle these scenarios:
>>>>>>
>>>>>> 1. *Dynamic fields* - Have the fields name start with a particular pattern
>>>>>> like for string, the pattern could be like str_* and for event could be
>>>>>> eventid_*
>>>>>> 2. *Parent/child fields* - This seems to be an overkill for our use case
>>>>>> since it's more for hierarchical data. Also, the parent and all its
>>>>>> children need to be reindexed on update which defeats the purpose - we are
>>>>>> now reindexing multiple docs instead of one with dynamic fields. But it
>>>>>> allows us to store custom field name along with its value unlike dynamic
>>>>>> fields where we will have to map user supplied custom field to some other
>>>>>> name based on type.
>>>>>>
>>>>>> Has anyone handled similar scenarios with Solr? If so, which approach would
>>>>>> you recommend based on your experience?
>>>>>>
>>>>>> We are using solr 6.6
>>>>>>
>>>>>> Thanks,
>>>>>> Saurabh
>>>>
>>>>
>>>> --
>>>> Saurabh Sethi
>>>> Principal Engineer I | Engineering
>>>>
>>
>>
>> --
>> Saurabh Sethi
>> Principal Engineer I | Engineering
>>


Re: Dynamic fields vs parent child

Posted by Susheel Kumar <su...@gmail.com>.
Do you have a rough upper bound on how many dynamic fields you may have
(1k, 2k, 3k, etc.)? In one of our indexes we have around 2K dynamic fields
in total across all documents.

My suggestion would be to try out dynamic fields for the use case you are
describing and run some real performance tests.

Thanks,
Susheel


Re: Dynamic fields vs parent child

Posted by Saurabh Sethi <sa...@sendgrid.com>.
We have key-value pairs that need to be searchable. We are looking for the
best approach, both in terms of indexing (fast as well as space efficient)
and retrieval (fast search).

Right now, the two approaches that we have are: Nested docs or dynamic
fields (myField_*_time:some date)

The number of dynamic fields would definitely be > 1k.

We wanted to get an idea of which of these approaches works best, or whether
there is a third approach that is better than both nested docs and dynamic
fields.

-- 
Saurabh Sethi
Principal Engineer I | Engineering

Re: Dynamic fields vs parent child

Posted by Susheel Kumar <su...@gmail.com>.
Can you describe your use case in terms of what business functionality you
are looking to achieve?

Thanks,
Susheel


Re: Dynamic fields vs parent child

Posted by Saurabh Sethi <sa...@sendgrid.com>.
The number of dynamic fields will be in the thousands (millions of users +
thousands of events shared between subsets of users).

We also thought about indexing in one field with the value being
fieldname_fieldvalue. Since we support range queries for dates and numbers,
it won't work out of the box.
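
To make the range-query problem concrete: once a number is folded into a
string token, ranges compare characters rather than magnitudes. The sketch
below uses Python string comparison as a stand-in for term order on a string
field, and the price_ field name is made up for the example.

```python
values = [9, 10, 100]
tokens = [f"price_{v}" for v in values]

# String order puts 9 last, because "9" sorts after "1".
print(sorted(tokens))  # ['price_10', 'price_100', 'price_9']


def in_term_range(token, low, high):
    """Inclusive range check on string tokens, compared character by character."""
    return low <= token <= high


# Numerically 9 <= 10 <= 100, but as strings the range is empty.
print(in_term_range("price_10", "price_9", "price_100"))  # False
```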

-- 
Saurabh Sethi
Principal Engineer I | Engineering

Re: Dynamic fields vs parent child

Posted by Erick Erickson <er...@gmail.com>.
How many distinct fields do you expect across _all_ documents? That
is, if doc1 has 10 dynamic fields and doc2 has 10 dynamic fields, will
there be exactly 10 fields total or more than 10 when you consider
both documents?
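
The distinction being asked about can be sketched as a set union over field
names. This is only an illustration on plain dicts with made-up field names,
not a way of inspecting a real index.

```python
def distinct_field_count(docs, fixed_fields=("id",)):
    """Count distinct field names across all docs, ignoring fixed fields."""
    names = set()
    for doc in docs:
        names.update(name for name in doc if name not in fixed_fields)
    return len(names)


doc1 = {"id": "1", **{f"str_a{i}": "x" for i in range(10)}}
doc2_same = {"id": "2", **{f"str_a{i}": "y" for i in range(10)}}
doc2_other = {"id": "2", **{f"str_b{i}": "y" for i in range(10)}}

print(distinct_field_count([doc1, doc2_same]))   # 10, both docs share the names
print(distinct_field_count([doc1, doc2_other]))  # 20, every name is new
```

With user-defined names the second case is the likely one, which is how a few
hundred fields per document can grow into thousands overall.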

Hundreds of fields total across all documents is a tractable problem.
Thousands of dynamic fields total is going to be a problem.

One technique that people do use is to index one field with a prefix
rather than N dynamic fields. So you have something like
dyn1_val1
dyn1_val2
dyn4_val67

Only really works with string fields of course.

Best,
Erick
