Posted to solr-user@lucene.apache.org by lee carroll <le...@googlemail.com> on 2011/06/10 18:54:21 UTC

Document has fields with different update frequencies: how best to model

Hi,
We have a document type whose fields are mostly static; say they
change once every 6 months. But the same document has one field
that changes hourly.
What is the best approach to indexing this document?

E.g.
Hotel ID (static), Hotel Description (static and costly to get from a
URL etc.), FromPrice (changes hourly)

Option 1
Index hourly as a single document and don't worry about the unneeded
field updates.

Option 2
Split into 2 document types and index them independently. This would
require the front-end application to query multiple times?
doc1
ID,Description,DocType
doc2
ID,HotelID,Price,DocType

The application performs searches based on hotel attributes,
then for each hotel match issues a query to get the price.
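
A rough sketch of how the front end could handle Option 2 with two
requests in total rather than one per hotel. It only builds the request
URLs and merges parsed results; the base URL, the DocType values
"hotel" and "price", and the function names are assumptions made for
illustration, not anything taken from a real setup.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust to your own Solr host and core.
SOLR_SELECT = "http://localhost:8983/solr/select"


def hotel_query_url(user_query, rows=10):
    """First request: search hotel documents by their static attributes."""
    params = {
        "q": user_query,
        "fq": "DocType:hotel",  # assumed DocType value for doc1
        "fl": "ID,Description",
        "rows": rows,
    }
    return SOLR_SELECT + "?" + urlencode(params)


def price_query_url(hotel_ids):
    """Second request: fetch price documents for all matched hotels at once."""
    if not hotel_ids:
        raise ValueError("no hotel IDs to look up")
    id_clause = " OR ".join(str(h) for h in hotel_ids)
    params = {
        "q": "HotelID:(%s)" % id_clause,
        "fq": "DocType:price",  # assumed DocType value for doc2
        "fl": "HotelID,Price",
        "rows": len(hotel_ids),
    }
    return SOLR_SELECT + "?" + urlencode(params)


def merge_prices(hotels, prices):
    """Attach each price to its hotel by ID; hotels without a price get None."""
    by_hotel = {p["HotelID"]: p["Price"] for p in prices}
    return [dict(h, Price=by_hotel.get(h["ID"])) for h in hotels]
```

Batching the IDs into one OR clause keeps this at two round trips per
search, at the cost of doing the join in application code.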


Any other options? Can you query across documents?

We run 1.4.1. We could maybe upgrade to 3.2, but I don't think I could
swing a move to trunk for the JOIN feature (if that is indeed JOIN's use case).

Thanks in advance

PS Am I just worrying about de-normalised data? Should I sort the
source data out, maybe by caching, and get over it?

cheers Lee c

Re: Document has fields with different update frequencies: how best to model

Posted by lee carroll <le...@googlemail.com>.
Thanks Jay for the quick reply.

Maybe we can set up a dev env with trunk and use JOIN.

Is this a good use case for JOIN?


Re: Document has fields with different update frequencies: how best to model

Posted by Jay Luker <lb...@reallywow.com>.
You are correct that ExternalFileField values can only be used in
query functions (i.e. scoring, basically). Sorry for firing off that
answer without reading your use case more carefully.

I'd be inclined towards giving your Option #1 a try, but that's
without knowing much about the scale of your app, size of your index,
documents, etc. Unneeded field updates are only a problem if they're
causing performance problems, right? Otherwise, trying to avoid them
seems like premature optimization.
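
For what it's worth, Option #1 combined with the caching idea from the
original PS might look roughly like the sketch below: the costly
description is fetched once and cached, and every hourly run re-sends
the whole document with the fresh price. The cache, the fetch callback
and the function names are hypothetical; only the <add><doc><field>
shape of the XML update message comes from Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-process cache of the costly descriptions, keyed by hotel ID.
_description_cache = {}


def get_description(hotel_id, fetch):
    """Return the cached description, calling fetch(hotel_id) only on a miss."""
    if hotel_id not in _description_cache:
        _description_cache[hotel_id] = fetch(hotel_id)
    return _description_cache[hotel_id]


def build_add_xml(hotel_id, from_price, fetch):
    """Build a Solr XML <add> message that re-sends the whole document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = [
        ("ID", hotel_id),
        ("Description", get_description(hotel_id, fetch)),
        ("FromPrice", from_price),
    ]
    for name, value in fields:
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")
```

The resulting string would be POSTed to the update handler and followed
by a commit; the static fields are rewritten each hour, but the
expensive fetch happens only once per hotel.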

--jay


Re: Document has fields with different update frequencies: how best to model

Posted by lee carroll <le...@googlemail.com>.
Hi Jay
I thought an external file field could not be returned as a field but
could only be used in scoring.
Trunk has pseudo-fields, which can take a function value, but we can't
move to trunk.

Also, it's a more general question about schema design: what happens if
you have several fields with different update frequencies? It does not
seem that external file field fits this use case.




Re: Document has fields with different update frequencies: how best to model

Posted by Jay Luker <lb...@reallywow.com>.
Take a look at ExternalFileField [1]. It's meant for exactly what you
want to do here.

FYI, there is an issue with caching of the external values that was
introduced in v1.4 but, thankfully, resolved in v3.2 [2].

--jay

[1] http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
[2] https://issues.apache.org/jira/browse/SOLR-2536
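
A minimal sketch of producing the file such a field reads, assuming the
documented layout as I understand it: a file named external_<fieldname>
in the index data directory, one key=value line per document, with float
values. The function name, the directory argument and the sample keys
are made up for illustration.

```python
import os
import tempfile


def write_external_file(data_dir, field_name, prices):
    """Write key=value lines for an ExternalFileField, replacing the file atomically."""
    target = os.path.join(data_dir, "external_" + field_name)
    # Write to a temporary file in the same directory first, so a reader
    # never sees a half-written price file.
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, prefix=".tmp_external_")
    with os.fdopen(fd, "w") as out:
        for hotel_id, price in sorted(prices.items()):
            out.write("%s=%s\n" % (hotel_id, float(price)))
    os.replace(tmp_path, target)
    return target
```

An hourly job could call write_external_file(data_dir, "FromPrice",
{"h1": 99.5, "h2": 80}) and then trigger a commit so the new values are
picked up. Values supplied this way are only usable inside function
queries; they cannot be searched on or returned as stored fields.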

