
Posted to solr-user@lucene.apache.org by britske <gb...@gmail.com> on 2012/12/11 15:00:03 UTC

modeling prices based on daterange using multipoints

Hi all,

Based on some good discussion in the thread
"Modeling openinghours using multipoints"
<http://lucene.472066.n3.nabble.com/Modeling-openinghours-using-multipoints-tp4025336p4025683.html>
I was prompted to revisit an old pain point of mine: modeling
pricing & availability of hotels, which depend on a couple of factors,
including date of arrival, length of stay & roomtype.

This question is to see whether it would be possible to model the above using
multipoints (or some other technique I'm not aware of that has come into
existence in Lucene/Solr in the last 2 years or so).

Let me explain: hotels (in my implementation) have pricing & availability
based on date, duration, nr of persons and roomtype (e.g. single, double,
twin, triple, family). Instead of modeling these as separate documents,
I currently model 1 doc per hotel, where each
<date*duration*persons*roomtype> combo has its own price and is modeled as
a separate field (configured in the backend as dynamic fields: ddp-*).
Non-availability is just modeled as the absence of the particular field.
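To make the as-is model concrete, here is a minimal Java sketch of one hotel document that holds one dynamic field per combo. The layout of the field name after the ddp- prefix is a hypothetical choice (the post only says the fields match ddp-*), and the document is modeled as a plain map rather than a real Solr document. A missing field means the combo is unavailable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/** One hotel document in the as-is model: one dynamic field per combo. */
final class DynamicFieldHotelDoc {
    // Hypothetical naming layout; the post only says the fields match ddp-*.
    static String fieldName(String arrival, int nights, int persons, String roomType) {
        return "ddp-" + arrival + "-" + nights + "-" + persons + "-" + roomType;
    }

    // field name -> price in dollar cents
    private final Map<String, Integer> fields = new HashMap<>();

    void putPrice(String arrival, int nights, int persons, String roomType, int priceCents) {
        fields.put(fieldName(arrival, nights, persons, roomType), priceCents);
    }

    /** Empty means "not available": the field is simply absent from the document. */
    OptionalInt price(String arrival, int nights, int persons, String roomType) {
        Integer cents = fields.get(fieldName(arrival, nights, persons, roomType));
        return cents == null ? OptionalInt.empty() : OptionalInt.of(cents);
    }

    int fieldCount() {
        return fields.size();
    }
}
```

With around 20,000 combos, a single hotel document can end up carrying thousands of such fields, which is the scaling problem described next.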

The advantage of modeling 1 doc per hotel is clear: users have no chance of
seeing multiple offers per hotel in the frontend. It's just what they have
become accustomed to in these types of travel/hotel search engines.

Now there's also a big disadvantage of my current setup: Lucene/Solr just
isn't really built for having 20,000+ fields that can be sorted and
filtered on. (I could go into this, but it's not really the point of this
question.)

I realize the new spatial stuff in Solr 4 is no magic bullet, but I'm
wondering if I could model multiple prices per day as multipoints, where:

 - date*duration*nr of persons*roomtype is modeled as point.x (discretized
into some 20,000 values)
 - price is modeled as point.y (in dollar cents, normalized as avg price per
day: range [0,200000], covering a max price of $2,000/day)
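One way to discretize point.x is a mixed-radix encoding of the four factors. The sketch below assumes a hypothetical 100-day booking window, stays of 1 to 10 nights, 1 to 4 persons and the five roomtypes listed above, which happens to give exactly 100*10*4*5 = 20,000 values; the real ranges are not stated in the post. point.y follows the normalization described above: average price per night in cents, clamped to [0, 200000].

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;

final class ComboEncoding {
    // Hypothetical limits chosen so that 100 * 10 * 4 * 5 = 20,000 distinct x values.
    static final int WINDOW_DAYS = 100;
    static final int MAX_NIGHTS = 10;
    static final int MAX_PERSONS = 4;
    static final List<String> ROOM_TYPES = List.of("single", "double", "twin", "triple", "family");
    static final int MAX_PRICE_CENTS = 200_000; // $2,000/day

    /** Mixed-radix encoding of <arrival, nights, persons, roomType> into x in [0, 20000). */
    static int encodeX(LocalDate windowStart, LocalDate arrival, int nights, int persons, String roomType) {
        long day = ChronoUnit.DAYS.between(windowStart, arrival);
        int room = ROOM_TYPES.indexOf(roomType);
        if (day < 0 || day >= WINDOW_DAYS || nights < 1 || nights > MAX_NIGHTS
                || persons < 1 || persons > MAX_PERSONS || room < 0) {
            throw new IllegalArgumentException("combo outside the encoded ranges");
        }
        // Each factor is shifted to start at 0, then combined most-significant first.
        return (int) (((day * MAX_NIGHTS + (nights - 1)) * MAX_PERSONS + (persons - 1))
                * ROOM_TYPES.size() + room);
    }

    /** y = average price per night in cents, rounded and clamped to [0, 200000]. */
    static int encodeY(long totalPriceCents, int nights) {
        long perNight = Math.round((double) totalPriceCents / nights);
        return (int) Math.max(0, Math.min(MAX_PRICE_CENTS, perNight));
    }
}
```

Under these assumptions, with a window starting on 1 Dec 2012, the combo <21 dec 2012, 3 days, 2 persons, double> encodes to x = 4046; any other injective encoding into the same range would work just as well.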

The stuff that needs to be possible:
 A) 1 required filter on point.x (filtering on 1 particular <date*duration*nr
of persons*roomtype> combo)
 B) an optional range query on point.y (min and/or max price filter)
 C) optional sorting on point.y (sorting on price, normal or reverse)
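The intended semantics of A), B) and C) can be pinned down with a brute-force, in-memory reference in Java. This is not how Solr evaluates a spatial query; it only states what the result should be, assuming (as clarified later in the thread) that each hotel has at most one point per x. The Point and Hotel classes and the null-means-unbounded convention for the price range are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.OptionalInt;

final class MultiPointFilter {
    /** One indexed point: x = encoded combo, y = price per night in cents. */
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static final class Hotel {
        final String name;
        final List<Point> points;
        Hotel(String name, List<Point> points) { this.name = name; this.points = points; }
    }

    /** The y of the point with the requested x, if the hotel offers that combo. */
    static OptionalInt priceFor(Hotel h, int x) {
        for (Point p : h.points) {
            if (p.x == x) return OptionalInt.of(p.y);
        }
        return OptionalInt.empty();
    }

    /** A) required x, B) optional [minY, maxY] (null = unbounded), C) sort on that y. */
    static List<Hotel> search(List<Hotel> hotels, int x, Integer minY, Integer maxY, boolean ascending) {
        List<Hotel> hits = new ArrayList<>();
        for (Hotel h : hotels) {
            OptionalInt y = priceFor(h, x);
            if (!y.isPresent()) continue;                        // A: combo not offered
            if (minY != null && y.getAsInt() < minY) continue;   // B: below min price
            if (maxY != null && y.getAsInt() > maxY) continue;   // B: above max price
            hits.add(h);
        }
        Comparator<Hotel> byPrice = Comparator.comparingInt(h -> priceFor(h, x).getAsInt());
        hits.sort(ascending ? byPrice : byPrice.reversed());     // C: price asc or desc
        return hits;
    }
}
```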

I'm pretty certain A) and B) won't be a problem as far as functionality is
concerned, but how about performance? I.e., would some sort of cached Solr
filter kick in for a given <date*duration*nr of persons*roomtype> combo,
for quick doc intersection, just as it would with the multiple dynamic fields
in my described as-is case?

How about C)? Is sorting on point.y possible (potentially in conjunction
with other sort fields used as tiebreakers, to give a stable sort)? I
remember reading that any filter query can be used for sorting combined
with multipoints (which would make the above work, I guess), but I would just
like to confirm.

Looking forward to your feedback, 

Best, 
Geert-Jan








--
View this message in context: http://lucene.472066.n3.nabble.com/modeling-prices-based-on-daterange-using-multipoints-tp4026011.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: modeling prices based on daterange using multipoints

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

britske wrote
>> Ah; ok.  But still, my first suggestion is still what I think you could
>> do except that the algorithm is simpler -- return the first matching 'y'
>> in the document where the point matches the query.  Alternatively, if
>> you're confident the number of matching documents (hotels) is going to be
>> small-ish, say less than a couple hundred, then you could simply sort it
>> client-side.  You'd have to get back all the values, or maybe write a
>> DocTransformer to find the specific one.
>>
>> ~ David
>>
> Would writing something similar to ShapeFieldCacheDistanceValueSource,
> being a ValueSource, enable me to expose it by name to the frontend?
> What I'm saying is: let's say I want to call this implementation
> 'pricesort' and chain it with other sorts, like: 'sort=pricesort asc,
> popularity desc, name asc'. Or use it by name in a function query. That
> would be possible, right?
> 
> Geert-Jan

It wouldn't quite work this way.  The Solr adapters to Lucene spatial can't
simply have a field expose a ValueSource, because it needs to be configured
with the search parameters (e.g. the query center point).  See:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4#Sorting_and_Relevancy
and in particular the sort=query(...) part.  The wiki shows 2 ways: this
way, and the other way, where q= is the spatial query and you then simply
sort by score.

~ David



-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/modeling-prices-based-on-daterange-using-multipoints-tp4026011p4026456.html

Re: modeling prices based on daterange using multipoints

Posted by Geert-Jan Brits <gb...@gmail.com>.

2012/12/12 David Smiley (@MITRE.org) <DS...@mitre.org>

> britske wrote
> > Hi David,
> >
> > Yeah, interesting (as well as problematic as far as implementing goes)
> > use-case indeed :)
> >
> > 1. You mention "there are no special caches / memory requirements
> > inherent in this." For a given user query this would mean searching all
> > hotels for point.x each time, right? What would be a good plug-in point
> > to build in some custom cached filter code for this (perhaps using the
> > Solr filter cache)? As I see it, determining all hotels that have a
> > particular point.x value is probably: A) pretty costly to do on each user
> > query; B) static, and can be cached easily without a lot of memory
> > (relatively speaking), i.e. 20,000 filters (representing all of the
> > 20,000 different point.x, that is, <date,duration,nr persons,roomtype>
> > combos) with a bitset per filter representing ids of hotels that have
> > the said point.x.
>
> I think you're over-thinking the complexity of this query.  I bet it's
> faster than you think, and even then, putting this in a filter query 'fq'
> is going to be cached by Solr anyway, making it lightning fast at
> subsequent queries.
>
>
Ah! I didn't realize such a spatial query could be dropped into an fq. Nice,
that solves this part indeed.


>  britske wrote
> > 2. I'm not sure I explained C. (sorting) well, since I believe you're
> > talking about implementing custom code to sort multiple point.y's per
> > hotel, correct? That's not what I need. Instead, for every user query at
> > most 1 point ever matches, i.e. a hotel has a price for a particular
> > <date,duration,nrpersons,roomtype> combo (P.x) or it hasn't.
> >
> > Say a user queries for the <date,duration,nrpersons,roomtype> combo
> > <21 dec 2012,3 days,2 persons,double>. This might be encoded into a
> > value, say: 12345.
> > Now, for the hotels that do match that query (i.e. those hotels that have
> > a point P for which P.x=12345) I want to sort those hotels on P.y (the
> > price for the requested P.x).
>
> Ah; ok.  But still, my first suggestion is still what I think you could do
> except that the algorithm is simpler -- return the first matching 'y' in
> the document where the point matches the query.  Alternatively, if you're
> confident the number of matching documents (hotels) is going to be
> small-ish, say less than a couple hundred, then you could simply sort it
> client-side.  You'd have to get back all the values, or maybe write a
> DocTransformer to find the specific one.
>
> ~ David
>
>
Would writing something similar to ShapeFieldCacheDistanceValueSource, being
a ValueSource, enable me to expose it by name to the frontend?
What I'm saying is: let's say I want to call this implementation
'pricesort' and chain it with other sorts, like: 'sort=pricesort asc,
popularity desc, name asc'. Or use it by name in a function query. That
would be possible, right?

Geert-Jan



Re: modeling prices based on daterange using multipoints

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

britske wrote
> Hi David,
> 
> Yeah, interesting (as well as problematic as far as implementing goes)
> use-case indeed :)
> 
> 1. You mention "there are no special caches / memory requirements inherent
> in this." For a given user query this would mean searching all hotels for
> point.x each time, right? What would be a good plug-in point to build in
> some custom cached filter code for this (perhaps using the Solr filter
> cache)? As I see it, determining all hotels that have a particular point.x
> value is probably: A) pretty costly to do on each user query; B) static,
> and can be cached easily without a lot of memory (relatively speaking),
> i.e. 20,000 filters (representing all of the 20,000 different point.x,
> that is, <date,duration,nr persons,roomtype> combos) with a bitset per
> filter representing ids of hotels that have the said point.x.

I think you're over-thinking the complexity of this query.  I bet it's
faster than you think, and even then, putting this in a filter query 'fq' is
going to be cached by Solr anyway, making it lightning fast at subsequent
queries.
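To put numbers on the caching trade-off britske describes, here is a hypothetical per-combo filter cache built on java.util.BitSet: one bitset of hotel doc ids per distinct x. It is not Solr's filterCache implementation; it only illustrates the lookup and the memory arithmetic. The hotel count used in the note below is an assumption, not a figure from the thread.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a per-combo filter cache: one bitset of hotel doc ids per distinct x. */
final class ComboFilterCache {
    private final Map<Integer, BitSet> byX = new HashMap<>();

    /** xsByDoc.get(docId) holds the x values of hotel docId's points. */
    ComboFilterCache(List<int[]> xsByDoc) {
        for (int doc = 0; doc < xsByDoc.size(); doc++) {
            for (int x : xsByDoc.get(doc)) {
                byX.computeIfAbsent(x, k -> new BitSet()).set(doc);
            }
        }
    }

    /** Hotels that have a point with this x; a copy, so callers may and()/or() it freely. */
    BitSet hotelsWith(int x) {
        BitSet bits = byX.get(x);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    /** Upper bound for fully dense bitsets: one bit per hotel per combo, in bytes. */
    static long denseBytes(int distinctX, int hotels) {
        return (long) distinctX * ((hotels + 7) / 8);
    }
}
```

With 20,000 combos and, say, 10,000 hotels, dense bitsets would need at most about 25 MB. In Solr, each distinct fq gets its own filterCache entry, so how many of the combos stay cached depends on the filterCache size configured in solrconfig.xml.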


britske wrote
> 2. I'm not sure I explained C. (sorting) well, since I believe you're
> talking about implementing custom code to sort multiple point.y's per
> hotel, correct? That's not what I need. Instead, for every user query at
> most 1 point ever matches, i.e. a hotel has a price for a particular
> <date,duration,nrpersons,roomtype> combo (P.x) or it hasn't.
> 
> Say a user queries for the <date,duration,nrpersons,roomtype> combo
> <21 dec 2012,3 days,2 persons,double>. This might be encoded into a value,
> say: 12345.
> Now, for the hotels that do match that query (i.e. those hotels that have
> a point P for which P.x=12345) I want to sort those hotels on P.y (the
> price for the requested P.x).

Ah; ok.  But still, my first suggestion is still what I think you could do
except that the algorithm is simpler -- return the first matching 'y' in the
document where the point matches the query.  Alternatively, if you're
confident the number of matching documents (hotels) is going to be
small-ish, say less than a couple hundred, then you could simply sort it
client-side.  You'd have to get back all the values, or maybe write a
DocTransformer to find the specific one.
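A minimal sketch of the client-side alternative, assuming the response carries every (x, y) point of each hotel plus the fields used as tie-breakers. The Hit class is hypothetical, and the popularity/name tie-breakers are taken from the 'sort=pricesort asc, popularity desc, name asc' example elsewhere in this thread. It takes the first point whose x matches the query, as suggested above, and then sorts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ClientSidePriceSort {
    /** A hotel as returned to the client: all its (x, y) points plus tie-breaker fields. */
    static final class Hit {
        final String name;
        final int popularity;
        final int[][] points; // each entry is {x, y}
        Hit(String name, int popularity, int[][] points) {
            this.name = name; this.popularity = popularity; this.points = points;
        }

        /** First y whose point has the requested x, or -1 if the hotel lacks that combo. */
        int firstMatchingY(int x) {
            for (int[] p : points) {
                if (p[0] == x) return p[1];
            }
            return -1;
        }
    }

    /** Client-side equivalent of "price asc|desc, popularity desc, name asc". */
    static List<Hit> sort(List<Hit> hits, int x, boolean priceAscending) {
        List<Hit> matching = new ArrayList<>();
        for (Hit h : hits) {
            if (h.firstMatchingY(x) >= 0) matching.add(h); // drop hotels without the combo
        }
        Comparator<Hit> byPrice = Comparator.comparingInt(h -> h.firstMatchingY(x));
        if (!priceAscending) byPrice = byPrice.reversed(); // reverse the price key only
        matching.sort(byPrice
                .thenComparing(Comparator.comparingInt((Hit h) -> h.popularity).reversed())
                .thenComparing(h -> h.name));
        return matching;
    }
}
```

This is only practical for the small result sets mentioned above (a couple hundred hotels), since every matching document's points have to be shipped to the client.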

~ David



--
View this message in context: http://lucene.472066.n3.nabble.com/modeling-prices-based-on-daterange-using-multipoints-tp4026011p4026256.html

Re: modeling prices based on daterange using multipoints

Posted by britske <gb...@gmail.com>.

Hi David,

Yeah, interesting (as well as problematic as far as implementing goes)
use-case indeed :)

1. You mention "there are no special caches / memory requirements inherent
in this." For a given user query this would mean searching all hotels for
point.x each time, right? What would be a good plug-in point to build in
some custom cached filter code for this (perhaps using the Solr filter
cache)? As I see it, determining all hotels that have a particular point.x
value is probably: A) pretty costly to do on each user query; B) static,
and can be cached easily without a lot of memory (relatively speaking),
i.e. 20,000 filters (representing all of the 20,000 different point.x, that
is, <date,duration,nr persons,roomtype> combos) with a bitset per filter
representing ids of hotels that have the said point.x.

2. I'm not sure I explained C. (sorting) well, since I believe you're
talking about implementing custom code to sort multiple point.y's per
hotel, correct? That's not what I need. Instead, for every user query at
most 1 point ever matches, i.e. a hotel has a price for a particular
<date,duration,nrpersons,roomtype> combo (P.x) or it hasn't.

Say a user queries for the <date,duration,nrpersons,roomtype> combo
<21 dec 2012,3 days,2 persons,double>. This might be encoded into a value,
say: 12345.
Now, for the hotels that do match that query (i.e. those hotels that have a
point P for which P.x=12345) I want to sort those hotels on P.y (the price
for the requested P.x).

Geert-Jan




2012/12/11 David Smiley (@MITRE.org) [via Lucene] <
ml-node+s472066n4026151h71@n3.nabble.com>

> Hi Britske,
>   This is a very interesting question!
>
> britske wrote
> ...
> I realize the new spatial stuff in Solr 4 is no magic bullet, but I'm
> wondering if I could model multiple prices per day as multipoints, where:
>
>  - date*duration*nr of persons*roomtype is modeled as point.x (discretized
> into some 20,000 values)
>  - price is modeled as point.y (in dollar cents, normalized as avg price per
> day: range [0,200000], covering a max price of $2,000/day)
>
> The stuff that needs to be possible:
>  A) 1 required filter on point.x (filtering on 1 particular
> <date*duration*nr of persons*roomtype> combo)
>  B) an optional range query on point.y (min and/or max price filter)
>  C) optional sorting on point.y (sorting on price, normal or reverse)
>
> I'm pretty certain A) and B) won't be a problem as far as functionality is
> concerned, but how about performance? I.e., would some sort of cached Solr
> filter kick in for a given <date*duration*nr of persons*roomtype> combo,
> for quick doc intersection, just as it would with the multiple dynamic
> fields in my described as-is case?
>
> A & B are indeed not a problem and there are no special caches / memory
> requirements inherent in this.
>
> britske wrote
> How about C)? Is sorting on point.y possible (potentially in conjunction
> with other sort fields used as tiebreakers, to give a stable sort)? I
> remember reading that any filter query can be used for sorting combined
> with multipoints (which would make the above work, I guess), but I would
> just like to confirm.
> ...
>
> 'C' (sorting) is the challenge.  As it stands, you will have to implement
> a variation of this class:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/spatial/src/java/org/apache/lucene/spatial/util/ShapeFieldCacheDistanceValueSource.java?view=markup
> Unlike this implementation, your implementation should ensure the point is
> indeed in the query shape, and it should be configured to take the smallest
> or largest 'y' as desired.  Note that the cache infrastructure that this is
> built on is flaky right now -- a memory hog in multiple ways.  There will
> be a Point implementation in memory for all of your indexed points, and an
> ArrayList per doc.  And it's not NRT-search friendly, and doesn't
> relinquish its resources (i.e. on commit) as quickly as it should.  I know
> what its problems are, but I have been quite busy.
>
> ~ David




--
View this message in context: http://lucene.472066.n3.nabble.com/modeling-prices-based-on-daterange-using-multipoints-tp4026011p4026169.html

Re: modeling prices based on daterange using multipoints

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.

Hi Britske,
  This is a very interesting question!


britske wrote
> ...
> I realize the new spatial stuff in Solr 4 is no magic bullet, but I'm
> wondering if I could model multiple prices per day as multipoints,
> where:
> 
>  - date*duration*nr of persons*roomtype is modeled as point.x (discretized
> into some 20,000 values)
>  - price is modeled as point.y (in dollar cents, normalized as avg price per
> day: range [0,200000], covering a max price of $2,000/day)
> 
> The stuff that needs to be possible:
>  A) 1 required filter on point.x (filtering on 1 particular
> <date*duration*nr of persons*roomtype> combo)
>  B) an optional range query on point.y (min and/or max price filter)
>  C) optional sorting on point.y (sorting on price, normal or reverse)
> 
> I'm pretty certain A) and B) won't be a problem as far as functionality is
> concerned, but how about performance? I.e., would some sort of cached Solr
> filter kick in for a given <date*duration*nr of persons*roomtype>
> combo, for quick doc intersection, just as it would with the multiple
> dynamic fields in my described as-is case?

A & B are indeed not a problem and there are no special caches / memory
requirements inherent in this.


britske wrote
> How about C)? Is sorting on point.y possible (potentially in conjunction
> with other sort fields used as tiebreakers, to give a stable sort)? I
> remember reading that any filter query can be used for sorting
> combined with multipoints (which would make the above work, I guess), but
> I would just like to confirm.
> ...

'C' (sorting) is the challenge.  As it stands, you will have to implement a
variation of this class:
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/spatial/src/java/org/apache/lucene/spatial/util/ShapeFieldCacheDistanceValueSource.java?view=markup
Unlike this implementation, your implementation should ensure the point is
indeed in the query shape, and it should be configured to take the smallest
or largest 'y' as desired.  Note that the cache infrastructure that this is
built on is flaky right now -- a memory hog in multiple ways.  There will
be a Point implementation in memory for all of your indexed points, and an
ArrayList per doc.  And it's not NRT-search friendly, and doesn't relinquish
its resources (i.e. on commit) as quickly as it should.  I know what its
problems are, but I have been quite busy.
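The per-document computation such a variation would perform can be sketched in plain Java; Lucene's actual ValueSource plumbing and field cache are deliberately not modeled here, and the class and method names are hypothetical. The query shape is assumed to be an axis-aligned rectangle: a degenerate x range [x, x] for requirement A, optionally narrowed by a y range for B. Unlike a distance value source, only points inside the shape count, and the smallest or largest y among them is returned.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Per-document logic of a hypothetical "price" value source over a doc's multipoints. */
final class InShapeYValue {
    /** Axis-aligned query shape; for requirement A it is the x range [x, x]. */
    static final class Rect {
        final double minX, maxX, minY, maxY;
        Rect(double minX, double maxX, double minY, double maxY) {
            this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
        }
        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    }

    /** Smallest (or largest) y among the doc's points inside the shape; empty if none. */
    static OptionalDouble value(List<double[]> docPoints, Rect query, boolean takeSmallest) {
        OptionalDouble best = OptionalDouble.empty();
        for (double[] p : docPoints) {
            if (!query.contains(p[0], p[1])) continue; // ignore points outside the query shape
            if (!best.isPresent()
                    || (takeSmallest ? p[1] < best.getAsDouble() : p[1] > best.getAsDouble())) {
                best = OptionalDouble.of(p[1]);
            }
        }
        return best;
    }
}
```

In britske's model there is at most one point per x, so for an exact x the smallest and largest y coincide; the min/max switch only matters for wider shapes.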

~ David



--
View this message in context: http://lucene.472066.n3.nabble.com/modeling-prices-based-on-daterange-using-multipoints-tp4026011p4026151.html