Posted to solr-user@lucene.apache.org by Mark Schoy <he...@gmx.de> on 2011/12/02 12:21:43 UTC

Best practise to automatically change a field value for a specific period of time

Hi,

I have a Solr index for an online shop with a field "price" which
contains the standard price of a product.
But in the database, the shop owner can specify a period of time with
an alternative price.

For example: standard price is $20.00, but 12/24/11 08:00am to
12/26/11 11:59pm = $12.59

Of course I could use a cronjob to update the documents. But I
think this is too unstable.
I could also save all price campaigns in a field and then extract
the correct price. But then I could not sort by price, or only by the
standard price.

What I need is a field where I can put a condition like this: if
[current_time between one of the price campaigns] then [return price of
price campaign]. But (unfortunately) this is not possible.

Thanks for advice.

Re: Best practise to automatically change a field value for a specific period of time

Posted by da...@ontrenet.com.
Solr doesn't support this kind of business rule internally, and it isn't
intended to.

So you will have to manage them externally. What's unstable about a
cronjob?

You will have to run your business rules externally, then apply the
necessary field updates to the documents in Solr, ensuring the doc ids
remain the same so the updates overwrite the previous documents.
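
A minimal sketch of what "run your business rules externally" could look
like, in Python. The function names, the document id "sku-1" and the
first-match-wins rule for overlapping campaigns are assumptions made for
illustration; the thread does not specify them. The resulting dict would
then be posted to Solr with whatever client you use, keeping the same id
so it overwrites the old document.

```python
from datetime import datetime


def effective_price(standard_price, campaigns, now):
    """Return the campaign price active at `now`, else the standard price.

    `campaigns` is a list of (start, end, price) tuples. If several
    campaigns overlap, the first match wins (an assumption).
    """
    for start, end, price in campaigns:
        if start <= now <= end:
            return price
    return standard_price


def build_update_doc(doc_id, standard_price, campaigns, now):
    # Reusing the same id is what makes Solr replace the previous document.
    return {"id": doc_id, "price": effective_price(standard_price, campaigns, now)}


# The example from the original question: $20.00 standard price,
# $12.59 from 12/24/11 08:00am to 12/26/11 11:59pm.
campaigns = [(datetime(2011, 12, 24, 8, 0), datetime(2011, 12, 26, 23, 59), 12.59)]
doc_during = build_update_doc("sku-1", 20.00, campaigns, datetime(2011, 12, 25, 12, 0))
doc_after = build_update_doc("sku-1", 20.00, campaigns, datetime(2011, 12, 27, 0, 0))
```

Here doc_during carries the campaign price and doc_after falls back to
the standard price.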

On Fri, 2 Dec 2011 12:21:43 +0100, Mark Schoy <he...@gmx.de> wrote:
> Hi,
> 
> I have an solr index for an online shop with a field "price" which
> contains the standard price of a product.
> But in the database, the shop owner can specify a period of time with
> an alternative price.
> 
> For example: standard price is $20.00, but 12/24/11 08:00am to
> 12/26/11 11:59pm = $12.59
> 
> Of course I could use an cronjob to updating the documents. But I
> think this is too unstable.
> I also could save all price campaigns in a field an then extracting
> the correct price. But then I could not sort by price or only by the
> standard price.
> 
> What I need is an field where I can put a condition like that: if
> [current_time between one of the price campains] then [return price of
> price campaign]. But (unfortunately) this is not possible.
> 
> Thanks for advice.

Re: Multivalued field

Posted by Erick Erickson <er...@gmail.com>.
<field name="id" type="string" stored="true" indexed="true" required="true" />
<field name="data" type="text_en" stored="true" indexed="false" />


Then sometime later
<uniqueKey>id</uniqueKey>

(all this in your schema.xml file).

That's it. The data field isn't analyzed at all, so the type is largely
irrelevant. What you put in it is all your double/timestamp pairs in some
kind of delimited format, e.g. 2345.0,<timestamp> | 873945.7,<timestamp>
Now you just get your data field back, split it up and go.
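
As a rough illustration of the "split it up and go" step, here is a
Python sketch that encodes and decodes the delimited format described
above. The helper names and the choice of integer epoch seconds for the
timestamp are assumptions; any unambiguous timestamp format would work.

```python
def encode_pairs(pairs):
    """Join (value, timestamp) pairs as 'value,ts | value,ts | ...'."""
    return " | ".join(f"{value},{ts}" for value, ts in pairs)


def decode_pairs(data):
    """Split the stored string back into (float, int) pairs."""
    pairs = []
    for chunk in data.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate an empty field or a trailing delimiter
        value, ts = chunk.split(",", 1)
        pairs.append((float(value), int(ts)))
    return pairs


# Timestamps here are made-up epoch seconds, for illustration only.
stored = encode_pairs([(2345.0, 1322820000), (873945.7, 1322823600)])
restored = decode_pairs(stored)
```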

Getting the report document will be about as fast as anything you could
do in Solr, lookup by what is essentially the primary key.

Updating your reports is just re-indexing (use the timestamp in your
DB) and it'll automatically replace documents with the same id.

You *might* be able to use the "binary" type, but that's base64 encoded
so whether it would be faster than parsing your pairs from text
is an open question.

But what's really unclear is how ginormous your double/timestamp pairs
are. If you're pulling a billion pairs out, Solr performance won't be
your problem <G>....

Best
Erick


On Mon, Dec 5, 2011 at 2:24 PM, Alan Miller <al...@gmail.com> wrote:
>
> I know I'm using SolR for a task that is better suited for the DB to handle but I'm
> doing this for reasons related to the overall design of my system. My DB is going to
> become very large over time and it is constantly being  updated via Hadoop jobs that
> collect,analyze some data and generate the final (report) results.
>
> The front end web-app needs to be VERY fast and only needs access to a subset of the data.
> It also let's us decouple the state of the DB and the front end, ie we can control when we sync
> the data from the DB to the SolR indexes.
> You could say I'm using SolR as an in memory cache of my DB indexes.
>
> We're also a small team and all our development is in java Hadoop, GWT so it was very
> easy for us to integrate SolR and Solrj into our app.
>
> If somebody could toss in an example of what the scheme might look like that'd be great.
> I have a very simple VALUE table that has columns:
>     value_pk INTEGER  ; primary-key
>     report_fk INT ; foreign-key to report table
>     tstamp TIMESTAMP
>     value NUMERIC(7,4)
>
> Alan
>
> On Dec 5, 2011, at 14:34, Erick Erickson <er...@gmail.com> wrote:
>
>> Well, Solr is a text search engine, and a good one. But this sure
>> feels like a problem that RDBMSs were built to handle. Why do
>> you want to do this? Is your current performance a problem?
>> Are you blowing your space resources out of the water? Do you
>> want to distribute your app to places not connected to your RDBMS?
>> Is there too much traffic on your RDBMS machine?
>>
>> Something about "if it ain't broke, don't fix it".
>>
>> In general, you have to tell us the problem you're trying to solve
>> so we don't go off into XY land.
>> http://people.apache.org/~hossman/#xyproblem
>>
>> Best
>> Erick
>>
>> On Fri, Dec 2, 2011 at 1:33 PM, Alan Miller <al...@gmail.com> wrote:
>>> Hi I have a webapp that plots a bunch of time series
>>> Data which are just doubles coupled with a timestamp
>>>
>>> Every chart in my webapp has a reportid in my db and i am wondering if it would be effective to usr solr to serve the data th my app instead of keeping the data in my rdbms.
>>>
>>> Currently im using hadoop to calc and generate the report data and the sticking it in my rdbms but i could use solrj client to upload the data to a solr index
>>>
>>> I know solr if for indexing text documents but would it be effective to use solr in this way?
>>>
>>> I want to query by reportid and get back a series of timestamp:double pairs.
>>>
>>> Regards
>>> Alan

Re: Multivalued field

Posted by Alan Miller <al...@gmail.com>.
I know I'm using Solr for a task that is better suited for the DB to handle, but I'm
doing this for reasons related to the overall design of my system. My DB is going to
become very large over time, and it is constantly being updated via Hadoop jobs that
collect and analyze some data and generate the final (report) results.

The front-end web app needs to be VERY fast and only needs access to a subset of the data.
It also lets us decouple the state of the DB and the front end, i.e. we can control when we sync
the data from the DB to the Solr indexes.
You could say I'm using Solr as an in-memory cache of my DB indexes.

We're also a small team and all our development is in Java, Hadoop and GWT, so it was very
easy for us to integrate Solr and SolrJ into our app.

If somebody could toss in an example of what the schema might look like, that'd be great.
I have a very simple VALUE table that has columns:
     value_pk INTEGER  ; primary-key
     report_fk INT ; foreign-key to report table
     tstamp TIMESTAMP
     value NUMERIC(7,4)

Alan

On Dec 5, 2011, at 14:34, Erick Erickson <er...@gmail.com> wrote:

> Well, Solr is a text search engine, and a good one. But this sure
> feels like a problem that RDBMSs were built to handle. Why do
> you want to do this? Is your current performance a problem?
> Are you blowing your space resources out of the water? Do you
> want to distribute your app to places not connected to your RDBMS?
> Is there too much traffic on your RDBMS machine?
> 
> Something about "if it ain't broke, don't fix it".
> 
> In general, you have to tell us the problem you're trying to solve
> so we don't go off into XY land.
> http://people.apache.org/~hossman/#xyproblem
> 
> Best
> Erick
> 
> On Fri, Dec 2, 2011 at 1:33 PM, Alan Miller <al...@gmail.com> wrote:
>> Hi I have a webapp that plots a bunch of time series
>> Data which are just doubles coupled with a timestamp
>> 
>> Every chart in my webapp has a reportid in my db and i am wondering if it would be effective to usr solr to serve the data th my app instead of keeping the data in my rdbms.
>> 
>> Currently im using hadoop to calc and generate the report data and the sticking it in my rdbms but i could use solrj client to upload the data to a solr index
>> 
>> I know solr if for indexing text documents but would it be effective to use solr in this way?
>> 
>> I want to query by reportid and get back a series of timestamp:double pairs.
>> 
>> Regards
>> Alan

Re: Multivalued field

Posted by Erick Erickson <er...@gmail.com>.
Well, Solr is a text search engine, and a good one. But this sure
feels like a problem that RDBMSs were built to handle. Why do
you want to do this? Is your current performance a problem?
Are you blowing your space resources out of the water? Do you
want to distribute your app to places not connected to your RDBMS?
Is there too much traffic on your RDBMS machine?

Something about "if it ain't broke, don't fix it".

In general, you have to tell us the problem you're trying to solve
so we don't go off into XY land.
http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Fri, Dec 2, 2011 at 1:33 PM, Alan Miller <al...@gmail.com> wrote:
> Hi I have a webapp that plots a bunch of time series
> Data which are just doubles coupled with a timestamp
>
> Every chart in my webapp has a reportid in my db and i am wondering if it would be effective to usr solr to serve the data th my app instead of keeping the data in my rdbms.
>
> Currently im using hadoop to calc and generate the report data and the sticking it in my rdbms but i could use solrj client to upload the data to a solr index
>
> I know solr if for indexing text documents but would it be effective to use solr in this way?
>
> I want to query by reportid and get back a series of timestamp:double pairs.
>
> Regards
> Alan

Multivalued field

Posted by Alan Miller <al...@gmail.com>.
Hi, I have a webapp that plots a bunch of time series
data, which are just doubles coupled with a timestamp.

Every chart in my webapp has a reportid in my DB, and I am wondering if it would be effective to use Solr to serve the data to my app instead of keeping the data in my RDBMS.

Currently I'm using Hadoop to calculate and generate the report data and then sticking it in my RDBMS, but I could use the SolrJ client to upload the data to a Solr index.

I know Solr is for indexing text documents, but would it be effective to use Solr in this way?

I want to query by reportid and get back a series of timestamp:double pairs.

Regards
Alan

Re: Best practise to automatically change a field value for a specific period of time

Posted by Michael Kuhlmann <ku...@solarier.de>.
Hi Mark,

I'm sure you can manage this using function queries somehow, but this is 
rather complicated, esp. if you both want to return the price and sort 
on it.

I'd rather update the index as soon as a campaign starts or ends. At 
least that's how we did it when I worked for online shops. Normally this 
isn't a matter of seconds, and you would need to update Solr anyway when 
you create such a campaign.

As a benefit, you're not limited in the number of running campaigns (at 
least not on the Solr side). Maybe you want to plan a campaign when the 
current one hasn't ended yet, which would be (nearly) impossible when 
you calculate the price at query time.
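
To make the "update as soon as a campaign starts or ends" idea concrete,
here is a small Python sketch that finds the next moment at which a
product's price changes, so a reindex can be scheduled for that time. The
function name and the (start, end, price) tuple layout are assumptions
for illustration, not something from our actual setup.

```python
from datetime import datetime


def next_price_change(campaigns, now):
    """Return the earliest campaign start or end strictly after `now`.

    `campaigns` is a list of (start, end, price) tuples. Returns None
    when no further change is scheduled.
    """
    boundaries = [t for start, end, _ in campaigns for t in (start, end) if t > now]
    return min(boundaries) if boundaries else None


campaigns = [(datetime(2011, 12, 24, 8, 0), datetime(2011, 12, 26, 23, 59), 12.59)]
before = next_price_change(campaigns, datetime(2011, 12, 1))   # campaign start
during = next_price_change(campaigns, datetime(2011, 12, 25))  # campaign end
after = next_price_change(campaigns, datetime(2011, 12, 27))   # nothing scheduled
```

Because every boundary is handled the same way, a second campaign can be
planned while the first is still running without any special casing.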

Greetings,
Kuli

Am 02.12.2011 12:21, schrieb Mark Schoy:
> Hi,
>
> I have an solr index for an online shop with a field "price" which
> contains the standard price of a product.
> But in the database, the shop owner can specify a period of time with
> an alternative price.
>
> For example: standard price is $20.00, but 12/24/11 08:00am to
> 12/26/11 11:59pm = $12.59
>
> Of course I could use an cronjob to updating the documents. But I
> think this is too unstable.
> I also could save all price campaigns in a field an then extracting
> the correct price. But then I could not sort by price or only by the
> standard price.
>
> What I need is an field where I can put a condition like that: if
> [current_time between one of the price campains] then [return price of
> price campaign]. But (unfortunately) this is not possible.
>
> Thanks for advice.


Re: Best practise to automatically change a field value for a specific period of time

Posted by Mathias Hodler <ma...@gmail.com>.
Hi Morten,

thanks, this is a very good solution.

I also found another solution:
Creating a custom ValueSourceParser for price sorting which considers
the standard price and the campaign price.

In my special case I think your approach won't work, because I also
need result grouping, and this can't be combined with field collapsing.

2011/12/2 Morten Lied Johansen <mo...@ifi.uio.no>:
>
> This is a problem that can be solved with grouping.
> http://wiki.apache.org/solr/FieldCollapsing
>
> For each possible price on a product, you index a document with the dates
> and the price. In your query, you group on the product, and apply a
> date-filter, and the price you see for each product will be from the top
> document within the given dates.
>
> You can also sort by price. If you have multiple overlapping campaigns, you
> might need to pay attention to which one you want to take precedence, as
> your sorting will determine which document gets shown.
>
> --
> Morten
> We all live in a yellow subroutine.

Re: Best practise to automatically change a field value for a specific period of time

Posted by Mark Schoy <he...@gmx.de>.
Hi Morten,

thanks, this is a very good solution.

I also found another solution:
Creating a custom ValueSourceParser for price sorting which considers
the standard price and the campaign price.

In my special case I think your approach won't work, because I also
need result grouping, and this can't be combined with field collapsing.

2011/12/2 Morten Lied Johansen <mo...@ifi.uio.no>:
> On 02. des. 2011 12:21, Mark Schoy wrote:
>
> This is a problem that can be solved with grouping.
> http://wiki.apache.org/solr/FieldCollapsing
>
> For each possible price on a product, you index a document with the dates
> and the price. In your query, you group on the product, and apply a
> date-filter, and the price you see for each product will be from the top
> document within the given dates.
>
> You can also sort by price. If you have multiple overlapping campaigns, you
> might need to pay attention to which one you want to take precedence, as
> your sorting will determine which document gets shown.
>
> --
> Morten
> We all live in a yellow subroutine.

Re: Best practise to automatically change a field value for a specific period of time

Posted by Morten Lied Johansen <mo...@ifi.uio.no>.
On 02. des. 2011 12:21, Mark Schoy wrote:
> Hi,
>
> I have an solr index for an online shop with a field "price" which
> contains the standard price of a product.
> But in the database, the shop owner can specify a period of time with
> an alternative price.
>
> For example: standard price is $20.00, but 12/24/11 08:00am to
> 12/26/11 11:59pm = $12.59

This is a problem that can be solved with grouping.
http://wiki.apache.org/solr/FieldCollapsing

For each possible price on a product, you index a document with the 
dates and the price. In your query, you group on the product, and apply 
a date-filter, and the price you see for each product will be from the 
top document within the given dates.

You can also sort by price. If you have multiple overlapping campaigns, 
you might need to pay attention to which one you want to take 
precedence, as your sorting will determine which document gets shown.
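
A hedged sketch of what such a query could look like, built as a URL
query string in Python. The field names product_id, valid_from, valid_to
and price are hypothetical; group, group.field, group.limit, group.sort,
fq and sort are standard Solr request parameters. It assumes the standard
price is indexed as one more document per product with an open-ended date
range, and that the lowest currently valid price should win, which is
only one possible precedence rule.

```python
from urllib.parse import urlencode


def campaign_price_query(now="NOW"):
    """Build query parameters that return one price document per product."""
    params = [
        ("q", "*:*"),
        # Keep only price documents whose validity window contains `now`.
        ("fq", f"valid_from:[* TO {now}]"),
        ("fq", f"valid_to:[{now} TO *]"),
        # Collapse the price documents of each product into one group.
        ("group", "true"),
        ("group.field", "product_id"),
        ("group.limit", "1"),
        # Within a group, the cheapest valid price becomes the top document.
        ("group.sort", "price asc"),
        # Order the groups themselves by price as well.
        ("sort", "price asc"),
    ]
    return urlencode(params)


query_string = campaign_price_query()
```

The string can be appended to the select handler URL of your core.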

-- 
Morten
We all live in a yellow subroutine.