Posted to solr-user@lucene.apache.org by Hayden Muhl <ha...@gmail.com> on 2012/11/17 01:32:11 UTC

Solr filter using data from the database

I am working on migrating our system from Lucene to Solr, and my boss and I
are at an impasse over an architectural issue. Here's the basic setup.

We index products from multiple retailers, and allow people to search
across all retailers for specific products. It is a regular occurrence for
us to disable or deactivate a retailer (on the order of once a day). What
that means is that when someone does a search, we will not show results for
products sold by deactivated retailers. The list of active/inactive
retailers is maintained in a table in our SQL database. We have to maintain
this functionality when we move to Solr, but we can't agree on how to
implement this in Solr.

Currently, we load a new Lucene product index once every two hours. Every
time we load a new index, we run a SQL query to find the current list of
active retailers, and construct a filter based on that list.

My boss wants to essentially do the same thing we do now. Implement a
custom filter that makes a call to the database to retrieve the list of
retailers, caches that list for some period of time, then refreshes itself
from time to time with another call to the database. I find it strange
having a dependency between Solr and the database like that, because it
would require a running database being present in order to even start Solr.
I am new to Solr, so I don't have any alternative solutions.

tl;dr, We need to construct a Solr filter based on data stored in a
database. What's the best way to get that data from the database into Solr
and keep it updated?

- Hayden

Re: Solr filter using data from the database

Posted by Walter Underwood <wu...@wunderwood.org>.
Create an HTTP call backed by the database to fetch the list of valid vendors. Mark that response cacheable until the next refresh. Use an HTTP cache in case the database is temporarily unavailable.
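A minimal sketch of the "cache it, and keep serving the old copy if the database is down" idea, assuming a hypothetical fetch callable that wraps the HTTP call. The class name, parameters, and TTL handling are illustrative and not part of Solr or any HTTP cache product.

```python
import time
from typing import Callable, FrozenSet, Optional


class ActiveRetailerCache:
    """Caches the vendor list and serves the last good copy if a refresh fails."""

    def __init__(self, fetch: Callable[[], FrozenSet[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical: wraps the HTTP call to the DB-backed service
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[FrozenSet[str]] = None
        self._fetched_at = 0.0

    def get(self) -> FrozenSet[str]:
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self._ttl:
            return self._value   # still fresh, no call needed
        try:
            self._value = frozenset(self._fetch())
            self._fetched_at = now
        except Exception:
            # Source unavailable: fall back to the stale copy if there is one.
            if self._value is None:
                raise
        return self._value
```

With this shape, only the very first fetch needs the database to be reachable; later outages just mean the list is older than the TTL until the next successful refresh.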

You don't really need a custom filter; you can list all the valid vendors in the filter query. The matches will be in the filter cache, so it will be fast after the first request, even though the query is long.
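As a rough illustration of listing the vendors in the filter query, the sketch below builds an fq value from a list of IDs. The field name retailer_id and the use of integer IDs are assumptions; the thread does not say how retailers are stored in the index.

```python
from typing import Iterable


def build_retailer_fq(active_ids: Iterable[int], field: str = "retailer_id") -> str:
    """Build a Solr filter query matching any of the given retailer IDs."""
    ids = sorted(set(active_ids))  # de-duplicate and keep a stable order
    if not ids:
        raise ValueError("no active retailers; refusing to build an empty filter")
    return f"{field}:(" + " OR ".join(str(i) for i in ids) + ")"


# Example: pass the result as the fq parameter of a search request.
print(build_retailer_fq([3, 1, 2, 3]))
```

Sorting means the same list always yields the same string, regardless of the order the IDs arrive in. Note that Solr's maxBooleanClauses setting (1024 by default) limits how many OR clauses such a query can hold.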

wunder

--
Walter Underwood
wunder@wunderwood.org




Re: Solr filter using data from the database

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Colleagues,
I recently gave a talk at ApacheCon about this problem. Both proposed
approaches definitely work. Frequent updates (http://goo.gl/xGPMU)
sometimes cost too much. Filters (http://goo.gl/mMvRQ) get slow starting
from thousands of keys and might have a low hit ratio. One promising
approach is FunctionRangeQuery + ExternalFileField; the current
implementation requires a commit to reload, which is a drawback. Here is a
possible solution for this:
https://issues.apache.org/jira/browse/SOLR-4085 .
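A hedged sketch of the FunctionRangeQuery + ExternalFileField idea: an external job writes one key=value line per retailer, and searches filter on the resulting value with an frange query. The field name retailer_active, and keying the external file on a retailer ID field, are assumptions made for illustration; the schema's field type definition (keyField, defVal) is not shown here.

```python
from typing import List, Mapping


def external_file_lines(active_by_retailer: Mapping[str, bool]) -> List[str]:
    """One key=value line per retailer: 1 for active, 0 for inactive."""
    return [f"{key}={1 if active else 0}"
            for key, active in sorted(active_by_retailer.items())]


def active_frange_fq(field: str = "retailer_active") -> str:
    """Filter query keeping only documents whose external value is exactly 1."""
    return f"{{!frange l=1 u=1}}{field}"


# The lines would be written to a file named external_<fieldname>
# (here: external_retailer_active) in the index data directory.
print("\n".join(external_file_lines({"r2": False, "r1": True})))
print(active_frange_fq())
```

The appeal is that a retailer's status lives outside the index, so toggling it does not require reindexing products; the catch, as noted above, is that the file is only reloaded on commit.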

Good Luck


On Sat, Nov 17, 2012 at 6:32 AM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> I'm actually not sure what Wunder is suggesting, but here is another way.
> Have an external app that talks to the DB either on demand or every N
> minutes/hours.  When it talks to the DB it gets all merchants whose
> visibility flag was changed one way or the other since the last time the
> app checked.  Then you can simply delete all products for those merchants
> whose products are supposed to be hidden, and reindex all those whose flag
> was switched to visible.  Depending on the numbers this may or may not work
> well, but it's super simple and Solr and the DB don't know about each
> other.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Solr filter using data from the database

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

I'm actually not sure what Wunder is suggesting, but here is another way.
Have an external app that talks to the DB either on demand or every N
minutes/hours.  When it talks to the DB it gets all merchants whose
visibility flag was changed one way or the other since the last time the
app checked.  Then you can simply delete all products for those merchants
whose products are supposed to be hidden, and reindex all those whose flag
was switched to visible.  Depending on the numbers this may or may not work
well, but it's super simple and Solr and the DB don't know about each other.
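A small sketch of what that external app might compute on each run, assuming it keeps the previous visibility snapshot around. The function names and the retailer_id field are hypothetical; the delete body follows the shape of Solr's JSON delete-by-query update command.

```python
import json
from typing import Dict, List, Tuple


def plan_sync(previous: Dict[str, bool],
              current: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Return (retailers to delete from Solr, retailers to reindex)."""
    to_delete: List[str] = []
    to_reindex: List[str] = []
    for retailer_id in sorted(set(previous) | set(current)):
        was_visible = previous.get(retailer_id, False)
        is_visible = current.get(retailer_id, False)
        if was_visible and not is_visible:
            to_delete.append(retailer_id)    # flag switched to hidden
        elif is_visible and not was_visible:
            to_reindex.append(retailer_id)   # flag switched to visible
    return to_delete, to_reindex


def delete_payload(retailer_id: str, field: str = "retailer_id") -> str:
    """JSON body for Solr's update handler that deletes by query."""
    # Assumes IDs contain no double quotes, since the value is sent as a phrase.
    return json.dumps({"delete": {"query": f'{field}:"{retailer_id}"'}})
```

Retailers whose flag did not change produce no work, which is what keeps this approach cheap when only about one retailer changes per day.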

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



