Posted to solr-user@lucene.apache.org by Vikram Parmar <pa...@gmail.com> on 2015/12/11 11:48:02 UTC

NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

We are creating a web application which would contain posts (something like
FB or, say, YouTube). For the stable part of the data (i.e. the facets, search
results and their content), we plan to use SOLR.

What should we use for the unstable part of the data (i.e. dynamic and
volatile content such as like counts, comment counts and view counts)?


Option 1) Redis

What about storing the "dynamic" data in a different data store (like
Redis)? That way, every time the counts get refreshed, I do not have to
reindex the data into SOLR at all. SOLR indexing is only triggered when new
posts are added to the site, and never by user activity on the posts.
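
One way to picture Option 1 is a merge at the application layer: Solr returns the stable fields, and the counters are looked up by document id just before the page is rendered. The sketch below is only an illustration under stated assumptions. The key scheme (`likes:<id>`, `views:<id>`), the `attach_counts` helper and the `InMemoryStore` stand-in are hypothetical; the only thing borrowed from a real client is the shape of an `mget(keys)` call that returns one value (or `None`) per key, which is how redis-py's `mget` behaves.

```python
class InMemoryStore:
    """Hypothetical stand-in for a Redis client; only mget() is modelled."""

    def __init__(self, data):
        self._data = dict(data)

    def mget(self, keys):
        # One value per key, None for missing keys.
        return [self._data.get(key) for key in keys]


def attach_counts(docs, store, counters=("likes", "views")):
    """Return copies of the Solr docs with '<counter>_count' fields added."""
    merged = [dict(doc) for doc in docs]  # leave the Solr docs untouched
    for name in counters:
        # Assumed key scheme: "<counter>:<doc id>", e.g. "likes:p1".
        keys = [f"{name}:{doc['id']}" for doc in merged]
        # MGET needs at least one key, so skip the call for an empty page.
        values = store.mget(keys) if keys else []
        for doc, raw in zip(merged, values):
            doc[f"{name}_count"] = int(raw) if raw is not None else 0
    return merged


if __name__ == "__main__":
    store = InMemoryStore({"likes:p1": b"42", "views:p1": b"1000"})
    page = [{"id": "p1", "title": "First post"}, {"id": "p2", "title": "Second"}]
    for doc in attach_counts(page, store):
        print(doc)
```

With this shape there is one batched lookup per counter per result page, and Solr is never touched when a count changes. The trade-off is that the counts cannot be used for sorting or filtering inside Solr.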

Side-note :-
I also looked at the SOLR-Redis plugin at
https://github.com/sematext/solr-redis

The plugin looks good, but I am not sure whether it can be used to fetch the
data stored in Redis as part of the Solr result set, i.e. in docs. The
description reads more as if the Redis data can be used in function
queries for boosting, sorting, etc. Does anyone have experience with this?


Option 2) SOLR NRT with Soft Commits

We would depend on the built-in NRT features. Let's say we do soft commits
every second and hard commits every 10 seconds. Suppose a huge amount of
dynamic data is created on the site across thousands of posts, e.g. 100000
likes across 10000 posts. This would mean soft-committing up to 10000
rows every second, and then hard-committing that many rows every 10
seconds. Isn't this overkill?
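
To make the arithmetic above concrete: if likes are buffered and coalesced per post before they are sent, 100000 likes across 10000 posts become 10000 update documents per flush rather than 100000. The sketch below assumes Solr's atomic-update `inc` modifier and a hypothetical numeric field called `like_count`; the `LikeBuffer` class is illustrative only. Note that a regular atomic update still re-indexes the whole document internally, so coalescing reduces the number of updates but not the cost of each one.

```python
from collections import Counter


class LikeBuffer:
    """Coalesces like events per post between flushes (illustrative only)."""

    def __init__(self, field="like_count"):  # hypothetical field name
        self.field = field
        self._pending = Counter()

    def add(self, post_id, n=1):
        self._pending[post_id] += n

    def flush(self):
        """Return one atomic-update document per touched post and reset."""
        batch = [
            {"id": post_id, self.field: {"inc": n}}
            for post_id, n in self._pending.items()
        ]
        self._pending.clear()
        return batch


if __name__ == "__main__":
    buf = LikeBuffer()
    for i in range(100000):  # 100000 likes ...
        buf.add(f"post-{i % 10000}")  # ... spread over 10000 posts
    batch = buf.flush()
    print(len(batch), "update documents, e.g.", batch[0])
```

The returned list is the JSON body one would POST to the collection's update handler. How often to flush, and whether the resulting commit rate is acceptable, would still have to be measured on real hardware.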


Which option is preferred? How would you compare the two options in terms of
scalability, maintenance, feasibility, best practices, etc.? Any real-life
experiences or links to articles?

Many thanks!


p.s. EFF (external file fields) is not an option, as I read that the data
in that file can only be used in function queries and cannot be returned as
part of a document.

Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

Posted by Jack Krupansky <ja...@gmail.com>.
You can consider DataStax Enterprise (DSE) which deeply integrates
Solr (not just a plugin) with the Cassandra database (DSE Search):
http://www.datastax.com/products/datastax-enterprise-search

Solr's Join queries are supported across tables in DSE Search, so you could
keep dynamic data in a separate table (use the same partition key to ensure
that the join is more efficient because the rows are on the same node).


-- Jack Krupansky

On Fri, Dec 11, 2015 at 6:21 AM, Andrea Gazzarini <a....@gmail.com>
wrote:

> Hi Vikram,
> sounds like you're using those "dynamic" fields only for visualization
> (i.e. you don't need to have them "indexed")...this is the big point that
> could make the difference.
>
> If the answer is yes, about the first option (NOTE: I don't know Redis and
> that plugin), a custom SearchComponent would be very easy to implement. It
> would contribute to search results in a dedicated section of the response
> (see for example the highlight or the facet component)
>
> I don't have a concrete experience about the second option, but still
> assuming that
>
> - you need those fields stored, not indexed
> - the response page size is not huge (this is considered a bad practice in
> Solr)
>
> I would avoid to bomb Solr with repeated updates
>
> Best,
> Andrea

Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

Posted by Andrea Gazzarini <a....@gmail.com>.
Hi Vikram,
it sounds like you're using those "dynamic" fields only for display
(i.e. you don't need to have them "indexed"). This is the key point that
could make the difference.

If the answer is yes, then regarding the first option (NOTE: I don't know
Redis or that plugin), a custom SearchComponent would be very easy to
implement. It would contribute to the search results in a dedicated section
of the response (see, for example, the highlight or the facet component).
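
As an illustration of what a "dedicated section of the response" could look like, the sketch below mirrors the way the highlight component returns a map keyed by document id next to the main result list. A real component would be written in Java against Solr's SearchComponent API; the Python here only models the response shape, and the section name `counts` and the helper name are hypothetical.

```python
def add_counts_section(solr_response, counts_by_id, section="counts"):
    """Return a copy of the response with a per-document side section added."""
    ids = [doc["id"] for doc in solr_response["response"]["docs"]]
    out = dict(solr_response)  # shallow copy; the docs themselves are unchanged
    # Only documents on the current page get an entry, keyed by their id.
    out[section] = {doc_id: counts_by_id.get(doc_id, {}) for doc_id in ids}
    return out


if __name__ == "__main__":
    response = {"response": {"numFound": 2, "docs": [{"id": "p1"}, {"id": "p2"}]}}
    counts = {"p1": {"likes": 42, "views": 1000}}
    print(add_counts_section(response, counts))
```

Keeping the counts outside the documents makes it obvious to clients that they come from another store and may be slightly out of step with the indexed fields.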

I don't have concrete experience with the second option, but still
assuming that

- you need those fields stored, not indexed
- the response page size is not huge (a huge page size is considered bad
practice in Solr)

I would avoid bombarding Solr with repeated updates.

Best,
Andrea




Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

Posted by Charlie Hull <ch...@flax.co.uk>.
On 15/12/2015 14:13, Vikram Parmar wrote:
> Hi Mikhail,

Hi,

In case you're interested, several years ago we prototyped a Lucene 
codec using Redis for just this sort of application:
http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

It's a slightly crazy idea but appeared to work :)

Charlie
>
> Thanks for chiming in. Looking forward to your post regarding updatable
> numeric DocValues.
>
> What would be the 2nd most promising approach for now, would you say EFF
> should be ok to go with?
>
> Updating and reloading the EFF external file (containing a millions lines)
> at very short intervals is fine? Say every 10 seconds?
>
> Thanks!
>
> On Tue, Dec 15, 2015 at 5:46 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> I believe https://issues.apache.org/jira/browse/SOLR-5944 is the most
>> promising approach for such scenarios.
>> Despite it's not delivered in distro.
>> We are going to publish a post about it at blog.griddynamics.com.
>>
>> FWIW, I suppose EFF can be returned in result list.


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

Posted by Vikram Parmar <pa...@gmail.com>.
Hi Mikhail,

Thanks for chiming in. Looking forward to your post regarding updatable
numeric DocValues.

What would be the second most promising approach for now? Would you say EFF
is OK to go with?

Is updating and reloading the EFF external file (containing millions of
lines) at very short intervals fine? Say, every 10 seconds?
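
For reference, the external file behind an ExternalFileField is a plain text file of `key=value` lines named `external_<fieldname>`, placed in the core's index data directory. The sketch below shows one way to regenerate it without ever exposing a half-written file: write to a temporary file in the same directory, then rename it over the old one. The function name, the `like_count` field name and the directory are assumptions. Whether a file with millions of lines can be reloaded every 10 seconds is not something this sketch answers; that would need measuring.

```python
import os
import tempfile


def write_external_file(data_dir, field_name, values):
    """Write external_<field_name> in data_dir and swap it in atomically."""
    target = os.path.join(data_dir, f"external_{field_name}")
    # The temp file lives in the same directory so the rename stays on one
    # filesystem; its name does not start with "external_", so it should not
    # be mistaken for the real file.
    fd, tmp_path = tempfile.mkstemp(
        dir=data_dir, prefix=f".{field_name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            # Sorted keys are not required, but lookups are said to be faster.
            for key in sorted(values):
                handle.write(f"{key}={values[key]}\n")
        # mkstemp creates the file owner-only; relax it in case Solr runs
        # as a different user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()  # stands in for the core's data directory
    path = write_external_file(demo_dir, "like_count", {"post-2": 7, "post-1": 42})
    with open(path, encoding="utf-8") as handle:
        print(handle.read())
```

Solr only picks up the new values when the file is reloaded, which, as far as I know, is usually arranged by registering the ExternalFileFieldReloader listener in solrconfig.xml so that it runs when a new searcher is opened.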

Thanks!

On Tue, Dec 15, 2015 at 5:46 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> I believe https://issues.apache.org/jira/browse/SOLR-5944 is the most
> promising approach for such scenarios.
> Despite it's not delivered in distro.
> We are going to publish a post about it at blog.griddynamics.com.
>
> FWIW, I suppose EFF can be returned in result list.

Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
I believe https://issues.apache.org/jira/browse/SOLR-5944 is the most
promising approach for such scenarios,
although it has not been delivered in the distribution yet.
We are going to publish a post about it at blog.griddynamics.com.

FWIW, I suppose EFF values can be returned in the result list.






-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>