Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2019/08/03 18:00:28 UTC

Re: NRT for new items in index

Hi,

First of all, could you read
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
to better understand how hard commits, soft commits and transaction logs
work together to achieve NRT search?

Kind Regards,
Furkan KAMACI

On Wed, Jul 31, 2019 at 3:47 PM profiuser <up...@profimedia.com> wrote:

> Hi,
>
> We have about 400 000 000 items in a Solr collection.
> The auto commit interval for this collection is set to 15 minutes.
> It is a big collection and we rely on caches, so we keep the
> autocommit interval large.
>
> The disadvantage is that we do not have NRT search.
>
> We would like to have NRT at least for searching newly added items.
>
> We read about the new "Category Routed Aliases" feature in Solr
> 8.1.
>
> Our idea is to add a routing field to the collection schema.
> At indexing time we check whether the item is new and set the routing
> field to "new"; if the item is older than some time period, we set it
> to "old".
> We would have one category routed alias, routedCollection, backed by
> two collections, old and new.
>
> When we index a new item, the router chooses the new collection and the
> item is inserted there. After some period we reindex the item, decide
> that it is old, and set the routing field to "old". The router then
> inserts the item into the old collection. We expected Solr to check
> uniqueness across all routed collections automatically and to delete
> the item from the other collection if it is found there. But it does not!
>
> Is this expected behaviour?
>
> Could this feature be used for our problem? Or could someone suggest
> another solution that makes all new items available for NRT
> search?
>
> Thanks for your help
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

Re: NRT for new items in index

Posted by Updates Profimedia <up...@profimedia.com>.

On 2019/08/06 06:43:20, Jörn Franke <jo...@gmail.com> wrote: 
> Do you have some more information on the index and its size?
>
> Do you have to store everything in the index? Can you store some data (blobs etc.) outside?
>
> I think you are generally right with your solution, but be aware that it is sometimes cheaper to add several servers than to keep an engineer busy for some months finding a solution. I am not saying this is the case for you, and I am not a fan of throwing hardware at a problem either, but an engineer (even if it affects him/herself) should always make that decision. It does not necessarily mean that the engineer loses a job: they can implement other valuable features for a customer.
> 

Our index is almost 500 GB. We store only the data we need in it; the rest is kept in other storage (database, filesystem).
We have a big daily feed of about 150 000 new items per day, plus a similar number of updates and deletes. The data is very live.

Back to the first question: does anyone have experience with the new "Category Routed Aliases" feature? The problem is that when we change the category of an existing item, it is reindexed into the other collection while the original item remains in the original collection.
Does this mean that the authors of this feature did not expect anyone to change the category of an existing item?
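
One possible workaround for the leftover item: since the unique key is only enforced inside a single collection, the client can delete the item from the collection it used to live in whenever the routing value changes. Below is a minimal sketch, assuming the standard JSON update format posted to /solr/<collection>/update. The base URL, the collection names items_new and items_old, the field name age_bucket and the helper move_requests are all hypothetical, and the sketch only builds the requests without sending them.

```python
import json

# Assumed base URL of the Solr node; adjust to the real deployment.
SOLR_BASE = "http://localhost:8983/solr"


def move_requests(doc, source_collection, target_collection, id_field="id"):
    """Return (url, json_body) pairs, in order, that move `doc`.

    The document is added to the target collection first and only then
    deleted from the source collection, so it is never missing from both
    (it may be visible twice for a short time, depending on commits).
    """
    add_request = (
        f"{SOLR_BASE}/{target_collection}/update",
        json.dumps({"add": {"doc": doc}}),
    )
    delete_request = (
        f"{SOLR_BASE}/{source_collection}/update",
        json.dumps({"delete": {"id": doc[id_field]}}),
    )
    return [add_request, delete_request]


if __name__ == "__main__":
    # Hypothetical item whose routing value has just changed to "old".
    item = {"id": "42", "age_bucket": "old", "title": "example"}
    for url, body in move_requests(item, "items_new", "items_old"):
        print("POST", url, body)
```

Both requests become visible to searches only after the next commit in the respective collection, so a short window with the item in both or in only one collection is still possible.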



Re: NRT for new items in index

Posted by Jörn Franke <jo...@gmail.com>.
Do you have some more information on the index and its size?

Do you have to store everything in the index? Can you store some data (blobs etc.) outside?

I think you are generally right with your solution, but be aware that it is sometimes cheaper to add several servers than to keep an engineer busy for some months finding a solution. I am not saying this is the case for you, and I am not a fan of throwing hardware at a problem either, but an engineer (even if it affects him/herself) should always make that decision. It does not necessarily mean that the engineer loses a job: they can implement other valuable features for a customer.

> On 06.08.2019 at 08:21, Updates Profimedia <up...@profimedia.com> wrote:
> 
> Hi,
> 
> We know this page, and we understand how commits and transaction logs work, but as I said, our index is very big ;-) Therefore we cannot commit too often.
> We must cache data for fast search, and if we commit too often, the caches are thrown away.
>
> Right now we have only one server, and we are preparing a new solution with SolrCloud, where we would have several servers. We have limited resources and cannot afford, for example, 20 Solr servers, which I believe is a standard solution for big indexes.
>
> Therefore we are looking for a compromise between price and performance, and we are thinking about having more collections. One collection would hold the daily feed (a small index), so we could commit to it every few seconds. These collections would then be combined under one main collection alias.
> 
> Do you have another idea?
> 
> Best
> 
> 
> 

Re: NRT for new items in index

Posted by Updates Profimedia <up...@profimedia.com>.

On 2019/08/03 18:00:28, Furkan KAMACI <fu...@gmail.com> wrote: 
> Hi,
> 
> First of all, could you read
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> to better understand how hard commits, soft commits and transaction logs
> work together to achieve NRT search?
> 
> Kind Regards,
> Furkan KAMACI
> 

Hi,

We know this page, and we understand how commits and transaction logs work, but as I said, our index is very big ;-) Therefore we cannot commit too often.
We must cache data for fast search, and if we commit too often, the caches are thrown away.
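
To make the trade-off concrete: a hard commit with openSearcher=false writes data to disk without opening a new searcher, so it leaves the caches alone, while a soft commit makes documents visible but does open a new searcher and therefore discards searcher-level caches. The sketch below builds the corresponding Config API commands (POSTed to /solr/<collection>/config), one property per command. The helper name commit_setting_commands and the 60-second hard commit interval are assumptions for illustration; nothing is sent to a server.

```python
import json


def commit_setting_commands(hard_commit_ms, soft_commit_ms):
    """Build Config API `set-property` bodies, one per property.

    hard_commit_ms: interval for durable commits that do NOT open a searcher.
    soft_commit_ms: interval after which new documents become searchable;
                    each soft commit opens a new searcher and drops its caches.
    """
    properties = [
        ("updateHandler.autoCommit.maxTime", hard_commit_ms),
        ("updateHandler.autoCommit.openSearcher", False),
        ("updateHandler.autoSoftCommit.maxTime", soft_commit_ms),
    ]
    return [json.dumps({"set-property": {name: value}}) for name, value in properties]


if __name__ == "__main__":
    # 15 minutes of search latency as in the thread; 60 s hard commit is assumed.
    for body in commit_setting_commands(60_000, 15 * 60 * 1000):
        print(body)
```

With these settings the visibility delay stays at 15 minutes; lowering the soft commit interval shortens it at the cost of more frequent cache invalidation, which is exactly the problem on a large index.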

Right now we have only one server, and we are preparing a new solution with SolrCloud, where we would have several servers. We have limited resources and cannot afford, for example, 20 Solr servers, which I believe is a standard solution for big indexes.

Therefore we are looking for a compromise between price and performance, and we are thinking about having more collections. One collection would hold the daily feed (a small index), so we could commit to it every few seconds. These collections would then be combined under one main collection alias.
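
The alias part of this idea could look roughly as follows. This is only a sketch, assuming the Collections API CREATEALIAS action with a plain (non-routed) alias. The collection names items_daily and items_main, the alias name items_all, the host and the helper create_alias_url are hypothetical, and the code only builds the URL.

```python
from urllib.parse import urlencode


def create_alias_url(alias, collections, solr_base="http://localhost:8983/solr"):
    """Build the Collections API URL that points one alias at several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # Solr expects a comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Searches go to items_all; indexing goes to the concrete collections.
    print(create_alias_url("items_all", ["items_daily", "items_main"]))
```

Queries against the alias search both collections, so the small daily collection can use a soft commit interval of a few seconds while the main collection keeps its long interval. Items that are later copied into the main collection still have to be deleted from the daily one by the client, otherwise they show up twice.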

Do you have another idea?

Best