Posted to solr-user@lucene.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2015/05/24 04:56:06 UTC

SolrCloud 4.8 - Transaction log size over 1GB

Hi,

looking at the tlog sizes, I see there are many collections that are
keeping more than 1GB of space.
The tlogs are growing, and the code that adds new documents never does a
hard commit.

The question is: must I fix the code that updates the collections, or can
I do a hard commit externally, using the collections API or via the admin
console?

Thanks, any help is appreciated,
Vincenzo

Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Erick Erickson <er...@gmail.com>.
Right, autoCommit (in solrconfig.xml) will
1> close the current Lucene segment and open a new one
2> close the current tlog and start a new one.

Those actions are independent of whether openSearcher is true or false.
If (and only if) openSearcher=true, the commits will be immediately
visible to queries.

So then it's up to you to issue either a soft commit (or a hard commit
with openSearcher=true) at some point for the docs to become visible.

bq: Does it mean, so to speak, that when openSearcher=false we have an
implicit commit done by the SolrCloud <autoCommit> that is not visible to
the world, and an explicit commit done by clients that is visible to the
world?

Exactly. Now, this all assumes that you want all your recent indexing
to be visible at once. If you don't mind documents becoming visible
while you're still indexing, before the whole run is done, then:
1> set autoCommit with openSearcher=false to some fairly short
interval, say 1 minute.
2> set autoSoftCommit to some longer interval (say 5 minutes).
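
As a rough sketch, that might look something like this inside the
<updateHandler> section of solrconfig.xml (the intervals are just the
example values above, expressed in milliseconds):

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- hard commit: flush to stable storage and roll the tlog,
           but don't open a new searcher -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- soft commit: make recently indexed docs visible to searches -->
      <autoSoftCommit>
        <maxTime>300000</maxTime>
      </autoSoftCommit>
    </updateHandler>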

Now you don't have to do anything at all. Don't commit from the
client. Just wait 5 minutes after the indexing is done before
expecting to see _all_ the docs from your indexing run.

Do note one quirk though. Say you're doing autoCommits with
openSearcher=false: if you restart Solr, those changes _will_ become
visible, because a new searcher is opened on startup.

Best,
Erick


Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Vincenzo D'Amore <v....@gmail.com>.
Thanks Erick for your willingness and patience.

If I understood correctly, with autoCommit and openSearcher=true, at the
first commit (soft or hard) all new documents automatically become
available for search.
But when openSearcher=false, the commit will flush recent index changes to
stable storage, but does not cause a new searcher to be opened to make
those changes visible
<https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-autoCommit>.

So it is not clear to me: what is this stable storage, where is it, and
when will the new documents become visible?
Only when my code commits at the very end of the indexing process?

Does it mean, so to speak, that when openSearcher=false we have an
implicit commit done by the SolrCloud <autoCommit> that is not visible to
the world, and an explicit commit done by clients that is visible to the
world?





Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Erick Erickson <er...@gmail.com>.
The design is that the latest successfully flushed tlog file is kept
for "peer sync" in SolrCloud mode. When a replica comes up, there's a
chance that it's not very many docs behind. So, if possible, some of
the docs are taken from the leader's tlog and replayed to the follower
that's just been started. If the follower is too far out of sync, a
full old-style replication is done. So there will always be a tlog
file (and occasionally more than one if they're very small) kept
around, even on successful commit. It doesn't matter if you have
leaders and replicas or not, that's still the process that's followed.

Please re-read the link I sent earlier. There's absolutely no reason
your tlog files have to be so big! Really, set your autoCommit to, say,
15 seconds and 100000 docs, and set openSearcher=false in your
solrconfig.xml file; the tlog files that are kept around will be much
smaller and they'll still be available for "peer sync".
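
In solrconfig.xml terms, a minimal sketch of that suggestion (using the
numbers above; maxTime is in milliseconds) would be:

    <autoCommit>
      <maxTime>15000</maxTime>       <!-- hard commit at most every 15 seconds... -->
      <maxDocs>100000</maxDocs>      <!-- ...or every 100000 docs, whichever comes first -->
      <openSearcher>false</openSearcher>
    </autoCommit>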

And if you really don't care about tlogs at all, just take this bit
out of your solrconfig.xml:

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:256}</int>
    </updateLog>



Best,
Erick


Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Erick,

I have run my indexing code a few times, and this is the behaviour I have
observed:

When an indexing process starts, even if one or more tlog files exist, a
new tlog file is created and all the new documents are stored there.
When the indexing process ends and does a hard commit, the older tlog
files are removed but the newest one remains.

As far as I can see, since my indexing process loads a few million
documents each time, at the end of the process the latest tlog file
persists with all of those documents in it.
That is why I have such big tlog files. Now the question is: why does the
latest tlog file persist even though the code has done a hard commit?
When a hard commit completes successfully, why should the latest tlog
file be kept?




Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Erick Erickson <er...@gmail.com>.
OK, assuming you're not doing any commits at all until the very end,
then the tlog contains all the docs for the _entire_ run. The article
really doesn't care whether the commits come from solrconfig.xml, a
SolrJ client, or curl. The tlog simply is not truncated until a hard
commit happens, no matter where it comes from.

So here's what I'd do:
1> set autoCommit in your solrconfig.xml with openSearcher=false to fire
every minute. Then the problem will probably go away.
or
2> periodically issue a hard commit (openSearcher=false) from the client.

Of the two, I _strongly_ recommend <1> as it's more graceful when
there are multiple clients.

Best,
Erick


Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Erick, thanks for your support.

Reading the post, I realised that my scenario does not use the autoCommit
configuration; at the moment we don't have autoCommit in our
solrconfig.xml at all.

We need the docs to be searchable only after the indexing process has
finished, so all the documents are committed only at the end of the
indexing process.

Now I don't understand why the tlog files are so big, given that we do a
hard commit at the end of every indexing run.





Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Erick Erickson <er...@gmail.com>.
Vincenzo:

Here's perhaps more than you want to know about hard commits, soft
commits and transaction logs:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Vincenzo D'Amore <v....@gmail.com>.
Thanks Shawn for your prompt support.

Best regards,
Vincenzo


Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Shawn Heisey <ap...@elyograg.org>.
You can upload a new config to zookeeper with the zkcli program while
Solr is running, and nothing will change, at least not immediately.  The
new config will take effect when you reload the collection or restart
all the Solr instances.

Thanks,
Shawn


Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Vincenzo D'Amore <v....@gmail.com>.
Thanks Shawn,

maybe this is a silly question, but I looked around and didn't find an
answer...
Well, can I update solrconfig.xml for the collection while the instances
are running, or should I restart the cluster / reload the cores?


Re: SolrCloud 4.8 - Transaction log size over 1GB

Posted by Shawn Heisey <ap...@elyograg.org>.
I strongly recommend that you configure autoCommit in your
solrconfig.xml with openSearcher set to false.

I will usually recommend an interval of 300000 ms (five minutes), but others
recommend an interval of 15000 ms (15 seconds).  This kind of autoCommit
generally does happen very quickly, but my philosophy is to keep the
impact of any commit as low as possible ... which means doing them as
infrequently as possible.
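
For example, just as a sketch using the five-minute figure (maxTime is in
milliseconds):

    <autoCommit>
      <maxTime>300000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>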

The tradeoff on the autoCommit interval is the size of each individual
transaction log.  If you are indexing documents very quickly, you
probably want a shorter interval.

The example autoCommit config that you can find on the following wiki
page also has maxDocs ... it's up to you whether you include that part
of the config.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup_due_to_the_transaction_log

Thanks,
Shawn