Posted to solr-user@lucene.apache.org by "vidit.asthana" <vi...@gmail.com> on 2015/03/15 14:05:01 UTC

Solr tlog and soft commit

I want to know what gets written to the index from the tlog directory
whenever a soft commit is issued.

I have a test SolrCloud setup, and I can see that even if I disable hard
commits and issue only soft commits, the index directory still keeps
growing little by little, so I presume that something gets written to it.

When I issue a hard commit, the index directory size grows drastically, as
expected.

I have read this awesome post -
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
but it doesn't explain the behavior described above.

Thanks in advance.

Vidit



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-tlog-and-soft-commit-tp4193105.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr tlog and soft commit

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
When you issue an atomic update request, Solr needs to look up the stored
fields of the latest version of the document, process them according to
your atomic update, and then index that document again (replacing the old
one).

If you send an atomic update request before the old document was committed
to the index then it has to be looked up from the transaction logs.
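
To illustrate the lookup described above, here is a minimal toy sketch in
Python. It is not Solr's implementation: plain dicts stand in for the
committed index and the transaction log, and the names latest_version and
atomic_update are hypothetical. Only the "set" and "inc" modifiers are taken
from Solr's atomic update syntax.

```python
# Toy model only: plain dicts stand in for the committed index and the tlog.
committed_index = {"doc1": {"id": "doc1", "title": "old title", "views": 10}}
tlog = {}  # uncommitted documents, keyed by id (hypothetical structure)


def latest_version(doc_id):
    """Return the newest copy of a document, preferring uncommitted data."""
    if doc_id in tlog:
        return tlog[doc_id]
    return committed_index.get(doc_id)


def atomic_update(doc_id, ops):
    """Apply 'set' and 'inc' operations to the newest copy, then re-add the whole doc."""
    current = latest_version(doc_id)
    if current is None:
        raise KeyError(doc_id)
    updated = dict(current)
    for field, op in ops.items():
        if "set" in op:
            updated[field] = op["set"]
        elif "inc" in op:
            updated[field] = updated.get(field, 0) + op["inc"]
    tlog[doc_id] = updated  # the full document replaces the old one
    return updated


# The second update arrives before any commit, so it must build on the
# uncommitted result of the first one rather than on the committed copy.
first = atomic_update("doc1", {"views": {"inc": 1}})
second = atomic_update("doc1", {"title": {"set": "new title"}})
```

In this sketch the second update sees views = 11 only because the lookup
checks the uncommitted data before the committed copy.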

On Tue, Mar 17, 2015 at 12:47 PM, vidit.asthana <vi...@gmail.com>
wrote:

> Should I open a JIRA issue if there is no explanation for why transaction
> logs suddenly start piling up for some shard/replica?
>
> I have provided a very detailed explanation in a different thread:
>
> http://lucene.472066.n3.nabble.com/Transaction-logs-not-getting-deleted-td4184635.html
>
> Also, can someone explain how tlogs get involved when sending an
> atomic update request?



-- 
Regards,
Shalin Shekhar Mangar.

Re: Solr tlog and soft commit

Posted by "vidit.asthana" <vi...@gmail.com>.
Should I open a JIRA issue if there is no explanation for why transaction
logs suddenly start piling up for some shard/replica?

I have provided a very detailed explanation in a different thread:
http://lucene.472066.n3.nabble.com/Transaction-logs-not-getting-deleted-td4184635.html

Also, can someone explain how tlogs get involved when sending an
atomic update request?




Re: Solr tlog and soft commit

Posted by "vidit.asthana" <vi...@gmail.com>.
Can someone please reply to these questions? 

Thanks in advance.





Re: Solr tlog and soft commit

Posted by "vidit.asthana" <vi...@gmail.com>.
It's for both. I am facing a problem, and I want to get to the root of it
by understanding what happens when we issue an update.

The problem is that sometimes old transaction logs are not deleted on one
or two replicas of my SolrCloud setup, no matter how many times I do a hard
commit. They just keep piling up (I have seen up to 30GB). So I am issuing a
hard commit and then deleting them manually. I want to ensure that this
doesn't cause any data loss. My hard commit interval is set (based on the
indexing rate) so that the tlog should never grow beyond 500MB-600MB.
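
For reference, the sizing reasoning above can be written down as a rough
calculation. This is only a back-of-the-envelope sketch: the function name
and the sample numbers (500 docs/s, 2 KB per doc, a 10-minute hard commit
interval) are made-up assumptions, not measurements from my cluster, and it
ignores any older tlogs that Solr retains after a commit.

```python
def estimated_tlog_mb(docs_per_second, avg_doc_kb, hard_commit_interval_s):
    """Rough size of the tlog data accumulated between two hard commits, in MB."""
    return docs_per_second * avg_doc_kb * hard_commit_interval_s / 1024


# Hypothetical workload: 500 docs/s, 2 KB per doc, hard commit every 10 minutes.
size_mb = estimated_tlog_mb(docs_per_second=500, avg_doc_kb=2, hard_commit_interval_s=600)
print(f"expected tlog size between hard commits: {size_mb:.0f} MB")
```

With those sample numbers the estimate lands just under 600 MB, which is why
a 30GB tlog directory looks wrong to me.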

What might be the reason that very old transaction logs don't get deleted?
They only get rolled over on a hard commit. This happens very randomly,
but once it happens for a replica, it keeps happening for the same
replica again and again. The other replicas' transaction logs get deleted
fine on hard commit.

Another question: what role does the tlog play in atomic updates? Is it
scanned when I do an atomic update? If my tlog grows very large, will it
affect indexing performance with atomic updates?




Re: Solr tlog and soft commit

Posted by Erick Erickson <er...@gmail.com>.
I have to ask what problem you're trying to solve, or is this just for
background? Your understanding looks fine to me.

Erick

On Sun, Mar 15, 2015 at 9:44 AM, vidit.asthana <vi...@gmail.com> wrote:
> Thanks Erick. It's super helpful!
>
> So here's my understanding so far:
>
> 1. On update, the doc is written to the tlog (which will be used only for
> recovery).
> 2. As soon as the buffered docs exceed ramBufferSizeMB, they are flushed to
> the latest segment inside the index directory.
> 3. Up to this point, even though the index directory size will grow, the
> docs are neither searchable nor durable.
> 4. As soon as I issue a soft commit, the documents inside the latest segment
> become searchable, but they are still not merged with the main index. Hence
> they are still not durable.
> 5. At this point, if a node goes down, the docs will still need to be
> replayed from the tlog before they are searchable. If someone deletes the
> tlogs before starting that node, then the documents are lost. They can't be
> recovered from unclosed segments inside the index directory.
> 6. As soon as I issue a hard commit, the docs become durable.
>
> So that means if I increase ramBufferSizeMB, there will be less file
> system access, but more load on memory.
>
> Please let me know if I have it right so far before I ask further questions.

Re: Solr tlog and soft commit

Posted by "vidit.asthana" <vi...@gmail.com>.
Thanks Erick. It's super helpful!

So here's my understanding so far:

1. On update, the doc is written to the tlog (which will be used only for
recovery).
2. As soon as the buffered docs exceed ramBufferSizeMB, they are flushed to
the latest segment inside the index directory.
3. Up to this point, even though the index directory size will grow, the
docs are neither searchable nor durable.
4. As soon as I issue a soft commit, the documents inside the latest segment
become searchable, but they are still not merged with the main index. Hence
they are still not durable.
5. At this point, if a node goes down, the docs will still need to be
replayed from the tlog before they are searchable. If someone deletes the
tlogs before starting that node, then the documents are lost. They can't be
recovered from unclosed segments inside the index directory.
6. As soon as I issue a hard commit, the docs become durable.

So that means if I increase ramBufferSizeMB, there will be less file
system access, but more load on memory.
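
As a rough illustration of that trade-off, here is a small sketch. It is a
simplification under stated assumptions: it pretends that a flush happens
only when the buffer fills up (commits can also cause flushes), and the
function name and sample sizes are hypothetical.

```python
import math


def estimated_flushes(total_buffered_mb, ram_buffer_mb):
    """Approximate number of buffer-full flushes for a given amount of indexing."""
    return math.ceil(total_buffered_mb / ram_buffer_mb)


# Same hypothetical workload (1000 MB of buffered data), two buffer sizes.
flushes_small_buffer = estimated_flushes(1000, 100)
flushes_large_buffer = estimated_flushes(1000, 500)
```

A larger buffer means fewer flushes to the index directory, at the cost of
holding more data in memory between flushes.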

Please let me know if I have it right so far before I ask further questions.




Re: Solr tlog and soft commit

Posted by Yonik Seeley <ys...@gmail.com>.
On Sun, Mar 15, 2015 at 12:09 PM, Erick Erickson
<er...@gmail.com> wrote:
> 1> Well, probably not. Hate to be confusing here, but if your ramBufferSizeMB
> setting is exceeded, then internal buffers will be flushed to the
> currently open segment in the
> index directory.

It's even more confusing though...
if you do a few adds and then do a soft commit, a new small segment
will be created and flushed to the Directory, but not fsync'd.  But by
default, the directory we use is NRTCachingDirectory which caches
small segments in memory, so those small segments won't even get
written to disk until a hard commit forces them out of the cache.
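
A toy sketch of that caching behavior, in Python, may make it easier to
picture. This is not Lucene's NRTCachingDirectory: the class name, the single
size threshold and the segment sizes are all hypothetical, and the real
directory applies more rules than this.

```python
class ToyCachingDirectory:
    """Toy stand-in for a caching directory; not Lucene's NRTCachingDirectory."""

    def __init__(self, max_cached_bytes):
        self.max_cached_bytes = max_cached_bytes  # hypothetical threshold
        self.in_memory = {}  # small segments held only in RAM
        self.on_disk = {}    # segments written to the file system

    def write_segment(self, name, size_bytes):
        """Called when a soft commit or a buffer flush produces a new segment."""
        if size_bytes <= self.max_cached_bytes:
            self.in_memory[name] = size_bytes
        else:
            self.on_disk[name] = size_bytes

    def hard_commit(self):
        """Force every cached segment out of the cache and onto disk."""
        self.on_disk.update(self.in_memory)
        self.in_memory.clear()

    def disk_usage(self):
        return sum(self.on_disk.values())


directory = ToyCachingDirectory(max_cached_bytes=1_000_000)
directory.write_segment("_0", 200_000)    # small segment: stays in RAM
directory.write_segment("_1", 5_000_000)  # large segment: goes straight to disk
usage_before = directory.disk_usage()
directory.hard_commit()
usage_after = directory.disk_usage()
```

In the toy, the index directory only grows by the size of the small segment
once the hard commit pushes it out of the cache.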

-Yonik



Re: Solr tlog and soft commit

Posted by Erick Erickson <er...@gmail.com>.
1> Well, probably not. Hate to be confusing here, but if your ramBufferSizeMB
setting is exceeded, then internal buffers will be flushed to the currently
open segment in the index directory. You still won't be able to search it
since no commits happened. You really have little control over when this
happens.

And, to make it more confusing still, if your process abnormally terminates,
these docs _still_ won't be searchable when the node comes back up until they're
replayed from the transaction log. Since the segment was never closed, the docs
are invisible. But since they were in the tlog, they'll be recovered. Unclosed
segment files will be cleaned up though.

So usually you're right, an update won't change anything in the index directory.
Except sometimes ;).

The net-net here is that if you're NOT issuing any commits for a long time,
you'll see the tlog grow pretty steadily, _and_ upon occasion you'll see
step-wise jumps in the size of the index directory.

2> Nothing. This is just making stuff in the not-yet-committed state available
for searching, all in memory.

3> Not quite sure what you're asking here. The doc will be in memory and in
the tlog; optionally it may have been flushed to the current index segment
(although still not searchable).
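
The three answers above can be sketched as a toy state machine in Python.
This is only a model of the behavior described in this thread, not Solr code.
Assumptions: the class and method names are hypothetical, the RAM buffer is
counted in documents rather than megabytes, a hard commit simply empties the
tlog and does not open a new searcher, and replayed documents become visible
at the end of replay.

```python
class ToyCore:
    """Toy model of one Solr core: where a document lives at each stage."""

    def __init__(self, ram_buffer_docs):
        self.ram_buffer_docs = ram_buffer_docs  # stands in for ramBufferSizeMB
        self.tlog = []        # updates since the last hard commit
        self.ram_buffer = []  # indexed in memory, not yet flushed
        self.flushed = []     # flushed to an open segment in the index directory
        self.committed = []   # durable after a hard commit
        self.searchable = []  # what the current searcher can see

    def add(self, doc):
        self.tlog.append(doc)
        self.ram_buffer.append(doc)
        if len(self.ram_buffer) >= self.ram_buffer_docs:
            # Buffer limit exceeded: the index directory grows, but nothing
            # becomes searchable because no commit has happened.
            self.flushed.extend(self.ram_buffer)
            self.ram_buffer.clear()

    def soft_commit(self):
        # Makes everything visible, but nothing becomes durable.
        self.searchable = self.committed + self.flushed + self.ram_buffer

    def hard_commit(self):
        self.committed += self.flushed + self.ram_buffer
        self.flushed.clear()
        self.ram_buffer.clear()
        self.tlog.clear()  # simplification of tlog rollover

    def crash_and_restart(self):
        # Uncommitted in-memory state and unclosed segments are discarded.
        self.ram_buffer.clear()
        self.flushed.clear()
        self.searchable = list(self.committed)
        replay, self.tlog = self.tlog, []
        for doc in replay:
            self.add(doc)
        self.soft_commit()  # assumption: replayed docs become visible here


core = ToyCore(ram_buffer_docs=2)
for doc in ["a", "b", "c"]:
    core.add(doc)
visible_before_commit = list(core.searchable)
core.soft_commit()
visible_after_soft_commit = list(core.searchable)
core.crash_and_restart()
visible_after_recovery = list(core.searchable)

# Same crash, but the tlog is deleted before the restart: the docs are lost.
lossy = ToyCore(ram_buffer_docs=2)
for doc in ["a", "b", "c"]:
    lossy.add(doc)
lossy.tlog.clear()
lossy.crash_and_restart()
visible_after_lossy_restart = list(lossy.searchable)
```

In the toy, the documents survive the crash only because they are still in
the tlog; the flushed but unclosed segment does not help.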


Best,
Erick

On Sun, Mar 15, 2015 at 7:11 AM, vidit.asthana <vi...@gmail.com> wrote:
> Thanks for the reply, Yonik. I am very new to SolrCloud and am trying to
> understand how update requests are handled and what exactly happens at the
> file system level.
>
> 1. Let's say I send an update request and I don't issue any type of
> commit (neither hard nor soft). Will the document ever touch the index
> directory? From the blog, I understand that it gets written to the tlog
> directory.
>
> 2. Now if I issue a soft commit, what will happen inside the index
> directory?
>
> 3. Until I issue a soft commit, where will that document
> reside (completely in memory)?

Re: Solr tlog and soft commit

Posted by "vidit.asthana" <vi...@gmail.com>.
Thanks for the reply, Yonik. I am very new to SolrCloud and am trying to
understand how update requests are handled and what exactly happens at the
file system level.

1. Let's say I send an update request and I don't issue any type of
commit (neither hard nor soft). Will the document ever touch the index
directory? From the blog, I understand that it gets written to the tlog
directory.

2. Now if I issue a soft commit, what will happen inside the index
directory?

3. Until I issue a soft commit, where will that document
reside (completely in memory)?




Re: Solr tlog and soft commit

Posted by Yonik Seeley <ys...@gmail.com>.
Your basic assumptions about the underlying mechanisms are incorrect.
The size of the index has nothing to do with the transaction logs...
and transaction logs are never "written to index" except in recovery.
You would see the same index size behavior w/o transaction logs, and
it has to do with some data being cached in memory on soft commits but
always being flushed to disk on hard commits.
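
For anyone who wants to reproduce the two cases, here is a minimal Python
sketch that only builds the request URLs; it does not contact a server. The
host, port and collection name are placeholders, and the helper name is
hypothetical. The softCommit and commit request parameters are the ones the
update handler accepts.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical host and port
COLLECTION = "collection1"                # hypothetical collection name


def commit_url(soft):
    """Build an update URL that asks for a soft or a hard commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return f"{SOLR_BASE}/{COLLECTION}/update?{urlencode(params)}"


soft_url = commit_url(soft=True)   # visibility only, data may stay cached in memory
hard_url = commit_url(soft=False)  # forces segments to be flushed to disk
```

Watching the size of the index directory after requesting each URL should
show the difference described above.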

-Yonik

