Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2017/01/04 13:58:58 UTC
[jira] [Commented] (SOLR-9922) Write buffering updates to another tlog

    [ https://issues.apache.org/jira/browse/SOLR-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798322#comment-15798322 ] 

Mark Miller commented on SOLR-9922:
-----------------------------------

bq. In my opinion, in the case of HDFSUpdateLog we should store the buffer tlog in the local file system, because these files are small and very short-lived.

-1. When you have a single shared filesystem, you really want to avoid, as much as possible, splitting data management between the distributed filesystem and a local one. Buffered data is not necessarily short-lived: clusters that start up with a high rate of incoming documents, along with many other factors, can produce cases where these files are long-lived and take up a lot of space. Much of the appeal of HDFS is managing space for large storage requirements from a single front, rather than from a single front plus each node.

[~yseeley@gmail.com] should really weigh in on this idea of splitting up the tlog.

> Write buffering updates to another tlog
> ---------------------------------------
>
>                 Key: SOLR-9922
>                 URL: https://issues.apache.org/jira/browse/SOLR-9922
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>         Attachments: SOLR-9922.patch
>
>
> Currently, we write buffered updates to the current tlog without applying them to the index, and then rely on log replay to apply them. At the same time, other updates are also written to the current tlog and applied to the index immediately.
> For example, if new updates arrive at a replica during peersync, we end up with this tlog:
> tlog : old1, new1, new2, old2, new3, old3
> The old updates come from peersync and are applied to the index.
> The new updates are buffered updates and are not applied to the index.
> Writing all of these updates to the same tlog makes the code base very complex. We should write buffered updates to a separate tlog file.
> Doing so will make our code base simpler. It also makes replica recovery for SOLR-9835 easier, because after peersync succeeds we can copy the new updates from the temporary file to the current tlog. For example:
> tlog : old1, old2, old3
> temporary tlog : new1, new2, new3
> -->
> tlog : old1, old2, old3, new1, new2, new3
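
The proposal quoted above can be illustrated with a small in-memory sketch. This is only a toy model of the idea, not Solr's UpdateLog API: the class BufferedTlogSketch and all of its method names are hypothetical, the "logs" are plain lists rather than files, and applying the buffered updates to the index at copy time is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a main tlog plus a separate buffer tlog. All names are hypothetical. */
final class BufferedTlogSketch {
    private final List<String> tlog = new ArrayList<>();       // main transaction log
    private final List<String> bufferTlog = new ArrayList<>(); // temporary log for buffered updates
    private final List<String> index = new ArrayList<>();      // stands in for the search index
    private boolean buffering = false;

    /** Enter the buffering state, e.g. when peersync begins. */
    void startBuffering() {
        buffering = true;
    }

    /** An "old" update obtained through peersync: logged and applied right away. */
    void applyPeerSyncUpdate(String update) {
        tlog.add(update);
        index.add(update);
    }

    /** A "new" incoming update: goes to the buffer tlog only while buffering. */
    void addIncomingUpdate(String update) {
        if (buffering) {
            bufferTlog.add(update); // logged, but not applied to the index yet
        } else {
            tlog.add(update);
            index.add(update);
        }
    }

    /** After peersync succeeds: copy buffered updates to the main tlog, in order. */
    void finishPeerSync() {
        for (String update : bufferTlog) {
            tlog.add(update);
            index.add(update); // assumption: applied at copy time in this sketch
        }
        bufferTlog.clear();
        buffering = false;
    }

    List<String> tlog() { return List.copyOf(tlog); }
    List<String> bufferTlog() { return List.copyOf(bufferTlog); }
    List<String> index() { return List.copyOf(index); }

    public static void main(String[] args) {
        BufferedTlogSketch log = new BufferedTlogSketch();
        log.startBuffering();
        // Same interleaving as in the issue description.
        log.applyPeerSyncUpdate("old1");
        log.addIncomingUpdate("new1");
        log.addIncomingUpdate("new2");
        log.applyPeerSyncUpdate("old2");
        log.addIncomingUpdate("new3");
        log.applyPeerSyncUpdate("old3");
        System.out.println("tlog           : " + log.tlog());
        System.out.println("temporary tlog : " + log.bufferTlog());
        log.finishPeerSync();
        System.out.println("tlog after copy: " + log.tlog());
    }
}
```

In this model the main tlog only ever contains updates that have been applied to the index, which is the simplification the issue is after. The sketch says nothing about where the buffer tlog should be stored, which is the point under discussion in the comment above.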


