Posted to oak-dev@jackrabbit.apache.org by Vikas Saurabh <vi...@gmail.com> on 2015/07/01 15:05:48 UTC

[Discuss] Should DocNodeStore notify underlying stores about completion of backgroundRead

Hi,

Let me give a bit of background first. I am looking at OAK-2106[0],
which is about using the replica set's replication status to get a
tighter bound on the replication lag of secondaries, and hence to make
a better-informed choice of ReadPreference[1].

---- Background ----
When assessing whether a secondary can be used to read a document, the
current logic looks up the parent document in the cache. If it is
found, the parent's state is up to date as of the last backgroundRead.
The parent is then checked to see whether it was modified within the
replication lag window.
Assuming that a backgroundRead has happened within the last 6 hours,
the current hard-coded value of replication lag suffices.
Once replication lag is pulled from the replication status, we'd need
to use min(lastBackgroundReadTime, currTime-maxReplicationLag).
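
To make that bound concrete, here is a minimal sketch. The class and
method names are hypothetical and not taken from Oak, and it assumes
all timestamps are in milliseconds. A secondary is treated as usable
only if the cached parent was last modified before the bound.

```java
/** Hypothetical helper; names are illustrative and not taken from Oak. */
final class SecondaryReadCheck {

    private SecondaryReadCheck() {
    }

    /**
     * Latest timestamp (ms) up to which both the cached parent state and
     * the secondaries can be assumed to be current.
     */
    static long safeTimestamp(long lastBackgroundReadTime,
                              long currTime,
                              long maxReplicationLag) {
        return Math.min(lastBackgroundReadTime, currTime - maxReplicationLag);
    }

    /**
     * A secondary is considered usable only if the parent was last
     * modified strictly before the safe timestamp.
     */
    static boolean canReadFromSecondary(long parentModified,
                                        long lastBackgroundReadTime,
                                        long currTime,
                                        long maxReplicationLag) {
        return parentModified
                < safeTimestamp(lastBackgroundReadTime, currTime, maxReplicationLag);
    }
}
```

The min() matters in both directions: a stale backgroundRead limits
what the cache can tell us, and a large lag limits what the
secondaries can have seen.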

There's a still-unresolved issue for the Mongo Java driver (JAVA-805 [2])
about observing such status. Until that is resolved, we'd need to poll
for replication lag using the replSetGetStatus [3] command.
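
As a rough sketch of what could be done with the polled result: the
replSetGetStatus output has a "members" array whose entries carry
"stateStr" and "optimeDate". The code below assumes those two fields
have already been extracted into a plain Member object (a hypothetical
type, as is ReplicaLag), and leaves out the driver call itself.

```java
import java.util.List;

/** Hypothetical helper that derives lag from already-parsed member data. */
final class ReplicaLag {

    /** Minimal view of one entry of the "members" array. */
    static final class Member {
        final String stateStr;        // e.g. "PRIMARY" or "SECONDARY"
        final long optimeDateMillis;  // members[n].optimeDate as epoch millis

        Member(String stateStr, long optimeDateMillis) {
            this.stateStr = stateStr;
            this.optimeDateMillis = optimeDateMillis;
        }
    }

    private ReplicaLag() {
    }

    /**
     * Returns the largest gap (ms) between the primary's optime and any
     * secondary's optime, or -1 if no primary is visible.
     */
    static long maxLagMillis(List<Member> members) {
        long primaryOptime = -1;
        for (Member m : members) {
            if ("PRIMARY".equals(m.stateStr)) {
                primaryOptime = m.optimeDateMillis;
            }
        }
        if (primaryOptime < 0) {
            return -1;
        }
        long maxLag = 0;
        for (Member m : members) {
            if ("SECONDARY".equals(m.stateStr)) {
                maxLag = Math.max(maxLag, primaryOptime - m.optimeDateMillis);
            }
        }
        return maxLag;
    }
}
```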

Since the usefulness of a recent replication lag value is fairly
tightly coupled to how frequently we are able to do backgroundReads,
it feels like we can tie polling for replication lag to
backgroundReads.
---- /Background ----

While replication lag calculation is entirely the domain of
MongoDocumentStore, I'd rather not add a new thread just to keep
polling for replication lag. OTOH, there's currently no way for
MongoDocumentStore to know when backgroundReads happen.

It'd be great if DocumentNodeStore could notify the underlying store
with something like 'backgroundReadDone'. It can remain a no-op for all
other stores, and MongoDocumentStore can use it as a trigger to fetch
replication status.
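
Roughly, the callback I have in mind could look like the sketch below.
The interface and class names are made up for illustration and are not
the actual Oak DocumentStore API. The counter only stands in for the
real replication status refresh.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical callback contract; not the actual Oak DocumentStore interface. */
interface BackgroundReadAware {

    /** Called by the node store after each background read; no-op by default. */
    default void backgroundReadDone() {
    }
}

/** Stand-in for a store that ignores the notification. */
final class SimpleStore implements BackgroundReadAware {
}

/** Stand-in for a Mongo-backed store that refreshes replication status. */
final class MongoLikeStore implements BackgroundReadAware {

    final AtomicInteger refreshes = new AtomicInteger();

    @Override
    public void backgroundReadDone() {
        // Real code would issue replSetGetStatus here; this sketch only counts calls.
        refreshes.incrementAndGet();
    }
}
```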

Does it make sense to have such a mechanism?

Thanks,
Vikas

[0]: https://issues.apache.org/jira/browse/OAK-2106
[1]: http://docs.mongodb.org/manual/core/read-preference/
[2]: https://jira.mongodb.org/browse/JAVA-805
[3]: http://docs.mongodb.org/manual/reference/command/replSetGetStatus/#dbcmd.replSetGetStatus

Re: [Discuss] Should DocNodeStore notify underlying stores about completion of backgroundRead

Posted by Vikas Saurabh <vi...@gmail.com>.
Hi,

> I would rather not introduce the concept of the background read
> to the DocumentStore. We already have the invalidation logic
> and the 'maxCacheAge' in the DocumentStore.find() method. I
> think we should rather revisit this mechanism and make sure it
> works well with the requirements we have regarding cached reads.
> A related issue was recently created by Julian, on the freshness
> of DocumentStore.query(): OAK-3037.
>
> In my view, there is a more urgent need to clarify the current
> DocumentStore API regarding consistency guarantees and think
> about consistency requirements we have in the upper layer. The
> DocumentNodeStore implementation should read with relaxed
> consistency whenever possible and also be able to communicate this
> to the DocumentStore implementation.
>
>

If I understand correctly, the DocumentStore should try to keep up
with the storage (mongo, rdb, etc.) as much as possible, and we'd then
want DocumentNodeStore to query it with parameters indicating how
recent the results are expected to be (like the maxCacheAge in the
find method). Yup, that'd make sense.
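
To check my understanding of the maxCacheAge idea, here is a minimal
sketch of a freshness-bounded lookup. AgeBoundedCache, its backend
function and the explicit 'now' parameter are all hypothetical and
only illustrate the principle; this is not how the Oak cache is
implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache that honours a caller-supplied maximum age. */
final class AgeBoundedCache {

    private static final class Entry {
        final String value;
        final long loadedAt;

        Entry(String value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> backend;
    int backendReads; // number of reads that went to the backend

    AgeBoundedCache(Function<String, String> backend) {
        this.backend = backend;
    }

    /** Serves from cache unless the entry is missing or older than maxCacheAge (ms). */
    String find(String key, long maxCacheAge, long now) {
        Entry e = cache.get(key);
        if (e == null || now - e.loadedAt > maxCacheAge) {
            e = new Entry(backend.apply(key), now);
            cache.put(key, e);
            backendReads++;
        }
        return e.value;
    }
}
```

The caller decides how stale a result may be, and the store decides
how to satisfy that.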

So, I'd then go ahead and track replication lag in the replica set
inside MongoDocumentStore. I guess it'd make sense to expose the
polling interval as a configuration parameter.
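
A minimal sketch of such a poller, assuming a single daemon thread and
an interval passed in by configuration. ReplicationLagPoller and the
lagSource supplier are hypothetical names; the supplier stands in for
whatever ends up calling replSetGetStatus.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical poller that keeps the last known replication lag. */
final class ReplicationLagPoller implements AutoCloseable {

    private final LongSupplier lagSource;
    private final ScheduledExecutorService executor;
    private volatile long lastKnownLagMillis = -1; // -1 until the first successful poll

    ReplicationLagPoller(LongSupplier lagSource) {
        this.lagSource = lagSource;
        this.executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "replication-lag-poller");
            t.setDaemon(true); // do not keep the JVM alive just for polling
            return t;
        });
    }

    /** Starts polling immediately and then after every intervalMillis. */
    void start(long intervalMillis) {
        executor.scheduleWithFixedDelay(this::pollOnce, 0, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Performs a single poll; failures keep the previous value. */
    void pollOnce() {
        try {
            lastKnownLagMillis = lagSource.getAsLong();
        } catch (RuntimeException e) {
            // Swallowed on purpose: an uncaught exception would cancel the schedule.
        }
    }

    long lastKnownLagMillis() {
        return lastKnownLagMillis;
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```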

Thanks for the inputs.

--Vikas

Re: [Discuss] Should DocNodeStore notify underlying stores about completion of backgroundRead

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

I would rather not introduce the concept of the background read
to the DocumentStore. We already have the invalidation logic
and the 'maxCacheAge' in the DocumentStore.find() method. I
think we should rather revisit this mechanism and make sure it
works well with the requirements we have regarding cached reads.
A related issue was recently created by Julian, on the freshness
of DocumentStore.query(): OAK-3037.

In my view, there is a more urgent need to clarify the current
DocumentStore API regarding consistency guarantees and think
about consistency requirements we have in the upper layer. The
DocumentNodeStore implementation should read with relaxed
consistency whenever possible and also be able to communicate this
to the DocumentStore implementation.

Regards
 Marcel

On 01/07/15 15:05, "Vikas Saurabh" wrote:

>Hi,
>
>Let me give a bit of background first. I am looking at OAK-2106[0] -
>using replication status to have a stronger bound of replication lag
>of secondaries and hence be more informed while providing
>ReadPreference[1].
>
>---- Background ----
>While assessing if secondaries would be useful to read a document, the
>current logic looks for the parent document from cache - hence, if
>found, the parent status is up-to-date as far as last backgroundRead
>is done. Then, the parent is checked if it was modified within
>replication lag window.
>With the assumption that backgroundRead would have happened within
>last 6 hours, the current hard-code value of replication lag suffices.
>With replication lag being pulled from replication status, we'd need
>to use min(lastBackgroundReadTime, currTime-maxReplicationLag).
>
>There's an issue (still unresolved) for Mongo Java driver (JAVA-805)
>which is about observing such status. Until, that is resolved, we'd
>need to poll for replication lag using replSetGetStatus [3] command.
>
>Since, there's a fairly tight coupling between usefulness of how
>recent is the value replication lag and how frequently are we able to
>do backgroundReads, it feels that we can tie up polling for
>replication lag along with backgroundReads.
>---- /Background ----
>
>While replication lag calculation is entirely a domain of
>MongoDocumentStore, but I'd not want to have a new thread to just keep
>polling for replication lag. Otoh, currently, there's no way for
>MongoDocStore to know when backgroundReads happened.
>
>It'd great if DocumentNodeStore can notify the underlying store with
>something like 'backgroundReadDone'. It can remain no-op for all the
>stores and MongoDocStore can utilize this as a way to fetch
>replication status.
>
>Does it make sense to have such a mechanism?
>
>Thanks,
>Vikas
>
>[0]: https://issues.apache.org/jira/browse/OAK-2106
>[1]: http://docs.mongodb.org/manual/core/read-preference/
>[2]: https://jira.mongodb.org/browse/JAVA-805
>[3]: http://docs.mongodb.org/manual/reference/command/replSetGetStatus/#dbcmd.replSetGetStatus