Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2014/08/22 16:22:11 UTC

[jira] [Commented] (OAK-2039) SegmentNodeStore might not create a checkpoint

    [ https://issues.apache.org/jira/browse/OAK-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106887#comment-14106887 ] 

Alex Parvulescu commented on OAK-2039:
--------------------------------------

This started from the warnings issued by the async indexer. Just to clarify: there is no data loss; the async indexer will pick up all the changes on the next run cycle (currently every 5 seconds).
I've now added some warning logs to the SegmentNodeStore to try to figure out whether this is simply a system-load problem (the current design is: it tries 5 times to grab the commit lock, then gives up) or something more profound.
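
To illustrate the "try a few times, then give up" design and where warning logs could go, here is a minimal sketch. It is not the actual SegmentNodeStore code or the attached patch: the class CheckpointRetrySketch, the helper methods, the log messages and the timed tryAcquire are all assumptions made for illustration. Like the current code, it still returns the name when every attempt fails, so a caller has to verify the checkpoint before relying on it.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical stand-in for illustration only, not the real SegmentNodeStore.
class CheckpointRetrySketch {
    private static final Logger LOG =
            Logger.getLogger(CheckpointRetrySketch.class.getName());

    // Plays the role of the commit lock; a single permit means one writer at a time.
    private final Semaphore commitSemaphore = new Semaphore(1);
    // Names of checkpoints that were really written (stands in for the "checkpoints" node).
    private final Set<String> created = ConcurrentHashMap.newKeySet();
    private int failedAttempts;

    // Tries up to 'attempts' times to take the commit lock, waiting up to
    // 'waitMillis' per attempt (the wait is an assumption; the quoted code does not wait).
    String checkpoint(int attempts, long waitMillis) {
        String name = UUID.randomUUID().toString();
        for (int i = 1; i <= attempts; i++) {
            boolean locked;
            try {
                locked = commitSemaphore.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                break;
            }
            if (locked) {
                try {
                    created.add(name); // the real store would update the head state here
                    return name;
                } finally {
                    commitSemaphore.release();
                }
            }
            failedAttempts++;
            LOG.warning("Commit lock busy, checkpoint attempt " + i + " of " + attempts + " failed");
        }
        // Make the failure visible instead of giving up silently.
        LOG.warning("Giving up on checkpoint " + name + ", it was not created");
        return name;
    }

    boolean isCreated(String name) { return created.contains(name); }

    int failedAttempts() { return failedAttempts; }

    // Test helper: simulates a long-running commit holding the lock.
    void holdCommitLock() { commitSemaphore.acquireUninterruptibly(); }

    public static void main(String[] args) {
        CheckpointRetrySketch store = new CheckpointRetrySketch();
        String first = store.checkpoint(5, 10);
        System.out.println("created: " + store.isCreated(first));
        store.holdCommitLock(); // simulate load
        String second = store.checkpoint(5, 10);
        System.out.println("created under load: " + store.isCreated(second));
    }
}
```

With logs like these, a run of "attempt failed" warnings followed by a successful checkpoint on the next indexer cycle would point to lock contention under load rather than a deeper problem.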

> SegmentNodeStore might not create a checkpoint
> ----------------------------------------------
>
>                 Key: OAK-2039
>                 URL: https://issues.apache.org/jira/browse/OAK-2039
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segmentmk
>            Reporter: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.1, 1.0.5
>
>         Attachments: OAK-2039-alex.patch
>
>
> As per [~edivad], an invocation of {{SegmentNodeStore.checkpoint(long)}} might return a checkpoint even though it has not been created.
> Starting from 
> {code:java|title=AsyncIndexUpdate.java#235}
> // there are some recent changes, so let's create a new checkpoint
> String afterCheckpoint = store.checkpoint(lifetime);
> NodeState after = store.retrieve(afterCheckpoint);
> if (after == null) {
>     log.warn("Unable to retrieve newly created checkpoint {},"
>             + " skipping the {} index update", afterCheckpoint, name);
>     return;
> }
> String checkpointToRelease = afterCheckpoint;
> try {
>     updateIndex(before, beforeCheckpoint, after, afterCheckpoint);
>     // the update succeeded, i.e. it no longer fails
> {code}
> and then
> {code:java|title=SegmentNodeStore.java#205}
> public synchronized String checkpoint(long lifetime) {
>     checkArgument(lifetime > 0);
>     String name = UUID.randomUUID().toString();
>     long now = System.currentTimeMillis();
>     // try 5 times
>     for (int i = 0; i < 5; i++) {
>         if (commitSemaphore.tryAcquire()) { 
>             try {
>                 refreshHead();
>                 SegmentNodeState state = head.get();
>                 SegmentNodeBuilder builder = state.builder();
>                 NodeBuilder checkpoints = builder.child("checkpoints");
>                 for (String n : checkpoints.getChildNodeNames()) {
>                     NodeBuilder cp = checkpoints.getChildNode(n);
>                     PropertyState ts = cp.getProperty("timestamp");
>                     if (ts == null
>                             || ts.getType() != Type.LONG
>                             || now > ts.getValue(Type.LONG)) {
>                         cp.remove();
>                     }
>                 }
>                 NodeBuilder cp = checkpoints.child(name);
>                 cp.setProperty("timestamp",  now + lifetime);
>                 cp.setChildNode(ROOT, state.getChildNode(ROOT));
>                 SegmentNodeState newState = builder.getNodeState();
>                 if (store.setHead(state, newState)) {
>                     refreshHead();
>                     return name;
>                 }
>             } finally {
>                 commitSemaphore.release();
>             }
>         }
>     }
>     return name;
> }
> {code}
> we can see that it always returns a checkpoint name even if it fails to create it (as required by the {{@Nonnull}} contract, I would say). But if it fails to acquire the lock 5 times (with no sleep in between?), it gives up silently and thus returns a checkpoint which is not valid.
> This might cause indexing to not work properly, as it relies on being able to access the previous version of the content through the returned checkpoint.



--
This message was sent by Atlassian JIRA
(v6.2#6252)