Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2014/08/18 08:31:19 UTC

[jira] [Updated] (OAK-2039) SegmentNodeStore might not create a checkpoint

     [ https://issues.apache.org/jira/browse/OAK-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra updated OAK-2039:
---------------------------------

    Attachment: OAK-2039-alex.patch

Attaching a [patch|^OAK-2039-alex.patch] by [~alexparvulescu] that adds debug/warn logging when this issue occurs.

> SegmentNodeStore might not create a checkpoint
> ----------------------------------------------
>
>                 Key: OAK-2039
>                 URL: https://issues.apache.org/jira/browse/OAK-2039
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segmentmk
>            Reporter: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: OAK-2039-alex.patch
>
>
> As per [~edivad], {{SegmentNodeStore.checkpoint(long)}} might return a checkpoint name even though the checkpoint has not been created.
> Starting from
> {code:java|title=AsyncIndexUpdate.java#235}
> // there are some recent changes, so let's create a new checkpoint
> String afterCheckpoint = store.checkpoint(lifetime);
> NodeState after = store.retrieve(afterCheckpoint);
> if (after == null) {
>     log.warn("Unable to retrieve newly created checkpoint {},"
>             + " skipping the {} index update", afterCheckpoint, name);
>     return;
> }
> String checkpointToRelease = afterCheckpoint;
> try {
>     updateIndex(before, beforeCheckpoint, after, afterCheckpoint);
>     // the update succeeded, i.e. it no longer fails
> {code}
> and then
> {code:java|title=SegmentNodeStore.java#205}
> public synchronized String checkpoint(long lifetime) {
>     checkArgument(lifetime > 0);
>     String name = UUID.randomUUID().toString();
>     long now = System.currentTimeMillis();
>     // try 5 times
>     for (int i = 0; i < 5; i++) {
>         if (commitSemaphore.tryAcquire()) { 
>             try {
>                 refreshHead();
>                 SegmentNodeState state = head.get();
>                 SegmentNodeBuilder builder = state.builder();
>                 NodeBuilder checkpoints = builder.child("checkpoints");
>                 for (String n : checkpoints.getChildNodeNames()) {
>                     NodeBuilder cp = checkpoints.getChildNode(n);
>                     PropertyState ts = cp.getProperty("timestamp");
>                     if (ts == null
>                             || ts.getType() != Type.LONG
>                             || now > ts.getValue(Type.LONG)) {
>                         cp.remove();
>                     }
>                 }
>                 NodeBuilder cp = checkpoints.child(name);
>                 cp.setProperty("timestamp",  now + lifetime);
>                 cp.setChildNode(ROOT, state.getChildNode(ROOT));
>                 SegmentNodeState newState = builder.getNodeState();
>                 if (store.setHead(state, newState)) {
>                     refreshHead();
>                     return name;
>                 }
>             } finally {
>                 commitSemaphore.release();
>             }
>         }
>     }
>     return name;
> }
> {code}
> We can see that it always returns a checkpoint name, even if it fails to create the checkpoint (because of the {{@Nonnull}} contract, I would say). If it fails to acquire the lock 5 times in a row (with no sleep between attempts?), it fails silently and returns a checkpoint name that is not valid.
> This might cause indexing to not work properly, as it relies on being able to access the previous version of the content through the returned checkpoint.
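
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Oak code and not the attached patch: {{CheckpointSketch}}, its map-backed storage, {{holdCommitLock()}}, {{releaseCommitLock()}} and {{checkpointOrEmpty()}} are all hypothetical names introduced for illustration. It assumes that the semaphore is held by another commit for the whole duration of the call, and shows one possible guard (a short timed wait per attempt plus an explicit empty result) next to the reported behaviour.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a node store; this is not the Oak SegmentNodeStore. */
class CheckpointSketch {
    private final Semaphore commitSemaphore = new Semaphore(1);
    // checkpoint name -> expiry timestamp (stands in for the "checkpoints" child node)
    private final Map<String, Long> checkpoints = new ConcurrentHashMap<>();

    /** Mirrors the reported behaviour: a name is returned even if nothing was stored. */
    public String checkpointAsReported(long lifetime) {
        String name = UUID.randomUUID().toString();
        for (int i = 0; i < 5; i++) {
            if (commitSemaphore.tryAcquire()) { // returns immediately, no waiting
                try {
                    checkpoints.put(name, System.currentTimeMillis() + lifetime);
                    return name;
                } finally {
                    commitSemaphore.release();
                }
            }
        }
        return name; // silent failure: no checkpoint exists under this name
    }

    /** One possible guard (an assumption, not the actual fix): wait per attempt, report failure. */
    public Optional<String> checkpointOrEmpty(long lifetime, long waitMillis) {
        String name = UUID.randomUUID().toString();
        try {
            for (int i = 0; i < 5; i++) {
                if (commitSemaphore.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
                    try {
                        checkpoints.put(name, System.currentTimeMillis() + lifetime);
                        return Optional.of(name);
                    } finally {
                        commitSemaphore.release();
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        System.err.println("WARN: failed to create checkpoint " + name);
        return Optional.empty();
    }

    /** Returns the expiry timestamp, or null if no such checkpoint exists. */
    public Long retrieve(String name) {
        return checkpoints.get(name);
    }

    // Test hooks that simulate another commit holding the lock.
    public void holdCommitLock() { commitSemaphore.acquireUninterruptibly(); }
    public void releaseCommitLock() { commitSemaphore.release(); }

    public static void main(String[] args) {
        CheckpointSketch store = new CheckpointSketch();
        store.holdCommitLock();
        String name = store.checkpointAsReported(1000);
        System.out.println("reported variant stored a checkpoint: " + (store.retrieve(name) != null));
        System.out.println("guarded variant created a checkpoint: " + store.checkpointOrEmpty(1000, 10).isPresent());
        store.releaseCommitLock();
    }
}
```

While the lock is held, the first variant hands back a name that {{retrieve}} cannot resolve, which is the situation {{AsyncIndexUpdate}} runs into. The guarded variant returns an {{Optional}} here only to keep the sketch free of nulls; a real change would have to be reconciled with the {{@Nonnull}} contract mentioned above, for example by throwing instead.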



--
This message was sent by Atlassian JIRA
(v6.2#6252)