Posted to oak-issues@jackrabbit.apache.org by "Timothee Maret (JIRA)" <ji...@apache.org> on 2016/10/29 08:05:58 UTC

[jira] [Updated] (OAK-5034) FileStoreUtil#readSegmentWithRetry max retry delay is too short to be functional

     [ https://issues.apache.org/jira/browse/OAK-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothee Maret updated OAK-5034:
--------------------------------
    Description: 
Commit {{1765838}} introduced the {{FileStoreUtil#readSegmentWithRetry}} utility and reduced the delay between two tries from 2 s to 0.125 s, while the total number of tries did not change.

This does not give enough time for the server to find references and segments, thus causing exceptions such as
{code}
29.10.2016 05:07:37.242 *ERROR* [sling-default-2-Registered Service.605] org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSync Failed synchronizing state.
java.lang.IllegalStateException: Unable to read references of segment 5168c878-3a3f-49d0-aea9-b8b57d5d867f from primary
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.readReferences(StandbyClientSyncExecution.java:196)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.copySegmentHierarchyFromPrimary(StandbyClientSyncExecution.java:130)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.compareAgainstBaseState(StandbyClientSyncExecution.java:94)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.execute(StandbyClientSyncExecution.java:74)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSync.run(StandbyClientSync.java:143)
        at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:118)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
which the client throws, ultimately causing the integration tests to fail.

If I understand correctly, the minimum retry period should be longer than a TarMK flush cycle (5 sec).
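
The arithmetic behind this can be sketched as follows. This is a minimal illustration, not the actual {{FileStoreUtil}} code: the class {{RetrySketch}}, its method names, and the number of tries (5) are assumptions made for the example. Only the two delays (2 s and 0.125 s) and the 5 sec flush cycle come from the description above.

```java
import java.util.function.Supplier;

// Hypothetical sketch: RetrySketch and its methods are illustrative names,
// not part of Oak.
final class RetrySketch {

    /** Worst-case time spent waiting: one pause after each failed try except the last. */
    static long totalWaitMillis(int tries, long delayMillis) {
        return (long) Math.max(0, tries - 1) * delayMillis;
    }

    /** True if the whole retry window is longer than one flush cycle. */
    static boolean coversFlushCycle(int tries, long delayMillis, long flushCycleMillis) {
        return totalWaitMillis(tries, delayMillis) > flushCycleMillis;
    }

    /** Calls read up to 'tries' times with a fixed pause; returns null if every try fails. */
    static <T> T readWithRetry(Supplier<T> read, int tries, long delayMillis) {
        for (int attempt = 1; attempt <= tries; attempt++) {
            T result = read.get();
            if (result != null) {
                return result;
            }
            if (attempt < tries) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up early.
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int tries = 5;                 // assumed value, for illustration only
        long flushCycleMillis = 5_000; // TarMK flush cycle from the description
        System.out.println(coversFlushCycle(tries, 125, flushCycleMillis));   // false
        System.out.println(coversFlushCycle(tries, 2_000, flushCycleMillis)); // true
    }
}
```

Under these assumptions, five tries spaced 0.125 s apart give up after 0.5 s of waiting, well before a flush can make the segment available, whereas the previous 2 s spacing waits 8 s in total.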

  was:
The commit {{1765838}} introduced the {{FileStoreUtil#readSegmentWithRetry}} util and reduced the period between two tries (from 2sec to 0.125s) while the total number of tries did not change.

This does not give enough time for the server to find references and segments, thus causing exceptions such as
{code}
29.10.2016 05:07:37.242 *ERROR* [sling-default-2-Registered Service.605] org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSync Failed synchronizing state.
java.lang.IllegalStateException: Unable to read references of segment 5168c878-3a3f-49d0-aea9-b8b57d5d867f from primary
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.readReferences(StandbyClientSyncExecution.java:196)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.copySegmentHierarchyFromPrimary(StandbyClientSyncExecution.java:130)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.compareAgainstBaseState(StandbyClientSyncExecution.java:94)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSyncExecution.execute(StandbyClientSyncExecution.java:74)
        at org.apache.jackrabbit.oak.segment.standby.client.StandbyClientSync.run(StandbyClientSync.java:143)
        at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:118)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
and causing the client to throw exceptions, ultimately causing IT tests to fail.


> FileStoreUtil#readSegmentWithRetry max retry delay is too short to be functional
> --------------------------------------------------------------------------------
>
>                 Key: OAK-5034
>                 URL: https://issues.apache.org/jira/browse/OAK-5034
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>    Affects Versions: Segment Tar 0.0.16
>            Reporter: Timothee Maret
>            Assignee: Timothee Maret



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)