Posted to issues@ratis.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2020/11/13 21:09:00 UTC
[jira] [Updated] (RATIS-1156) Segmented RaftLogWorker does not shutdown after task failure

     [ https://issues.apache.org/jira/browse/RATIS-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham updated RATIS-1156:
--------------------------------------
    Description: 
When task.execute() fails, the worker stores the exception in logIOException and notifies the StateMachine, but it does not shut down the server. Instead, it picks up the next task, fails that task exceptionally, and notifies the StateMachine again.


This Jira is to discuss whether we need to bring back the old behavior of shutting down the server.

{code:java}
try {
        Task task = queue.poll(ONE_SECOND);
        if (task != null) {
          task.stopTimerOnDequeue();
          try {
            if (logIOException != null) {
              throw logIOException;
            } else {
              Timer.Context executionTimeContext =
                  raftLogMetrics.getRaftLogTaskExecutionTimer(task.getClass().getSimpleName().toLowerCase()).time();
              task.execute();
              executionTimeContext.stop();
            }
          } catch (IOException e) {
            if (task.getEndIndex() < lastWrittenIndex) {
              LOG.info("Ignore IOException when handling task " + task
                  + " which is smaller than the lastWrittenIndex."
                  + " There should be a snapshot installed.", e);
            } else {
              task.failed(e);
              if (logIOException == null) {
                logIOException = new RaftLogIOException("Log already failed"
                    + " at index " + task.getEndIndex()
                    + " for task " + task, e);
              }
              continue;
            }
          }
          task.done();
        }
{code}
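
To make the two options concrete, here is a minimal, self-contained sketch of the loop above with a switch between the current behavior (record the first exception, then fail every later task) and the old behavior (stop after the first failure). This is not Ratis code: WorkerShutdownSketch, Task, run, shutdownOnFailure and shutdownHook are all hypothetical names invented for illustration, the lastWrittenIndex branch is left out for brevity, and how the real server should actually be shut down is exactly the open question here.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the worker loop; none of these names are Ratis APIs. */
final class WorkerShutdownSketch {
  /** Minimal task that can be told to fail with an IOException. */
  static final class Task {
    final long endIndex;
    final boolean failsOnExecute;
    String outcome = "pending";

    Task(long endIndex, boolean failsOnExecute) {
      this.endIndex = endIndex;
      this.failsOnExecute = failsOnExecute;
    }

    void execute() throws IOException {
      if (failsOnExecute) {
        throw new IOException("simulated write failure at index " + endIndex);
      }
    }
  }

  /**
   * Drains the queue and returns how many tasks were dequeued.
   * With shutdownOnFailure == false, later tasks are dequeued and failed
   * with the stored exception (the behavior described in this issue).
   * With shutdownOnFailure == true, the hook runs once and the loop stops.
   */
  static int run(Queue<Task> queue, boolean shutdownOnFailure, Runnable shutdownHook) {
    IOException logIOException = null;
    int dequeued = 0;
    Task task;
    while ((task = queue.poll()) != null) {
      dequeued++;
      try {
        if (logIOException != null) {
          throw logIOException; // the log already failed earlier
        }
        task.execute();
        task.outcome = "done";
      } catch (IOException e) {
        task.outcome = "failed";
        if (logIOException == null) {
          logIOException = e; // remember only the first failure
        }
        if (shutdownOnFailure) {
          shutdownHook.run(); // assumption: the caller decides what shutdown means
          break;
        }
      }
    }
    return dequeued;
  }

  public static void main(String[] args) {
    for (boolean shutdownOnFailure : new boolean[] {false, true}) {
      List<Task> tasks = Arrays.asList(
          new Task(1, false), new Task(2, true), new Task(3, false));
      int dequeued = run(new ArrayDeque<>(tasks), shutdownOnFailure,
          () -> System.out.println("shutdown requested"));
      System.out.println("shutdownOnFailure=" + shutdownOnFailure + ", dequeued=" + dequeued);
      for (Task t : tasks) {
        System.out.println("  task " + t.endIndex + ": " + t.outcome);
      }
    }
  }
}
```

In this sketch, with three tasks where the second one fails, the first mode dequeues all three and marks the last two as failed, while the second mode stops after the failing task and leaves the third one untouched in the queue.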

cc [~arp] [~hanishakoneru] [~msingh] [~szetszwo]


  was:
When task.execute() fails, the worker stores the exception in logIOException and notifies the StateMachine, but it does not shut down the server. Instead, it picks up the next task and fails that task exceptionally.


This Jira is to discuss whether we need to bring back the old behavior of shutting down the server.

{code:java}
try {
        Task task = queue.poll(ONE_SECOND);
        if (task != null) {
          task.stopTimerOnDequeue();
          try {
            if (logIOException != null) {
              throw logIOException;
            } else {
              Timer.Context executionTimeContext =
                  raftLogMetrics.getRaftLogTaskExecutionTimer(task.getClass().getSimpleName().toLowerCase()).time();
              task.execute();
              executionTimeContext.stop();
            }
          } catch (IOException e) {
            if (task.getEndIndex() < lastWrittenIndex) {
              LOG.info("Ignore IOException when handling task " + task
                  + " which is smaller than the lastWrittenIndex."
                  + " There should be a snapshot installed.", e);
            } else {
              task.failed(e);
              if (logIOException == null) {
                logIOException = new RaftLogIOException("Log already failed"
                    + " at index " + task.getEndIndex()
                    + " for task " + task, e);
              }
              continue;
            }
          }
          task.done();
        }
{code}

cc [~arp] [~hanishakoneru] [~msingh] [~szetszwo]



> Segmented RaftLogWorker does not shutdown after task failure
> ------------------------------------------------------------
>
>                 Key: RATIS-1156
>                 URL: https://issues.apache.org/jira/browse/RATIS-1156
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Bharat Viswanadham
>            Priority: Major
>
> When task.execute() fails, the worker stores the exception in logIOException and notifies the StateMachine, but it does not shut down the server. Instead, it picks up the next task, fails that task exceptionally, and notifies the StateMachine again.
> This Jira is to discuss whether we need to bring back the old behavior of shutting down the server.
> {code:java}
> try {
>         Task task = queue.poll(ONE_SECOND);
>         if (task != null) {
>           task.stopTimerOnDequeue();
>           try {
>             if (logIOException != null) {
>               throw logIOException;
>             } else {
>               Timer.Context executionTimeContext =
>                   raftLogMetrics.getRaftLogTaskExecutionTimer(task.getClass().getSimpleName().toLowerCase()).time();
>               task.execute();
>               executionTimeContext.stop();
>             }
>           } catch (IOException e) {
>             if (task.getEndIndex() < lastWrittenIndex) {
>               LOG.info("Ignore IOException when handling task " + task
>                   + " which is smaller than the lastWrittenIndex."
>                   + " There should be a snapshot installed.", e);
>             } else {
>               task.failed(e);
>               if (logIOException == null) {
>                 logIOException = new RaftLogIOException("Log already failed"
>                     + " at index " + task.getEndIndex()
>                     + " for task " + task, e);
>               }
>               continue;
>             }
>           }
>           task.done();
>         }
> {code}
> cc [~arp] [~hanishakoneru] [~msingh] [~szetszwo]


