You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@batchee.apache.org by "Romain Manni-Bucau (Jira)" <ji...@apache.org> on 2020/03/11 16:18:00 UTC

[jira] [Resolved] (BATCHEE-138) Failure in Read-Process-Write Loop: Somehow one of the metrics was zero

     [ https://issues.apache.org/jira/browse/BATCHEE-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Romain Manni-Bucau resolved BATCHEE-138.
----------------------------------------
    Fix Version/s: 0.6
       Resolution: Fixed

Thanks a lot for the report and analyzis Christian, should be good on master.

 

Side note: it had another vicious effect, the step execution was in state FAILED instead of STOPPED.

> Failure in Read-Process-Write Loop: Somehow one of the metrics was zero
> -----------------------------------------------------------------------
>
>                 Key: BATCHEE-138
>                 URL: https://issues.apache.org/jira/browse/BATCHEE-138
>             Project: BatchEE
>          Issue Type: Bug
>          Components: jbatch-core
>    Affects Versions: 0.5-incubating
>            Reporter: Christian Berger
>            Assignee: Romain Manni-Bucau
>            Priority: Minor
>             Fix For: 0.6
>
>
> Hi BatchEE-Team,
> I'm not able to stop a chunk processing step via JobOperator.stop() properly. I always get an exception that some metrics are wrong:
> {code:java}
> 2020-03-10 15:42:34.668 CET ; [5397] ; [AN72] ; [SEVERE] ; an72.org.apache.batchee.container.impl.controller.chunk.ChunkStepController ; Failure in Read-Process-Write Loop
> java.lang.IllegalStateException: Somehow one of the metrics was zero. Read count: 56, Filter count: -1, Write count: 57
>  at org.apache.batchee.container.impl.controller.chunk.ChunkStepController.updateNormalMetrics(ChunkStepController.java:650)
>  at org.apache.batchee.container.impl.controller.chunk.ChunkStepController.invokeChunk(ChunkStepController.java:613)
>  at org.apache.batchee.container.impl.controller.chunk.ChunkStepController.invokeCoreStep(ChunkStepController.java:732)
>  at org.apache.batchee.container.impl.controller.BaseStepController.execute(BaseStepController.java:157)
>  at org.apache.batchee.container.impl.controller.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:106)
>  at org.apache.batchee.container.impl.controller.JobThreadRootController.originateExecutionOnThread(JobThreadRootController.java:110)
>  at org.apache.batchee.container.util.BatchWorkUnit.run(BatchWorkUnit.java:62){code}
> After investigating the code, I've found the root cause. Anyway, let's understand the request flow first:
> Related class: ChunkStepController
> At the beginning, the invokeChunk method is being called which then calls the readAndProcess method  ...
> {code:java}
> private void invokeChunk() {
> ....
>  while (true) {
> ..
> final List<Object> chunkToWrite = readAndProcess();
> ...
> }
> {code}
>  
> This code increments the read counter at the very beginning although it might be possible that no further read items are available. 
>  
> {code:java}
> private List<Object> readAndProcess() {
>  List<Object> chunkToWrite = new ArrayList<Object>();
>  Object itemRead;
>  Object itemProcessed;
>  int readProcessedCount = 0;
> while (true) {
>  currentItemStatus = new SingleItemStatus();
>  currentChunkStatus.incrementItemsTouchedInCurrentChunk(); //inc read counter
>  itemRead = readItem(); // now we call the readItem method
> ..
> }
> }{code}
> Let's assume no items are left. Then the readItem method will mark the currentChunkStatus as finished:
>  
> {code:java}
> t
> private Object readItem() {
>  Object itemRead = null;
>  ...
> itemRead = readerProxy.readItem(); // this returns null in our case
> ...
> // itemRead == null means we reached the end of
>  // the readerProxy "resultset"
>  currentChunkStatus.setFinished(itemRead == null); // so we mark the chunk as finished
> {code}
>  
> Thus we will return to invokeChunk() where we call updateNormalMetrics(chunkToWrite.size()); after some time:
> Here we decrease the read counter again, when the chunk is marked as finished. Obviously, that's because we increased it before but we did not ready an item anymore:
>  
> {code:java}
> private void updateNormalMetrics(int writeCount) {
>  int readCount = currentChunkStatus.getItemsTouchedInCurrentChunk();
>  if (currentChunkStatus.isFinished()) {
>  readCount--;
>  }
>  int filterCount = readCount - writeCount;
> if (readCount < 0 || filterCount < 0 || writeCount < 0) {
>  throw new IllegalStateException("Somehow one of the metrics was zero. Read count: " + readCount +
>  ", Filter count: " + filterCount + ", Write count: " + writeCount);
>  }
> {code}
>  
> So far so good.
> Now let's assume the chunk step processing does not complete properly but it gets stopped via Job-Operator invocation. In that case the readAndProcess method marks the chunk as finished as well:
>  
> {code:java}
> private List<Object> readAndProcess() { 
> ...
> while (true) {
>  currentItemStatus = new SingleItemStatus();
>  currentChunkStatus.incrementItemsTouchedInCurrentChunk();
>  itemRead = readItem();
> ...
> // write buffer size reached
>  // This will force the current item to finish processing on a stop request
>  if (stepContext.getBatchStatus().equals(BatchStatus.STOPPING)) {
>  currentChunkStatus.setFinished(true);
>  }
> ...
> // last record in readerProxy reached 
> if (currentChunkStatus.isFinished()) { //it's not the last record but we stopped the processing
> break; 
> }
> {code}
>  
> Now we return back to invokeChunk() where we call updateNormalMetrics(chunkToWrite.size()) again. However, this time - it's not the last record - the decrease operation makes no sense and we fail with the given exception:
>  
> {code:java}
> private void updateNormalMetrics(int writeCount) {
>  int readCount = currentChunkStatus.getItemsTouchedInCurrentChunk();
>  if (currentChunkStatus.isFinished()) {
>  readCount--;
>  }
>  int filterCount = readCount - writeCount;
> if (readCount < 0 || filterCount < 0 || writeCount < 0) {
>  throw new IllegalStateException("Somehow one of the metrics was zero. Read count: " + readCount +
>  ", Filter count: " + filterCount + ", Write count: " + writeCount);
>  }
> {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)