You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@batchee.apache.org by "Romain Manni-Bucau (Jira)" <ji...@apache.org> on 2020/03/11 16:18:00 UTC
[jira] [Resolved] (BATCHEE-138) Failure in Read-Process-Write Loop:
Somehow one of the metrics was zero
[ https://issues.apache.org/jira/browse/BATCHEE-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Romain Manni-Bucau resolved BATCHEE-138.
----------------------------------------
Fix Version/s: 0.6
Resolution: Fixed
Thanks a lot for the report and analyzis Christian, should be good on master.
Side note: it had another vicious effect, the step execution was in state FAILED instead of STOPPED.
> Failure in Read-Process-Write Loop: Somehow one of the metrics was zero
> -----------------------------------------------------------------------
>
> Key: BATCHEE-138
> URL: https://issues.apache.org/jira/browse/BATCHEE-138
> Project: BatchEE
> Issue Type: Bug
> Components: jbatch-core
> Affects Versions: 0.5-incubating
> Reporter: Christian Berger
> Assignee: Romain Manni-Bucau
> Priority: Minor
> Fix For: 0.6
>
>
> Hi BatchEE-Team,
> I'm not able to stop a chunk processing step via JobOperator.stop() properly. I always get an exception that some metrics are wrong:
> {code:java}
> 2020-03-10 15:42:34.668 CET ; [5397] ; [AN72] ; [SEVERE] ; an72.org.apache.batchee.container.impl.controller.chunk.ChunkStepController ; Failure in Read-Process-Write Loop
> java.lang.IllegalStateException: Somehow one of the metrics was zero. Read count: 56, Filter count: -1, Write count: 57
> at org.apache.batchee.container.impl.controller.chunk.ChunkStepController.updateNormalMetrics(ChunkStepController.java:650)
> at org.apache.batchee.container.impl.controller.chunk.ChunkStepController.invokeChunk(ChunkStepController.java:613)
> at org.apache.batchee.container.impl.controller.chunk.ChunkStepController.invokeCoreStep(ChunkStepController.java:732)
> at org.apache.batchee.container.impl.controller.BaseStepController.execute(BaseStepController.java:157)
> at org.apache.batchee.container.impl.controller.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:106)
> at org.apache.batchee.container.impl.controller.JobThreadRootController.originateExecutionOnThread(JobThreadRootController.java:110)
> at org.apache.batchee.container.util.BatchWorkUnit.run(BatchWorkUnit.java:62){code}
> After investigating the code, I've found the root cause. Anyway, let's understand the request flow first:
> Related class: ChunkStepController
> At the beginning, the invokeChunk method is being called which then calls the readAndProcess method ...
> {code:java}
> private void invokeChunk() {
> ....
> while (true) {
> ..
> final List<Object> chunkToWrite = readAndProcess();
> ...
> }
> {code}
>
> This code increments the read counter at the very beginning although it might be possible that no further read items are available.
>
> {code:java}
> private List<Object> readAndProcess() {
> List<Object> chunkToWrite = new ArrayList<Object>();
> Object itemRead;
> Object itemProcessed;
> int readProcessedCount = 0;
> while (true) {
> currentItemStatus = new SingleItemStatus();
> currentChunkStatus.incrementItemsTouchedInCurrentChunk(); //inc read counter
> itemRead = readItem(); // now we call the readItem method
> ..
> }
> }{code}
> Let's assume no items are left. Then the readItem method will mark the currentChunkStatus as finished:
>
> {code:java}
> t
> private Object readItem() {
> Object itemRead = null;
> ...
> itemRead = readerProxy.readItem(); // this returns null in our case
> ...
> // itemRead == null means we reached the end of
> // the readerProxy "resultset"
> currentChunkStatus.setFinished(itemRead == null); // so we mark the chunk as finished
> {code}
>
> Thus we will return to invokeChunk() where we call updateNormalMetrics(chunkToWrite.size()); after some time:
> Here we decrease the read counter again, when the chunk is marked as finished. Obviously, that's because we increased it before but we did not ready an item anymore:
>
> {code:java}
> private void updateNormalMetrics(int writeCount) {
> int readCount = currentChunkStatus.getItemsTouchedInCurrentChunk();
> if (currentChunkStatus.isFinished()) {
> readCount--;
> }
> int filterCount = readCount - writeCount;
> if (readCount < 0 || filterCount < 0 || writeCount < 0) {
> throw new IllegalStateException("Somehow one of the metrics was zero. Read count: " + readCount +
> ", Filter count: " + filterCount + ", Write count: " + writeCount);
> }
> {code}
>
> So far so good.
> Now let's assume the chunk step processing does not complete properly but it gets stopped via Job-Operator invocation. In that case the readAndProcess method marks the chunk as finished as well:
>
> {code:java}
> private List<Object> readAndProcess() {
> ...
> while (true) {
> currentItemStatus = new SingleItemStatus();
> currentChunkStatus.incrementItemsTouchedInCurrentChunk();
> itemRead = readItem();
> ...
> // write buffer size reached
> // This will force the current item to finish processing on a stop request
> if (stepContext.getBatchStatus().equals(BatchStatus.STOPPING)) {
> currentChunkStatus.setFinished(true);
> }
> ...
> // last record in readerProxy reached
> if (currentChunkStatus.isFinished()) { //it's not the last record but we stopped the processing
> break;
> }
> {code}
>
> Now we return back to invokeChunk() where we call updateNormalMetrics(chunkToWrite.size()) again. However, this time - it's not the last record - the decrease operation makes no sense and we fail with the given exception:
>
> {code:java}
> private void updateNormalMetrics(int writeCount) {
> int readCount = currentChunkStatus.getItemsTouchedInCurrentChunk();
> if (currentChunkStatus.isFinished()) {
> readCount--;
> }
> int filterCount = readCount - writeCount;
> if (readCount < 0 || filterCount < 0 || writeCount < 0) {
> throw new IllegalStateException("Somehow one of the metrics was zero. Read count: " + readCount +
> ", Filter count: " + filterCount + ", Write count: " + writeCount);
> }
> {code}
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)