You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/07 14:32:01 UTC

[GitHub] [druid] liran-funaro commented on a change in pull request #10336: More structured way to handle parse exceptions

liran-funaro commented on a change in pull request #10336:
URL: https://github.com/apache/druid/pull/10336#discussion_r484402253



##########
File path: processing/src/test/java/org/apache/druid/segment/incremental/OnheapIncrementalIndexBenchmark.java
##########
@@ -227,9 +224,7 @@ protected AddToFactsResult addToFacts(
           }
           catch (ParseException e) {
             // "aggregate" can throw ParseExceptions if a selector expects something but gets something else.
-            if (getReportParseExceptions()) {
-              throw e;
-            }
+            throw e;

Review comment:
       If we never intend to log/report this exception, we can remove the try/catch clause entirely.

##########
File path: processing/src/main/java/org/apache/druid/segment/incremental/OnheapIncrementalIndex.java
##########
@@ -186,7 +185,7 @@ protected AddToFactsResult addToFacts(
       } else {
         // We lost a race
         aggs = concurrentGet(prev);
-        parseExceptionMessages = doAggregate(metrics, aggs, rowContainer, row);
+        doAggregate(metrics, aggs, rowContainer, row, parseExceptionMessages);

Review comment:
       Currently (before and after this PR), a row that contains partially corrupted/incorrect data is aggregated anyway (partially).
   This may result in wrong statistics inference by the user due to inconsistencies between the number of rows each aggregator accumulated.
   IMO, the decision of allowing this scenario should be of the user. 
   But since currently, there is no way to roll-back aggregation, I don't see how this can be easily implemented.
   Maybe adding parse as a preliminary step to aggregation, but this is an entirely different subject.
   "return early without the second aggregation" would only aggravate this scenario, so I don't think it is the way to go either. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org