You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/14 16:47:41 UTC

[GitHub] [incubator-druid] leventov commented on a change in pull request #7838: Improve IncrementalIndex concurrency scalability

leventov commented on a change in pull request #7838: Improve IncrementalIndex concurrency scalability
URL: https://github.com/apache/incubator-druid/pull/7838#discussion_r293877970
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
 ##########
 @@ -627,98 +624,96 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow row)
     if (row.getTimestampFromEpoch() < minTimestamp) {
       throw new IAE("Cannot add row[%s] because it is below the minTimestamp[%s]", row, DateTimes.utc(minTimestamp));
     }
-
     final List<String> rowDimensions = row.getDimensions();
 
-    Object[] dims;
-    List<Object> overflow = null;
+    Map<String, Object> rowDimKeys = new HashMap<>();
     long dimsKeySize = 0;
     List<String> parseExceptionMessages = new ArrayList<>();
-    synchronized (dimensionDescs) {
-      dims = new Object[dimensionDescs.size()];
-      for (String dimension : rowDimensions) {
-        if (Strings.isNullOrEmpty(dimension)) {
-          continue;
-        }
-        boolean wasNewDim = false;
-        ColumnCapabilitiesImpl capabilities;
-        DimensionDesc desc = dimensionDescs.get(dimension);
-        if (desc != null) {
-          capabilities = desc.getCapabilities();
-        } else {
-          wasNewDim = true;
-          capabilities = columnCapabilities.get(dimension);
-          if (capabilities == null) {
-            capabilities = new ColumnCapabilitiesImpl();
-            // For schemaless type discovery, assume everything is a String for now, can change later.
-            capabilities.setType(ValueType.STRING);
-            capabilities.setDictionaryEncoded(true);
-            capabilities.setHasBitmapIndexes(true);
-            columnCapabilities.put(dimension, capabilities);
-          }
-          DimensionHandler handler = DimensionHandlerUtils.getHandlerFromCapabilities(dimension, capabilities, null);
-          desc = addNewDimension(dimension, capabilities, handler);
-        }
-        DimensionHandler handler = desc.getHandler();
-        DimensionIndexer indexer = desc.getIndexer();
-        Object dimsKey = null;
-        try {
-          dimsKey = indexer.processRowValsToUnsortedEncodedKeyComponent(
-              row.getRaw(dimension),
-              true
-          );
+
+    DimensionData prevDimensionData = this.dimensions.get();
+    DimensionData dimensionData = null;
+    for (String dimension : rowDimensions) {
+      if (Strings.isNullOrEmpty(dimension)) {
+        continue;
+      }
+
+      if (rowDimKeys.containsKey(dimension)) {
+        // If the dims map already contains a mapping at this index, it means we have seen this dimension already on this input row.
+        throw new ISE("Dimension[%s] occurred more than once in InputRow", dimension);
 
 Review comment:
   First, it would be better to print the full faulty row.
   
   Second, I think the behavior should still be permissive. We can ignore repetitive occurrences of a key, just log it on ERROR level. Or, `IncrementalIndexRowResult` should be amended with extra error data.
   
   I invite @egor-ryashin, @jihoonson, and @jon-wei to chime if they disagree.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org