You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/09/01 15:37:06 UTC

[GitHub] [pinot] KKcorps opened a new pull request, #9320: Allow ingestion of errored records with incorrect datatype

KKcorps opened a new pull request, #9320:
URL: https://github.com/apache/pinot/pull/9320

   Currently, if there is an exception during data type transform, we simply stop the ingestion and throw the error. The PR introduces a config so that if user sets `useDefaultValueOnError: true` in the tableConfig, we simply use `null` or defaultValue  instead of throwing error for the particular column or record. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
KKcorps commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r961310197


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,24 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private boolean _useDefaultValueOnError;

Review Comment:
   done



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,24 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private boolean _useDefaultValueOnError;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
KKcorps commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r960890767


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -95,7 +103,12 @@ public GenericRow transform(GenericRow record) {
 
         record.putValue(column, value);
       } catch (Exception e) {
-        throw new RuntimeException("Caught exception while transforming data type for column: " + column, e);
+        if (!_useDefaultValueOnError) {
+          throw new RuntimeException("Caught exception while transforming data type for column: " + column, e);
+        } else {
+          LOGGER.debug("Caught exception while transforming data type for column: {}", column, e);
+          record.putValue(column, null);

Review Comment:
   Yes. Since `NullValueTransformer` takes care of this, I don't want to duplicate that logic and introduce the risk of it diverging from the current logic in future patches.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mayankshriv commented on pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on PR #9320:
URL: https://github.com/apache/pinot/pull/9320#issuecomment-1234459557

   Can you also address time column here? if it is null, do we have a default value for it, and if so what's that value?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps merged pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
KKcorps merged PR #9320:
URL: https://github.com/apache/pinot/pull/9320


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #9320:
URL: https://github.com/apache/pinot/pull/9320#issuecomment-1234518562

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/9320?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#9320](https://codecov.io/gh/apache/pinot/pull/9320?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2c356b9) into [master](https://codecov.io/gh/apache/pinot/commit/04c5a1af184b0f181a44b3f7f0fed63c18b450bf?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (04c5a1a) will **decrease** coverage by `38.66%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #9320       +/-   ##
   =============================================
   - Coverage     63.49%   24.83%   -38.67%     
   + Complexity     4991       53     -4938     
   =============================================
     Files          1820     1861       +41     
     Lines         97342    99262     +1920     
     Branches      14906    15127      +221     
   =============================================
   - Hits          61806    24647    -37159     
   - Misses        30955    72115    +41160     
   + Partials       4581     2500     -2081     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration2 | `24.83% <0.00%> (?)` | |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/9320?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../local/recordtransformer/CompositeTransformer.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWNvcmR0cmFuc2Zvcm1lci9Db21wb3NpdGVUcmFuc2Zvcm1lci5qYXZh) | `0.00% <ø> (-83.34%)` | :arrow_down: |
   | [...t/local/recordtransformer/DataTypeTransformer.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWNvcmR0cmFuc2Zvcm1lci9EYXRhVHlwZVRyYW5zZm9ybWVyLmphdmE=) | `0.00% <0.00%> (-89.62%)` | :arrow_down: |
   | [.../apache/pinot/spi/config/table/IndexingConfig.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0luZGV4aW5nQ29uZmlnLmphdmE=) | `0.00% <0.00%> (-92.78%)` | :arrow_down: |
   | [...he/pinot/spi/utils/builder/TableConfigBuilder.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvYnVpbGRlci9UYWJsZUNvbmZpZ0J1aWxkZXIuamF2YQ==) | `0.00% <0.00%> (-83.34%)` | :arrow_down: |
   | [...in/java/org/apache/pinot/spi/utils/BytesUtils.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQnl0ZXNVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/spi/trace/BaseRecording.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdHJhY2UvQmFzZVJlY29yZGluZy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/spi/trace/NoOpRecording.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdHJhY2UvTm9PcFJlY29yZGluZy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/user/RoleType.java](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3VzZXIvUm9sZVR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1525 more](https://codecov.io/gh/apache/pinot/pull/9320/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
KKcorps commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r961310348


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,24 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private boolean _useDefaultValueOnError;
+  private DateTimeFieldSpec _timeColumnSpec;

Review Comment:
   done



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -59,6 +76,16 @@ public GenericRow transform(GenericRow record) {
         if (value == null) {
           continue;
         }
+
+        if (_useDefaultValueOnError && _timeColumnSpec != null && column.equals(_timeColumnSpec.getName())) {
+          DateTimeFormatSpec dateTimeFormatSpec = _timeColumnSpec.getFormatSpec();
+          long timeInMs = dateTimeFormatSpec.fromFormatToMillis(value.toString());
+          if (!TimeUtils.timeValueInValidRange(timeInMs)) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
KKcorps commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r961310115


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java:
##########
@@ -784,6 +786,14 @@ public void setNullHandlingEnabled(boolean nullHandlingEnabled) {
     _nullHandlingEnabled = nullHandlingEnabled;
   }
 
+  public boolean useDefaultValueOnError() {

Review Comment:
   done



##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/IndexingConfig.java:
##########
@@ -283,6 +284,14 @@ public void setNullHandlingEnabled(boolean nullHandlingEnabled) {
     _nullHandlingEnabled = nullHandlingEnabled;
   }
 
+  public boolean useDefaultValueOnError() {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
klsince commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r960852813


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -95,7 +103,12 @@ public GenericRow transform(GenericRow record) {
 
         record.putValue(column, value);
       } catch (Exception e) {
-        throw new RuntimeException("Caught exception while transforming data type for column: " + column, e);
+        if (!_useDefaultValueOnError) {
+          throw new RuntimeException("Caught exception while transforming data type for column: " + column, e);
+        } else {
+          LOGGER.debug("Caught exception while transforming data type for column: {}", column, e);
+          record.putValue(column, null);

Review Comment:
   is it intended to always set null here and let NullValueTransformer (which happens after DataTypeTransformer) to change null to the default value set for nullValueHandling?It's pretty neat to me. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on PR #9320:
URL: https://github.com/apache/pinot/pull/9320#issuecomment-1236042015

   Can you please update the pinot doc for the changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
KKcorps commented on PR #9320:
URL: https://github.com/apache/pinot/pull/9320#issuecomment-1234785795

   > Can we also address time column here? We should have users control the default value when it is null or not in valid time range.
   Added this change. If the config `useDefaultValueOnError` is enabled and `timeColumn` is set, we will check if each timestamp lies in the valid range or not. If not, then we set it to `null` and let `NullValueTransform` take care of replacing it with appropriate default value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] klsince commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
klsince commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r961196323


##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/IndexingConfig.java:
##########
@@ -283,6 +284,14 @@ public void setNullHandlingEnabled(boolean nullHandlingEnabled) {
     _nullHandlingEnabled = nullHandlingEnabled;
   }
 
+  public boolean useDefaultValueOnError() {

Review Comment:
   from my tests, this has to start with `is`, like `isUseDefaultValueOnError`, in order to add it to TableConfig, otherwise it's discarded and marked as unrecognized unrecognizedProperties.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -59,6 +76,16 @@ public GenericRow transform(GenericRow record) {
         if (value == null) {
           continue;
         }
+
+        if (_useDefaultValueOnError && _timeColumnSpec != null && column.equals(_timeColumnSpec.getName())) {
+          DateTimeFormatSpec dateTimeFormatSpec = _timeColumnSpec.getFormatSpec();
+          long timeInMs = dateTimeFormatSpec.fromFormatToMillis(value.toString());
+          if (!TimeUtils.timeValueInValidRange(timeInMs)) {

Review Comment:
   nit: add a DEBUG level for this? 



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java:
##########
@@ -784,6 +786,14 @@ public void setNullHandlingEnabled(boolean nullHandlingEnabled) {
     _nullHandlingEnabled = nullHandlingEnabled;
   }
 
+  public boolean useDefaultValueOnError() {

Review Comment:
   this looks like not used. but anyway, might want to change its name to isUseDefaultValueOnError()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r961333301


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,33 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private final boolean _continueOnError;
+  private final boolean _validateTimeValues;
+  private final String _timeColumnName;
+  private final DateTimeFormatSpec _timeColumnSpec;

Review Comment:
   (nit)
   ```suggestion
     private final DateTimeFormatSpec _timeFormatSpec;
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,33 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private final boolean _continueOnError;
+  private final boolean _validateTimeValues;
+  private final String _timeColumnName;
+  private final DateTimeFormatSpec _timeColumnSpec;
 
-  public DataTypeTransformer(Schema schema) {
+  public DataTypeTransformer(TableConfig tableConfig, Schema schema) {
     for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
       if (!fieldSpec.isVirtualColumn()) {
         _dataTypes.put(fieldSpec.getName(), PinotDataType.getPinotDataTypeForIngestion(fieldSpec));
       }
     }
+
+    _continueOnError = tableConfig.getIndexingConfig().isContinueOnError();
+    _validateTimeValues = tableConfig.getIndexingConfig().isValidateTimeValue();
+    _timeColumnName = tableConfig.getValidationConfig().getTimeColumnName();
+
+    DateTimeFormatSpec timeColumnSpec = null;
+    if (StringUtils.isNotEmpty(_timeColumnName)) {
+      DateTimeFieldSpec dateTimeFieldSpec = schema.getSpecForTimeColumn(_timeColumnName);
+      if (dateTimeFieldSpec != null) {
+        timeColumnSpec = dateTimeFieldSpec.getFormatSpec();

Review Comment:
   We should throw exception here



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -59,6 +85,17 @@ public GenericRow transform(GenericRow record) {
         if (value == null) {
           continue;
         }
+
+        if (_validateTimeValues && _timeColumnSpec != null && column.equals(_timeColumnName)) {
+          long timeInMs = _timeColumnSpec.fromFormatToMillis(value.toString());
+          if (!TimeUtils.timeValueInValidRange(timeInMs)) {
+            LOGGER.debug("Time value {} is not in valid range for column: {}, must be between: {}",

Review Comment:
   We should check if `continueOnError` is on, and throw exception when it is off



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9320: Allow ingestion of errored records with incorrect datatype

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #9320:
URL: https://github.com/apache/pinot/pull/9320#discussion_r961260316


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,24 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private boolean _useDefaultValueOnError;

Review Comment:
   Let's make 2 configs for these 2 things: `continueOnError` and `validateTimeValue`
   
   Suggest changing `useDefaultValueOnError` to `continueOnError` because it can be used for other ingestion errors, and we don't want to tie it to the implementation



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,24 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private boolean _useDefaultValueOnError;
+  private DateTimeFieldSpec _timeColumnSpec;
 
-  public DataTypeTransformer(Schema schema) {
+  public DataTypeTransformer(TableConfig tableConfig, Schema schema) {
     for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
       if (!fieldSpec.isVirtualColumn()) {
         _dataTypes.put(fieldSpec.getName(), PinotDataType.getPinotDataTypeForIngestion(fieldSpec));
       }
     }
+
+    _useDefaultValueOnError = tableConfig.getIndexingConfig().useDefaultValueOnError();

Review Comment:
   Suggest making it inside the `IngestionConfig`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,24 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private boolean _useDefaultValueOnError;

Review Comment:
   These fields can be `final`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/DataTypeTransformer.java:
##########
@@ -40,14 +47,24 @@
  */
 @SuppressWarnings("rawtypes")
 public class DataTypeTransformer implements RecordTransformer {
+  private static final Logger LOGGER = LoggerFactory.getLogger(DataTypeTransformer.class);
+
   private final Map<String, PinotDataType> _dataTypes = new HashMap<>();
+  private boolean _useDefaultValueOnError;
+  private DateTimeFieldSpec _timeColumnSpec;

Review Comment:
   We may store the `_timeColumnName` and `_timeFormatSpec` for better performance



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org