You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/23 06:43:17 UTC

[GitHub] [pinot] Jackie-Jiang opened a new pull request, #8960: Refector DateTime field specs to reduce overhead

Jackie-Jiang opened a new pull request, #8960:
URL: https://github.com/apache/pinot/pull/8960

   - Integrate the validation logic into the constructor to avoid the overhead of extra validation and regex match
   - Remove the unnecessary format reconstruction
   - Cache `DateTimeFormatSpec` and `DateTimeGranularitySpec` within `DateTimeFieldSpec` to avoid parsing the spec string multiple times
   - Use the new introduced pattern format in #8632 
   - This can improve the performance of the `dateTimeConvert` function


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r905611444


##########
pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentPreprocessingMapper.java:
##########
@@ -68,15 +68,36 @@ public void setup(final Context context) {
       String timeColumnValue = _jobConf.get(InternalConfigConstants.TIME_COLUMN_VALUE);
       String pushFrequency = _jobConf.get(InternalConfigConstants.SEGMENT_PUSH_FREQUENCY);
 
-      String timeType = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_TYPE);
-      String timeFormat = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+      String timeFormatStr = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+      DateTimeFieldSpec.TimeFormat timeFormat;
+      try {
+        timeFormat = DateTimeFieldSpec.TimeFormat.valueOf(timeFormatStr);
+      } catch (Exception e) {
+        throw new IllegalArgumentException("Invalid time format: " + timeFormatStr);
+      }
       DateTimeFormatSpec dateTimeFormatSpec;
-      if (timeFormat.equals(DateTimeFieldSpec.TimeFormat.EPOCH.toString()) || timeFormat.equals(
-          DateTimeFieldSpec.TimeFormat.TIMESTAMP.toString())) {
-        dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat);
-      } else {
-        dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat,
-            _jobConf.get(InternalConfigConstants.SEGMENT_TIME_SDF_PATTERN));
+      switch (timeFormat) {
+        case EPOCH:

Review Comment:
   Good catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead

Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r905583504


##########
pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentPreprocessingMapper.java:
##########
@@ -68,15 +68,36 @@ public void setup(final Context context) {
       String timeColumnValue = _jobConf.get(InternalConfigConstants.TIME_COLUMN_VALUE);
       String pushFrequency = _jobConf.get(InternalConfigConstants.SEGMENT_PUSH_FREQUENCY);
 
-      String timeType = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_TYPE);
-      String timeFormat = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+      String timeFormatStr = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+      DateTimeFieldSpec.TimeFormat timeFormat;
+      try {
+        timeFormat = DateTimeFieldSpec.TimeFormat.valueOf(timeFormatStr);
+      } catch (Exception e) {
+        throw new IllegalArgumentException("Invalid time format: " + timeFormatStr);
+      }
       DateTimeFormatSpec dateTimeFormatSpec;
-      if (timeFormat.equals(DateTimeFieldSpec.TimeFormat.EPOCH.toString()) || timeFormat.equals(
-          DateTimeFieldSpec.TimeFormat.TIMESTAMP.toString())) {
-        dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat);
-      } else {
-        dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat,
-            _jobConf.get(InternalConfigConstants.SEGMENT_TIME_SDF_PATTERN));
+      switch (timeFormat) {
+        case EPOCH:

Review Comment:
   This should be case SIMPLE_DATE_FORMAT ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #8960: Refector DateTime field specs to reduce overhead

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #8960:
URL: https://github.com/apache/pinot/pull/8960


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #8960: Refector DateTime field specs to reduce overhead

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8960:
URL: https://github.com/apache/pinot/pull/8960#issuecomment-1165002503

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8960](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fc61300) into [master](https://codecov.io/gh/apache/pinot/commit/5cd0ded9088be55dc2468688bac79fc24b080b03?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5cd0ded) will **decrease** coverage by `40.81%`.
   > The diff coverage is `1.89%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8960       +/-   ##
   =============================================
   - Coverage     69.67%   28.86%   -40.82%     
   + Complexity     4884       47     -4837     
   =============================================
     Files          1817     1805       -12     
     Lines         94789    94452      -337     
     Branches      14178    14133       -45     
   =============================================
   - Hits          66046    27261    -38785     
   - Misses        24113    64639    +40526     
   + Partials       4630     2552     -2078     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `26.58% <1.89%> (-0.01%)` | :arrow_down: |
   | integration2 | `25.07% <1.89%> (+0.24%)` | :arrow_up: |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...roker/routing/segmentpruner/TimeSegmentPruner.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL1RpbWVTZWdtZW50UHJ1bmVyLmphdmE=) | `0.00% <0.00%> (-74.23%)` | :arrow_down: |
   | [...he/pinot/controller/util/AutoAddInvertedIndex.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci91dGlsL0F1dG9BZGRJbnZlcnRlZEluZGV4LmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...tion/batch/common/SegmentGenerationTaskRunner.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1iYXRjaC1pbmdlc3Rpb24vcGlub3QtYmF0Y2gtaW5nZXN0aW9uLWNvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcGx1Z2luL2luZ2VzdGlvbi9iYXRjaC9jb21tb24vU2VnbWVudEdlbmVyYXRpb25UYXNrUnVubmVyLmphdmE=) | `0.00% <0.00%> (-69.85%)` | :arrow_down: |
   | [...pache/pinot/plugin/inputformat/avro/AvroUtils.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1pbnB1dC1mb3JtYXQvcGlub3QtYXZyby1iYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9wbHVnaW4vaW5wdXRmb3JtYXQvYXZyby9BdnJvVXRpbHMuamF2YQ==) | `0.00% <0.00%> (-68.39%)` | :arrow_down: |
   | [.../local/recordtransformer/NullValueTransformer.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWNvcmR0cmFuc2Zvcm1lci9OdWxsVmFsdWVUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-97.37%)` | :arrow_down: |
   | [...ache/pinot/segment/local/utils/IngestionUtils.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9Jbmdlc3Rpb25VdGlscy5qYXZh) | `0.00% <0.00%> (-28.08%)` | :arrow_down: |
   | [.../apache/pinot/segment/local/utils/SchemaUtils.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9TY2hlbWFVdGlscy5qYXZh) | `0.00% <0.00%> (-97.37%)` | :arrow_down: |
   | [...ot/segment/spi/creator/SegmentGeneratorConfig.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvU2VnbWVudEdlbmVyYXRvckNvbmZpZy5qYXZh) | `0.00% <0.00%> (-82.50%)` | :arrow_down: |
   | [.../spi/creator/name/SegmentNameGeneratorFactory.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvbmFtZS9TZWdtZW50TmFtZUdlbmVyYXRvckZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-60.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/spi/data/DateTimeFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9EYXRlVGltZUZpZWxkU3BlYy5qYXZh) | `0.00% <0.00%> (-89.75%)` | :arrow_down: |
   | ... and [1268 more](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [5cd0ded...fc61300](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r910960563


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/DateTimeFieldSpec.java:
##########
@@ -70,17 +72,8 @@ public DateTimeFieldSpec() {
    */
   public DateTimeFieldSpec(String name, DataType dataType, String format, String granularity) {
     super(name, dataType, true);
-    Preconditions.checkNotNull(name);
-    Preconditions.checkNotNull(dataType);
-    if (Character.isDigit(format.charAt(0))) {
-      DateTimeFormatSpec.validateFormat(format);
-    } else {
-      DateTimeFormatSpec.validatePipeFormat(format);
-    }
-    DateTimeGranularitySpec.validateGranularity(granularity);
-
-    _format = format;
-    _granularity = granularity;
+    setFormat(format);

Review Comment:
   It is covered in `SchemaUtilsTest`.
   After a second thought, decided to not directly parse the `format` and `granularity` in the setter to prevent the extra overhead of reading schema. The parsing is performed lazily in the getter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead

Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r905588167


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/DateTimeFieldSpec.java:
##########
@@ -70,17 +72,8 @@ public DateTimeFieldSpec() {
    */
   public DateTimeFieldSpec(String name, DataType dataType, String format, String granularity) {
     super(name, dataType, true);
-    Preconditions.checkNotNull(name);
-    Preconditions.checkNotNull(dataType);
-    if (Character.isDigit(format.charAt(0))) {
-      DateTimeFormatSpec.validateFormat(format);
-    } else {
-      DateTimeFormatSpec.validatePipeFormat(format);
-    }
-    DateTimeGranularitySpec.validateGranularity(granularity);
-
-    _format = format;
-    _granularity = granularity;
+    setFormat(format);

Review Comment:
   Will json serde work correctly for this class? Maybe add a test for serde and comparison ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org