You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/23 06:43:17 UTC
[GitHub] [pinot] Jackie-Jiang opened a new pull request, #8960: Refector DateTime field specs to reduce overhead
Jackie-Jiang opened a new pull request, #8960:
URL: https://github.com/apache/pinot/pull/8960
- Integrate the validation logic into the constructor to avoid the overhead of extra validation and regex match
- Remove the unnecessary format reconstruction
- Cache `DateTimeFormatSpec` and `DateTimeGranularitySpec` within `DateTimeFieldSpec` to avoid parsing the spec string multiple times
- Use the new introduced pattern format in #8632
- This can improve the performance of the `dateTimeConvert` function
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r905611444
##########
pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentPreprocessingMapper.java:
##########
@@ -68,15 +68,36 @@ public void setup(final Context context) {
String timeColumnValue = _jobConf.get(InternalConfigConstants.TIME_COLUMN_VALUE);
String pushFrequency = _jobConf.get(InternalConfigConstants.SEGMENT_PUSH_FREQUENCY);
- String timeType = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_TYPE);
- String timeFormat = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+ String timeFormatStr = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+ DateTimeFieldSpec.TimeFormat timeFormat;
+ try {
+ timeFormat = DateTimeFieldSpec.TimeFormat.valueOf(timeFormatStr);
+ } catch (Exception e) {
+ throw new IllegalArgumentException("Invalid time format: " + timeFormatStr);
+ }
DateTimeFormatSpec dateTimeFormatSpec;
- if (timeFormat.equals(DateTimeFieldSpec.TimeFormat.EPOCH.toString()) || timeFormat.equals(
- DateTimeFieldSpec.TimeFormat.TIMESTAMP.toString())) {
- dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat);
- } else {
- dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat,
- _jobConf.get(InternalConfigConstants.SEGMENT_TIME_SDF_PATTERN));
+ switch (timeFormat) {
+ case EPOCH:
Review Comment:
Good catch!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead
Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r905583504
##########
pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentPreprocessingMapper.java:
##########
@@ -68,15 +68,36 @@ public void setup(final Context context) {
String timeColumnValue = _jobConf.get(InternalConfigConstants.TIME_COLUMN_VALUE);
String pushFrequency = _jobConf.get(InternalConfigConstants.SEGMENT_PUSH_FREQUENCY);
- String timeType = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_TYPE);
- String timeFormat = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+ String timeFormatStr = _jobConf.get(InternalConfigConstants.SEGMENT_TIME_FORMAT);
+ DateTimeFieldSpec.TimeFormat timeFormat;
+ try {
+ timeFormat = DateTimeFieldSpec.TimeFormat.valueOf(timeFormatStr);
+ } catch (Exception e) {
+ throw new IllegalArgumentException("Invalid time format: " + timeFormatStr);
+ }
DateTimeFormatSpec dateTimeFormatSpec;
- if (timeFormat.equals(DateTimeFieldSpec.TimeFormat.EPOCH.toString()) || timeFormat.equals(
- DateTimeFieldSpec.TimeFormat.TIMESTAMP.toString())) {
- dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat);
- } else {
- dateTimeFormatSpec = new DateTimeFormatSpec(1, timeType, timeFormat,
- _jobConf.get(InternalConfigConstants.SEGMENT_TIME_SDF_PATTERN));
+ switch (timeFormat) {
+ case EPOCH:
Review Comment:
This should be case SIMPLE_DATE_FORMAT ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang merged pull request #8960: Refector DateTime field specs to reduce overhead
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #8960:
URL: https://github.com/apache/pinot/pull/8960
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] codecov-commenter commented on pull request #8960: Refector DateTime field specs to reduce overhead
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8960:
URL: https://github.com/apache/pinot/pull/8960#issuecomment-1165002503
# [Codecov](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#8960](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fc61300) into [master](https://codecov.io/gh/apache/pinot/commit/5cd0ded9088be55dc2468688bac79fc24b080b03?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5cd0ded) will **decrease** coverage by `40.81%`.
> The diff coverage is `1.89%`.
```diff
@@ Coverage Diff @@
## master #8960 +/- ##
=============================================
- Coverage 69.67% 28.86% -40.82%
+ Complexity 4884 47 -4837
=============================================
Files 1817 1805 -12
Lines 94789 94452 -337
Branches 14178 14133 -45
=============================================
- Hits 66046 27261 -38785
- Misses 24113 64639 +40526
+ Partials 4630 2552 -2078
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `26.58% <1.89%> (-0.01%)` | :arrow_down: |
| integration2 | `25.07% <1.89%> (+0.24%)` | :arrow_up: |
| unittests1 | `?` | |
| unittests2 | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...roker/routing/segmentpruner/TimeSegmentPruner.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL1RpbWVTZWdtZW50UHJ1bmVyLmphdmE=) | `0.00% <0.00%> (-74.23%)` | :arrow_down: |
| [...he/pinot/controller/util/AutoAddInvertedIndex.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci91dGlsL0F1dG9BZGRJbnZlcnRlZEluZGV4LmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...tion/batch/common/SegmentGenerationTaskRunner.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1iYXRjaC1pbmdlc3Rpb24vcGlub3QtYmF0Y2gtaW5nZXN0aW9uLWNvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcGx1Z2luL2luZ2VzdGlvbi9iYXRjaC9jb21tb24vU2VnbWVudEdlbmVyYXRpb25UYXNrUnVubmVyLmphdmE=) | `0.00% <0.00%> (-69.85%)` | :arrow_down: |
| [...pache/pinot/plugin/inputformat/avro/AvroUtils.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1pbnB1dC1mb3JtYXQvcGlub3QtYXZyby1iYXNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9wbHVnaW4vaW5wdXRmb3JtYXQvYXZyby9BdnJvVXRpbHMuamF2YQ==) | `0.00% <0.00%> (-68.39%)` | :arrow_down: |
| [.../local/recordtransformer/NullValueTransformer.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWNvcmR0cmFuc2Zvcm1lci9OdWxsVmFsdWVUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-97.37%)` | :arrow_down: |
| [...ache/pinot/segment/local/utils/IngestionUtils.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9Jbmdlc3Rpb25VdGlscy5qYXZh) | `0.00% <0.00%> (-28.08%)` | :arrow_down: |
| [.../apache/pinot/segment/local/utils/SchemaUtils.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9TY2hlbWFVdGlscy5qYXZh) | `0.00% <0.00%> (-97.37%)` | :arrow_down: |
| [...ot/segment/spi/creator/SegmentGeneratorConfig.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvU2VnbWVudEdlbmVyYXRvckNvbmZpZy5qYXZh) | `0.00% <0.00%> (-82.50%)` | :arrow_down: |
| [.../spi/creator/name/SegmentNameGeneratorFactory.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvbmFtZS9TZWdtZW50TmFtZUdlbmVyYXRvckZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (-60.00%)` | :arrow_down: |
| [...a/org/apache/pinot/spi/data/DateTimeFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9EYXRlVGltZUZpZWxkU3BlYy5qYXZh) | `0.00% <0.00%> (-89.75%)` | :arrow_down: |
| ... and [1268 more](https://codecov.io/gh/apache/pinot/pull/8960/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [5cd0ded...fc61300](https://codecov.io/gh/apache/pinot/pull/8960?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r910960563
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/DateTimeFieldSpec.java:
##########
@@ -70,17 +72,8 @@ public DateTimeFieldSpec() {
*/
public DateTimeFieldSpec(String name, DataType dataType, String format, String granularity) {
super(name, dataType, true);
- Preconditions.checkNotNull(name);
- Preconditions.checkNotNull(dataType);
- if (Character.isDigit(format.charAt(0))) {
- DateTimeFormatSpec.validateFormat(format);
- } else {
- DateTimeFormatSpec.validatePipeFormat(format);
- }
- DateTimeGranularitySpec.validateGranularity(granularity);
-
- _format = format;
- _granularity = granularity;
+ setFormat(format);
Review Comment:
It is covered in `SchemaUtilsTest`.
After a second thought, decided to not directly parse the `format` and `granularity` in the setter to prevent the extra overhead of reading schema. The parsing is performed lazily in the getter.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] xiangfu0 commented on a diff in pull request #8960: Refector DateTime field specs to reduce overhead
Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on code in PR #8960:
URL: https://github.com/apache/pinot/pull/8960#discussion_r905588167
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/DateTimeFieldSpec.java:
##########
@@ -70,17 +72,8 @@ public DateTimeFieldSpec() {
*/
public DateTimeFieldSpec(String name, DataType dataType, String format, String granularity) {
super(name, dataType, true);
- Preconditions.checkNotNull(name);
- Preconditions.checkNotNull(dataType);
- if (Character.isDigit(format.charAt(0))) {
- DateTimeFormatSpec.validateFormat(format);
- } else {
- DateTimeFormatSpec.validatePipeFormat(format);
- }
- DateTimeGranularitySpec.validateGranularity(granularity);
-
- _format = format;
- _granularity = granularity;
+ setFormat(format);
Review Comment:
Will json serde work correctly for this class? Maybe add a test for serde and comparison ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org