Posted to commits@pinot.apache.org by "itschrispeck (via GitHub)" <gi...@apache.org> on 2023/11/17 01:07:29 UTC

[PR] bugfix: use PropertiesWriter to escape index_map keys properly [pinot]

itschrispeck opened a new pull request, #12018:
URL: https://github.com/apache/pinot/pull/12018

   Previously, segment build failed if a column name contained `:` or `=`, since these characters have special meaning in `Properties` files. For example, the exception produced for a column like `headers.:auth` is:
   
   ```
   j.lang.IllegalStateException: Index separator not found: headers. , segment: /data/pinot-server/dataDir/table_REALTIME/table__127__0__20231020T1604Z/v3
      at c.g.common.base.Preconditions.checkState(Preconditions.java:504)
      at o.a.p.s.s.s.ColumnIndexUtils.parseIndexMapKeys(ColumnIndexUtils.java:43)
      at o.a.p.s.l.s.s.SingleFileIndexDirectory.loadMap(SingleFileIndexDirectory.java:220)
      at o.a.p.s.l.s.s.SingleFileIndexDirectory.load(SingleFileIndexDirectory.java:209)
      at o.a.p.s.l.s.s.SingleFileIndexDirectory.<init>(SingleFileIndexDirectory.java:121)
      at o.a.p.s.l.s.s.SegmentLocalFSDirectory.loadData(SegmentLocalFSDirectory.java:262)
      at o.a.p.s.l.s.s.SegmentLocalFSDirectory.load(SegmentLocalFSDirectory.java:247)
      at o.a.p.s.l.s.s.SegmentLocalFSDirectory.<init>(SegmentLocalFSDirectory.java:98)
      at o.a.p.s.l.s.s.SegmentLocalFSDirectory.<init>(SegmentLocalFSDirectory.java:81)
      at o.a.p.s.l.l...
   ``` 
   
   This change writes index_map through PropertiesWriter, which escapes the keys, instead of building the string explicitly.
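
The failure mode described above can be reproduced in a minimal sketch. It uses `java.util.Properties` from the JDK as a stand-in, not the `PropertiesWriter` the PR actually switches to, and the key layout `headers.:auth.forward_index.startOffset` and the class name are assumptions made up for illustration. The point is only that a hand-built `key = value` line leaves `:` unescaped, so a properties parser cuts the key short at `headers.`, while a properties-aware writer escapes it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: java.util.Properties stands in for the writer used in the PR.
final class IndexMapEscapingSketch {

  // Naive approach: plain string concatenation, so ':' or '=' in the key stays unescaped.
  static String writeNaively(String key, String value) {
    return key + " = " + value + "\n";
  }

  // Properties-aware approach: store() escapes ':' and '=' in keys with a backslash.
  static String writeEscaped(String key, String value) {
    Properties props = new Properties();
    props.setProperty(key, value);
    StringWriter out = new StringWriter();
    try {
      props.store(out, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  // Parses properties text back into key/value pairs.
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    // Hypothetical index_map key for a column named "headers.:auth".
    String key = "headers.:auth.forward_index.startOffset";
    System.out.println("naive keys:   " + load(writeNaively(key, "0")).stringPropertyNames());
    System.out.println("escaped keys: " + load(writeEscaped(key, "0")).stringPropertyNames());
  }
}
```

With the naive writer, the parsed key is truncated at the first unescaped `:`, which is consistent with the `headers. ` fragment in the exception message above.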
   
   Testing: unit tests + deployed in a cluster and verified segments were sealed properly 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] bugfix: use PropertiesWriter to escape index_map keys properly [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12018:
URL: https://github.com/apache/pinot/pull/12018#issuecomment-1815613170

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12018?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `8 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`2beb9a4`)](https://app.codecov.io/gh/apache/pinot/commit/2beb9a4938d7d7c9481fd2546075e2a2475fe0ec?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.61% compared to head [(`0aaa120`)](https://app.codecov.io/gh/apache/pinot/pull/12018?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 46.77%.
   > Report is 19 commits behind head on master.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12018?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [.../local/segment/store/SingleFileIndexDirectory.java](https://app.codecov.io/gh/apache/pinot/pull/12018?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L3N0b3JlL1NpbmdsZUZpbGVJbmRleERpcmVjdG9yeS5qYXZh) | 0.00% | [8 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12018?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12018       +/-   ##
   =============================================
   - Coverage     61.61%   46.77%   -14.85%     
   - Complexity      207      943      +736     
   =============================================
     Files          2385     1787      -598     
     Lines        129214    93696    -35518     
     Branches      20003    15158     -4845     
   =============================================
   - Hits          79613    43822    -35791     
   - Misses        43801    46755     +2954     
   + Partials       5800     3119     -2681     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <0.00%> (-14.71%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <0.00%> (+19.18%)` | :arrow_up: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <0.00%> (-14.85%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <0.00%> (-14.84%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.77% <0.00%> (-0.18%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12018/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12018?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   




Re: [PR] bugfix: use PropertiesWriter to escape index_map keys properly [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #12018:
URL: https://github.com/apache/pinot/pull/12018#issuecomment-1821808518

   I guess this is not the only place where it can fail. E.g. segment metadata is also a properties configuration.
   Should we consider counting them as reserved characters, and add validation on table name and column name to not allow using them?
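
The validation alternative suggested here could look roughly like the sketch below. It is not existing Pinot code: the class name, the method name, and the reserved set (only the two characters named in the PR description) are all assumptions for illustration.

```java
// Hypothetical validator; not part of Pinot.
final class ReservedCharacterSketch {

  // Assumed reserved set: the two characters the PR description calls out.
  private static final String RESERVED = ":=";

  // Returns true if the table or column name contains any reserved character.
  static boolean hasReservedCharacter(String name) {
    for (int i = 0; i < name.length(); i++) {
      if (RESERVED.indexOf(name.charAt(i)) >= 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hasReservedCharacter("headers.:auth"));
    System.out.println(hasReservedCharacter("headers.auth"));
  }
}
```

A check like this would reject names such as `headers.:auth` at schema time instead of escaping them when files are written.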




Re: [PR] bugfix: use PropertiesWriter to escape index_map keys properly [pinot]

Posted by "itschrispeck (via GitHub)" <gi...@apache.org>.
itschrispeck commented on PR #12018:
URL: https://github.com/apache/pinot/pull/12018#issuecomment-1821839790

   > I guess this is not the only place where it can fail. E.g. segment metadata is also a properties configuration. Should we consider counting them as reserved characters, and add validation on table name and column name to not allow using them?
   
   We would really like the ability to use these characters in column names, since other DBs like ClickHouse allow them and supporting them makes programmatic column creation simple. This lets us avoid the complexity of maintaining a mapping to substitute out reserved characters.
   
   Re: segment metadata, it already handles these characters correctly. For example, `metadata.properties` shows them escaped:
   
   ```
   column.headers.\:authority.cardinality = -2147483648
   column.headers.\:authority.totalDocs = 1108200
   column.headers.\:authority.dataType = STRING
   column.headers.\:authority.bitsPerElement = 31
   column.headers.\:authority.lengthOfEachEntry = 0
   column.headers.\:authority.columnType = DIMENSION
   column.headers.\:authority.isSorted = false
   column.headers.\:authority.hasDictionary = false
   column.headers.\:authority.isSingleValues = true
   column.headers.\:authority.maxNumberOfMultiValues = -1
   column.headers.\:authority.totalNumberOfEntries = 1108200
   column.headers.\:authority.isAutoGenerated = false
   column.headers.\:authority.minValue = 127.0.0.1:5435
   column.headers.\:authority.maxValue = null
   column.headers.\:authority.defaultNullValue = null
   ```
   
   As far as I could tell, the issue is only for index_map, which used some custom logic to write the file instead of relying on PropertiesConfiguration functionality. We've been using this patch for a couple of weeks and haven't run into any other query/operational/ingestion bugs due to the special column name.
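
To show why the escaped `metadata.properties` lines quoted above read back cleanly, here is a small sketch. It parses two of those lines with the JDK's `java.util.Properties` as a stand-in, not the PropertiesConfiguration class Pinot uses, and the class name is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: java.util.Properties stands in for the reader Pinot uses.
final class EscapedMetadataSketch {

  // Parses properties text into key/value pairs.
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    // Two lines copied from the metadata.properties excerpt; "\\:" is a literal "\:" in the file.
    String text = "column.headers.\\:authority.dataType = STRING\n"
        + "column.headers.\\:authority.minValue = 127.0.0.1:5435\n";
    Properties props = parse(text);
    System.out.println(props.getProperty("column.headers.:authority.dataType"));
    System.out.println(props.getProperty("column.headers.:authority.minValue"));
  }
}
```

The backslash before `:` keeps the colon inside the key, and the unescaped `:` in the `minValue` value is harmless because the key/value separator has already been consumed by then.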




Re: [PR] bugfix: use PropertiesWriter to escape index_map keys properly [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12018:
URL: https://github.com/apache/pinot/pull/12018#discussion_r1401394888


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/SingleFileIndexDirectory.java:
##########
@@ -431,24 +434,26 @@ private static String getKey(String column, String indexName, boolean isStartOff
   }
 
   @VisibleForTesting
-  static void persistIndexMaps(List<IndexEntry> entries, PrintWriter writer) {
+  static void persistIndexMaps(List<IndexEntry> entries, PrintWriter writer) throws IOException {

Review Comment:
   (format) Please apply Pinot Style


