You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "abhioncbr (via GitHub)" <gi...@apache.org> on 2024/02/16 02:31:13 UTC

[I] commons-configuration2 performance issue during segment metadata load [pinot]

abhioncbr opened a new issue, #12433:
URL: https://github.com/apache/pinot/issues/12433

   Recently, we [upgraded](https://github.com/apache/pinot/issues/11085) common configuration from v1 to v2.  With common configuration v2, we observe some performance issues during segment metadata initialization, especially with tables with hundreds of columns. 
   
   JFR profiles show most of the time spent in the method call [doParseProperty](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfiguration.java#L512); also it's performing `StringEscapeUtils.unescapeJava` both on keys and values.
   ![image](https://github.com/apache/pinot/assets/1076944/93240a19-2610-4496-a256-78ccd2a77f56)
   
   This issue talks about some of the options to overcome performance issues.
   
   - one of the options is to write with a custom property reader for reading the segment metadata files. With commons-configuration2, it's easily possible. Here is the [link](https://commons.apache.org/proper/commons-configuration/userguide/howto_properties.html#Custom_properties_readers_and_writers) 
   - We are also considering the versioning in the segment metadata files for backward compatibility and future enhancements.
   - For easy rollout, probably table-level configuration.  
   
   cc: @Jackie-Jiang @klsince 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] commons-configuration2 performance issue during segment metadata load [pinot]

Posted by "itschrispeck (via GitHub)" <gi...@apache.org>.
itschrispeck commented on issue #12433:
URL: https://github.com/apache/pinot/issues/12433#issuecomment-1953523983

   Hi @abhioncbr, this is an interesting bottleneck. So I understand - by skipping the unescaping step this will prevent column names from containing either `:` or `=`, right? 
   
   I was hoping Pinot would eventually support these characters in column names officially as most other DBs do (I had started some [PR](https://github.com/apache/pinot/pull/12018), but didn't get around to refactoring based on feedback). 
   
   Unfortunately, we're running an internal patch and supporting hundreds of tables with these special columns already. It sounds like to support these names in the future/to avoid breaking our internal branch we'd have to make this reader pluggable and support a version that does allow for escaping keys?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] commons-configuration2 performance issue during segment metadata load [pinot]

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on issue #12433:
URL: https://github.com/apache/pinot/issues/12433#issuecomment-1947688248

   I am planning to work on this. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] commons-configuration2 performance issue during segment metadata load [pinot]

Posted by "itschrispeck (via GitHub)" <gi...@apache.org>.
itschrispeck commented on issue #12433:
URL: https://github.com/apache/pinot/issues/12433#issuecomment-1954962034

   Got it, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] commons-configuration2 performance issue during segment metadata load [pinot]

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on issue #12433:
URL: https://github.com/apache/pinot/issues/12433#issuecomment-1954569204

   Thanks, @itschrispeck, for mentioning it. 
   
   While discussing the bottleneck and possible fixes, we discussed how to support the unique character in keys, and we decided to add the table/cluster-level configuration to use this custom segment metadata reader. By default, we will use the default commons-configuration2 property reader. Here is just for [reference](https://github.com/apache/pinot/pull/12440/files#diff-d09bd334e1cce37e8365083907f226d95a7505d762de54d4f220f624a3775127R25).
   
   This [PR](https://github.com/apache/pinot/pull/12440) just introduced the custom segment metadata property reader; I am still working on it. 
   
   Please let me know if you are okay; I will start tagging you in the PRs corresponding to this work. Thanks
   
   cc: @Jackie-Jiang / @klsince 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] commons-configuration2 performance issue during segment metadata load [pinot]

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on issue #12433:
URL: https://github.com/apache/pinot/issues/12433#issuecomment-1948791380

   @xiangfu0 this is more about skipping unnecessary escaping of the segment metadata keys to make initialization faster, along with  the introduction of versioning of segment metdata. 
   The one you have mentioned is the fix for double unescaping, happening in commons-configuration2 implementation and other in Pinot. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] commons-configuration2 performance issue during segment metadata load [pinot]

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on issue #12433:
URL: https://github.com/apache/pinot/issues/12433#issuecomment-1947857684

   I think this is already fixed in https://github.com/apache/pinot/pull/12405, can you please check?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org