Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/16 11:45:46 UTC

[GitHub] [druid] clintropolis opened a new pull request, #13101: nested column serializer performance improvement for sparse columns

clintropolis opened a new pull request, #13101:
URL: https://github.com/apache/druid/pull/13101

   ### Description
   This PR improves `NestedDataColumnSerializer` so that it no longer explicitly writes null values to the field writers for the missing values of every row. Instead, it passes the row counter to the field writers so that they can backfill null values in bulk when they have not been written to for some number of rows.
   
   As an extreme example, take a nested column over a dataset with 1 row of ~9000 columns and 999,999 rows of ~10 columns: ingestion took ~30 minutes prior to this patch and takes ~5 minutes after it.
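
The backfill idea described above can be illustrated with a minimal sketch. This is not the code from the PR: `SparseFieldWriter` and `SparseSerializerSketch` are hypothetical names, values are plain strings, and in-memory lists stand in for Druid's actual column writers. The sketch only assumes what the description states, namely that each field writer receives the current row counter and fills in its own missing nulls when it is next written to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-field writer; not Druid's actual class.
final class SparseFieldWriter {
    private final List<String> values = new ArrayList<>();

    // Writes a value for the given row, first backfilling nulls for any
    // rows that were skipped since this writer was last written to.
    void addValue(int rowNumber, String value) {
        fillNulls(rowNumber);
        values.add(value);
    }

    // Called once at the end so trailing rows without a value are padded too.
    void finish(int totalRows) {
        fillNulls(totalRows);
    }

    // Appends all missing nulls in one bulk operation.
    private void fillNulls(int upToRow) {
        int missing = upToRow - values.size();
        if (missing > 0) {
            values.addAll(Collections.<String>nCopies(missing, null));
        }
    }

    List<String> values() {
        return values;
    }
}

// Hypothetical serializer: it touches only the writers for fields that are
// present in the current row, instead of every known writer on every row.
final class SparseSerializerSketch {
    private final Map<String, SparseFieldWriter> writers = new LinkedHashMap<>();
    private int rowCount = 0;

    void serializeRow(Map<String, String> row) {
        for (Map.Entry<String, String> field : row.entrySet()) {
            writers
                .computeIfAbsent(field.getKey(), name -> new SparseFieldWriter())
                .addValue(rowCount, field.getValue());
        }
        rowCount++;
    }

    // Pads every writer to the final row count and returns the columns.
    Map<String, List<String>> finish() {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (Map.Entry<String, SparseFieldWriter> entry : writers.entrySet()) {
            entry.getValue().finish(rowCount);
            columns.put(entry.getKey(), entry.getValue().values());
        }
        return columns;
    }

    public static void main(String[] args) {
        SparseSerializerSketch serializer = new SparseSerializerSketch();
        Map<String, String> wideRow = new LinkedHashMap<>();
        wideRow.put("a", "1");
        wideRow.put("b", "x");
        serializer.serializeRow(wideRow);
        serializer.serializeRow(Collections.singletonMap("a", "2"));
        serializer.serializeRow(Collections.singletonMap("c", "y"));
        System.out.println(serializer.finish());
    }
}
```

In this sketch the per-row cost depends on the number of fields present in that row rather than on the total number of fields seen so far, which is the property that matters for a dataset with one very wide row and many narrow ones.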
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 merged pull request #13101: nested column serializer performance improvement for sparse columns

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 merged PR #13101:
URL: https://github.com/apache/druid/pull/13101

