Posted to issues@orc.apache.org by GitBox <gi...@apache.org> on 2022/05/17 04:16:39 UTC

[GitHub] [orc] dengweisysu opened a new issue, #1117: add row count limit config for one stripe

dengweisysu opened a new issue, #1117:
URL: https://github.com/apache/orc/issues/1117

   For query engines like Presto, the stripe is the base unit of query concurrency: one stripe can only be processed by one split.
   In the current implementation of the ORC writer, the only config that can control the row count in a stripe is "orc.stripe.size".
   But across different kinds of tables, it is difficult to control the row count with this config.
   - For a table with many columns (e.g. 100 columns), 64MB may contain 5000 rows.
   - For a table with few columns (e.g. 5 columns), 64MB may contain 100000 rows.
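
As a rough check of the two cases above, the following sketch computes the average row size that each case implies. The helper name `implied_bytes_per_row` is made up for illustration, and 64MB is taken as 64 * 1024 * 1024 bytes, which is an assumption about how the figure is meant.

```python
# Stripe size from the examples above, assumed to mean 64 MiB.
STRIPE_SIZE_BYTES = 64 * 1024 * 1024


def implied_bytes_per_row(rows_per_stripe: int,
                          stripe_size: int = STRIPE_SIZE_BYTES) -> float:
    """Average bytes per row implied by a stripe that holds `rows_per_stripe` rows."""
    return stripe_size / rows_per_stripe


# Wide table (about 100 columns): 5000 rows fill one stripe.
wide_row_bytes = implied_bytes_per_row(5_000)
# Narrow table (about 5 columns): 100000 rows fill one stripe.
narrow_row_bytes = implied_bytes_per_row(100_000)

print(f"wide table:   ~{wide_row_bytes:.0f} bytes per row")
print(f"narrow table: ~{narrow_row_bytes:.0f} bytes per row")
print(f"row count differs by a factor of {wide_row_bytes / narrow_row_bytes:.0f}")
```

With the same byte budget, the narrow table ends up with 20 times as many rows per stripe, which is the gap a size-only limit cannot close.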
   
   For Presto, a normal OLAP query reads only a subset of the table's columns, so the row count is the key factor in query performance. If one stripe contains too many rows, query performance may become too low.
   
   So, besides the config "orc.stripe.size", we need another config like "orc.stripe.row.count" to control the row count of one stripe.
   A similar config has been introduced in cudf (a GPU DataFrame library based on Apache Arrow): https://github.com/rapidsai/cudf/issues/9261
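
To make the proposal concrete, here is a minimal sketch of how a writer could combine a size limit with a row count limit. It is a simplified model, not the ORC writer's actual code: `StripeLimits`, `should_flush_stripe` and `count_stripes` are hypothetical names, and the model checks the limits after every row with a fixed row size, which a real writer may not do.

```python
from dataclasses import dataclass


@dataclass
class StripeLimits:
    """Hypothetical pair of limits: one in bytes, one in rows."""
    max_bytes: int  # plays the role of "orc.stripe.size"
    max_rows: int   # plays the role of the proposed "orc.stripe.row.count"


def should_flush_stripe(buffered_bytes: int, buffered_rows: int,
                        limits: StripeLimits) -> bool:
    """Flush as soon as either limit is reached."""
    return (buffered_bytes >= limits.max_bytes
            or buffered_rows >= limits.max_rows)


def count_stripes(total_rows: int, bytes_per_row: int,
                  limits: StripeLimits) -> int:
    """Count the stripes produced when writing fixed-size rows one by one."""
    stripes = 0
    buffered_rows = 0
    buffered_bytes = 0
    for _ in range(total_rows):
        buffered_rows += 1
        buffered_bytes += bytes_per_row
        if should_flush_stripe(buffered_bytes, buffered_rows, limits):
            stripes += 1
            buffered_rows = 0
            buffered_bytes = 0
    if buffered_rows > 0:
        stripes += 1  # the last, partially filled stripe
    return stripes


SIZE_64MB = 64 * 1024 * 1024

# Narrow table: 100000 rows of roughly 671 bytes each.
size_only = StripeLimits(max_bytes=SIZE_64MB, max_rows=10**12)  # row limit effectively off
with_row_limit = StripeLimits(max_bytes=SIZE_64MB, max_rows=10_000)

stripes_size_only = count_stripes(100_000, 671, size_only)
stripes_with_row_limit = count_stripes(100_000, 671, with_row_limit)
print(stripes_size_only, stripes_with_row_limit)
```

In this model the narrow table lands in a single stripe when only the size limit applies, and in ten stripes once the row limit of 10000 is added, so a split-per-stripe engine has ten units to run concurrently instead of one.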


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] guiyanakuang closed issue #1117: ORC-1172: Add row count limit config for one stripe

Posted by GitBox <gi...@apache.org>.
guiyanakuang closed issue #1117: ORC-1172: Add row count limit config for one stripe
URL: https://github.com/apache/orc/issues/1117




[GitHub] [orc] guiyanakuang commented on issue #1117: ORC-1172: Add row count limit config for one stripe

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on issue #1117:
URL: https://github.com/apache/orc/issues/1117#issuecomment-1135644056

   This is resolved via https://github.com/apache/orc/pull/1118




[GitHub] [orc] dongjoon-hyun commented on issue #1117: add row count limit config for one stripe

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #1117:
URL: https://github.com/apache/orc/issues/1117#issuecomment-1130241918

   Thank you for making an issue and PR, @dengweisysu.

