Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/02/05 01:31:00 UTC

[jira] [Commented] (IMPALA-11104) Revisit computeMinScalarColumnMemReservation for ORC async IO

    [ https://issues.apache.org/jira/browse/IMPALA-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487392#comment-17487392 ] 

Quanlong Huang commented on IMPALA-11104:
-----------------------------------------

Thanks for filing this!

FWIW, one difference is that in Parquet all column types can be dictionary encoded, whereas in ORC only string types can be dictionary encoded.
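
To illustrate why that difference could matter for the estimate, here is a minimal Java sketch. It is not Impala's code: the class name, the enums, the MIN_RESERVATION floor, and the "distinct values times average value size" formula are all hypothetical, and it assumes the reduced reservation is justified by a small dictionary, which the issue does not spell out. Only the 4MB default comes from the issue description.

```java
// Hypothetical sketch, not the actual HdfsScanNode logic.
final class MinColumnReservationSketch {
    enum FileFormat { PARQUET, ORC }
    enum ColumnKind { STRING, NUMERIC }

    // 4MB default, as stated in the issue description.
    static final long DEFAULT_COLUMN_SCAN_RANGE_RESERVATION = 4L * 1024 * 1024;
    // Assumed lower bound for any reservation (illustrative value only).
    static final long MIN_RESERVATION = 8L * 1024;

    // Per the comment above: Parquet may dictionary-encode any column type,
    // ORC only string types.
    static boolean mayBeDictionaryEncoded(FileFormat format, ColumnKind kind) {
        return format == FileFormat.PARQUET || kind == ColumnKind.STRING;
    }

    // Returns a reduced reservation only when the column may be dictionary
    // encoded and the distinct-value count is known (non-negative).
    static long estimateMinReservation(FileFormat format, ColumnKind kind,
                                       long numDistinctValues, long avgValueBytes) {
        if (numDistinctValues < 0 || !mayBeDictionaryEncoded(format, kind)) {
            return DEFAULT_COLUMN_SCAN_RANGE_RESERVATION;
        }
        long perValue = Math.max(1L, avgValueBytes);
        // Compare before multiplying so a large distinct count cannot overflow.
        if (numDistinctValues >= DEFAULT_COLUMN_SCAN_RANGE_RESERVATION / perValue) {
            return DEFAULT_COLUMN_SCAN_RANGE_RESERVATION;
        }
        return Math.max(MIN_RESERVATION, numDistinctValues * perValue);
    }
}
```

Under these assumptions, a low-cardinality numeric column would get a reduced reservation when scanned from Parquet but keep the full 4MB default when scanned from ORC, which is the kind of divergence the issue asks to revisit.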

> Revisit computeMinScalarColumnMemReservation for ORC async IO
> -------------------------------------------------------------
>
>                 Key: IMPALA-11104
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11104
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Riza Suminto
>            Priority: Major
>
> HdfsScanNode.computeMinScalarColumnMemReservation has an estimate that reduces the memory reservation for a column below DEFAULT_COLUMN_SCAN_RANGE_RESERVATION (4MB).
> [https://github.com/apache/impala/blob/df528fe/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L2226-L2235]
>  
> This estimate is based on Parquet tables. We need to revisit it for ORC tables.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
