Posted to issues-all@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2019/08/05 23:42:00 UTC

[jira] [Updated] (IMPALA-8835) Reduce unnecessary memory usage for Thrift structures

     [ https://issues.apache.org/jira/browse/IMPALA-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell updated IMPALA-8835:
----------------------------------
    Description: 
Several Thrift structures act like unions but use memory for all fields. For example, TExprNode is a struct with many mutually exclusive fields:
{noformat}
...
// The function to execute. Not set for SlotRefs and Literals.
5: optional Types.TFunction fn

// If set, child[vararg_start_idx] is the first vararg child.
6: optional i32 vararg_start_idx

7: optional TBoolLiteral bool_literal
8: optional TCaseExpr case_expr
9: optional TDateLiteral date_literal
10: optional TFloatLiteral float_literal
11: optional TIntLiteral int_literal
12: optional TInPredicate in_predicate
13: optional TIsNullPredicate is_null_pred
14: optional TLiteralPredicate literal_pred
15: optional TSlotRef slot_ref
16: optional TStringLiteral string_literal
17: optional TTupleIsNullPredicate tuple_is_null_pred
18: optional TDecimalLiteral decimal_literal
19: optional TAggregateExpr agg_expr
20: optional TTimestampLiteral timestamp_literal
21: optional TKuduPartitionExpr kudu_partition_expr{noformat}
This inflates the size of the structure when it is not encoded. In C++, TExprNode is 720 bytes based on this simple test:
{code:java}
TEST(PrintSizeTest, TExpr) {
  impala::TExprNode expr;
  LOG(INFO) << "Sizeof(TExprNode) = " << sizeof(expr);
}
{code}
It should be possible to reduce this considerably, and with some close attention it may be possible to reduce it by 10x or more.

TExprNode is notable because it can be used many times (e.g. for complicated expressions or for large numbers of partitions). However, this may be true for other structs.

  was:
Several Thrift structures act like unions without actually using a Thrift union. For example, TExprNode is a struct with many mutually exclusive fields:
{noformat}
...
// The function to execute. Not set for SlotRefs and Literals.
5: optional Types.TFunction fn

// If set, child[vararg_start_idx] is the first vararg child.
6: optional i32 vararg_start_idx

7: optional TBoolLiteral bool_literal
8: optional TCaseExpr case_expr
9: optional TDateLiteral date_literal
10: optional TFloatLiteral float_literal
11: optional TIntLiteral int_literal
12: optional TInPredicate in_predicate
13: optional TIsNullPredicate is_null_pred
14: optional TLiteralPredicate literal_pred
15: optional TSlotRef slot_ref
16: optional TStringLiteral string_literal
17: optional TTupleIsNullPredicate tuple_is_null_pred
18: optional TDecimalLiteral decimal_literal
19: optional TAggregateExpr agg_expr
20: optional TTimestampLiteral timestamp_literal
21: optional TKuduPartitionExpr kudu_partition_expr{noformat}
This inflates the size of the structure when it is not encoded. In C++, TExprNode is 720 bytes based on this simple test:
{code:java}
TEST(PrintSizeTest, TExpr) {
  impala::TExprNode expr;
  LOG(INFO) << "Sizeof(TExprNode) = " << sizeof(expr);
}
{code}
A union should be able to reduce this considerably, and with some close attention it may be able to be reduced 10x or more.

TExprNode is notable because it can be used many times (e.g. for complicated expressions or for large numbers of partitions). However, this may be true for other structs.


> Reduce unnecessary memory usage for Thrift structures
> -----------------------------------------------------
>
>                 Key: IMPALA-8835
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8835
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Joe McDonnell
>            Priority: Critical
>
> Several Thrift structures act like unions but use memory for all fields. For example, TExprNode is a struct with many mutually exclusive fields:
> {noformat}
> ...
> // The function to execute. Not set for SlotRefs and Literals.
> 5: optional Types.TFunction fn
> // If set, child[vararg_start_idx] is the first vararg child.
> 6: optional i32 vararg_start_idx
> 7: optional TBoolLiteral bool_literal
> 8: optional TCaseExpr case_expr
> 9: optional TDateLiteral date_literal
> 10: optional TFloatLiteral float_literal
> 11: optional TIntLiteral int_literal
> 12: optional TInPredicate in_predicate
> 13: optional TIsNullPredicate is_null_pred
> 14: optional TLiteralPredicate literal_pred
> 15: optional TSlotRef slot_ref
> 16: optional TStringLiteral string_literal
> 17: optional TTupleIsNullPredicate tuple_is_null_pred
> 18: optional TDecimalLiteral decimal_literal
> 19: optional TAggregateExpr agg_expr
> 20: optional TTimestampLiteral timestamp_literal
> 21: optional TKuduPartitionExpr kudu_partition_expr{noformat}
> This inflates the size of the structure when it is not encoded. In C++, TExprNode is 720 bytes based on this simple test:
> {code:java}
> TEST(PrintSizeTest, TExpr) {
>   impala::TExprNode expr;
>   LOG(INFO) << "Sizeof(TExprNode) = " << sizeof(expr);
> }
> {code}
> It should be possible to reduce this considerably, and with some close attention it may be possible to reduce it by 10x or more.
> TExprNode is notable because it can be used many times (e.g. for complicated expressions or for large numbers of partitions). However, this may be true for other structs.


