Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/10/16 20:07:00 UTC

[jira] [Work logged] (HIVE-26633) Make thrift max message size configurable

     [ https://issues.apache.org/jira/browse/HIVE-26633?focusedWorklogId=817389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-817389 ]

ASF GitHub Bot logged work on HIVE-26633:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Oct/22 20:06
            Start Date: 16/Oct/22 20:06
    Worklog Time Spent: 10m 
      Work Description: jfsii opened a new pull request, #3674:
URL: https://github.com/apache/hive/pull/3674

   ### What changes were proposed in this pull request?
   Makes the thrift max message size configurable in a variety of contexts, the most important being the connection between HS2 and HMS, where message sizes can grow quite large.
   
   ### Why are the changes needed?
   Since the upgrade to thrift 0.16, large metadata messages can hit thrift's internal 100 MB message size limit. This change makes that limit configurable.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. It adds a new configuration option to HiveConf, defaulting to 1 GB. It also adds the ability to adjust the thrift max message size for the beeline connection (mostly as a safety valve, since I have not seen the limit get hit there).
   
   ### How was this patch tested?
   Manually on a kerberized cluster.
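
   As a sketch of how a deployment might use the new option, the limit could be raised in hive-site.xml roughly like this. The property name and value shown are illustrative assumptions; the actual key is whatever this PR registers in HiveConf:

   ```xml
   <!-- Hypothetical example: raise the thrift max message size to 1 GB. -->
   <!-- The real property name is defined by the HiveConf entry this PR adds. -->
   <property>
     <name>hive.thrift.client.max.message.size</name>
     <value>1gb</value>
   </property>
   ```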




Issue Time Tracking
-------------------

            Worklog Id:     (was: 817389)
    Remaining Estimate: 0h
            Time Spent: 10m

> Make thrift max message size configurable
> -----------------------------------------
>
>                 Key: HIVE-26633
>                 URL: https://issues.apache.org/jira/browse/HIVE-26633
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: John Sherman
>            Assignee: John Sherman
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since version 0.14, thrift enforces a max message size through a TConfiguration object, as described here:
> [https://github.com/apache/thrift/blob/master/doc/specs/thrift-tconfiguration.md]
> By default, MaxMessageSize is set to 100 MB.
> As a result, HMS clients may be unable to retrieve metadata for tables with a large number of partitions or other large metadata.
> For example, on a cluster with kerberos configured between HS2 and HMS, querying a large table (10k partitions, 200 columns with 200-character names) results in this backtrace:
> {code:java}
> org.apache.thrift.transport.TTransportException: MaxMessageSize reached
> at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96) 
> at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97) 
> at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:390) 
> at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39) 
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) 
> at org.apache.hadoop.hive.metastore.security.TFilterTransport.readAll(TFilterTransport.java:63) 
> at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) 
> at org.apache.thrift.protocol.TBinaryProtocol.readByte(TBinaryProtocol.java:329) 
> at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:273) 
> at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:461) 
> at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.read(FieldSchema.java:454) 
> at org.apache.hadoop.hive.metastore.api.FieldSchema.read(FieldSchema.java:388) 
> at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1269) 
> at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1248) 
> at org.apache.hadoop.hive.metastore.api.StorageDescriptor.read(StorageDescriptor.java:1110) 
> at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1270) 
> at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Partition.java:1205) 
> at org.apache.hadoop.hive.metastore.api.Partition.read(Partition.java:1062) 
> at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:420) 
> at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult$PartitionsByExprResultStandardScheme.read(PartitionsByExprResult.java:399) 
> at org.apache.hadoop.hive.metastore.api.PartitionsByExprResult.read(PartitionsByExprResult.java:335) 
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java)  {code}
> Making this limit configurable (and defaulting to a higher value) would keep these tables accessible.
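
The enforcement seen in the backtrace can be modeled in isolation. The following is a simplified, self-contained sketch (not the actual thrift code) of how a transport counts consumed bytes against a configured budget, in the spirit of TEndpointTransport.countConsumedMessageBytes, and fails once the budget is exhausted:

```java
// Simplified model (not Hive or thrift code) of thrift's max message
// size enforcement: each read deducts from a remaining-bytes budget,
// and exceeding the budget raises "MaxMessageSize reached".
public class MaxMessageSizeDemo {
    // Thrift's documented default limit is 100 MB.
    static final long DEFAULT_MAX_MESSAGE_SIZE = 100L * 1024 * 1024;

    private long remainingMessageSize;

    MaxMessageSizeDemo(long maxMessageSize) {
        this.remainingMessageSize = maxMessageSize;
    }

    // Deduct numBytes from the budget; throw once the budget would go negative.
    void countConsumedMessageBytes(long numBytes) {
        if (remainingMessageSize - numBytes >= 0) {
            remainingMessageSize -= numBytes;
        } else {
            throw new IllegalStateException("MaxMessageSize reached");
        }
    }

    public static void main(String[] args) {
        MaxMessageSizeDemo transport = new MaxMessageSizeDemo(DEFAULT_MAX_MESSAGE_SIZE);
        transport.countConsumedMessageBytes(50L * 1024 * 1024); // 50 MB read: fine
        transport.countConsumedMessageBytes(49L * 1024 * 1024); // 99 MB total: still fine
        try {
            transport.countConsumedMessageBytes(2L * 1024 * 1024); // would exceed 100 MB
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "MaxMessageSize reached"
        }
    }
}
```

Raising the configured limit simply gives this budget a larger starting value, which is why a configurable maximum lets large partition metadata responses through.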



--
This message was sent by Atlassian Jira
(v8.20.10#820010)