Posted to issues@hive.apache.org by "Aihua Xu (JIRA)" <ji...@apache.org> on 2018/03/26 17:46:00 UTC

[jira] [Commented] (HIVE-19040) get_partitions_by_expr() implementation in HiveMetaStore causes backward incompatibility easily

    [ https://issues.apache.org/jira/browse/HIVE-19040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414214#comment-16414214 ] 

Aihua Xu commented on HIVE-19040:
---------------------------------

[~alangates], [~vihangk1], [~thejas] and [~ashutoshc]: Since we are moving HMS to a standalone project, how to keep it backward compatible seems to be a more serious issue. I'd like to hear your opinions.

So the issue is this: given a simple query like {{select * from tbl1 where x in (1,2)}}, if the table is partitioned, Hive passes the partition filtering expression to HMS, which returns the filtered partitions. The client serializes the expression into a byte array using Kryo, and HMS deserializes it back into an expression. If both sides use the same version of ql, everything should be fine, but if there are any changes to UDFs, the client (e.g. HS2) could serialize using the old UDFs while the server deserializes using the new UDFs.
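
To make the failure mode concrete, here is a minimal sketch of what can go wrong when writer and reader disagree on a class layout. It is a toy positional format built on {{java.nio.ByteBuffer}}, not Kryo's actual wire format and not Hive code; the class {{PositionalSerdeSketch}}, its methods, and the field {{cachedConstantCount}} are hypothetical names invented for illustration. The assumption is only that values are written back to back without field names, so dropping one field on the reading side shifts everything after it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy stand-in for schema-less, field-by-field serialization.
// Hypothetical names throughout; this is NOT Kryo and NOT Hive code.
final class PositionalSerdeSketch {

    // "Old" client layout: two fields written back to back, no names or tags.
    static byte[] writeOldLayout(int cachedConstantCount, String udfName) {
        byte[] name = udfName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length);
        buf.putInt(cachedConstantCount); // the field a newer class version stops serializing
        buf.putInt(name.length);
        buf.put(name);
        return buf.array();
    }

    // "New" layout: the first field is now transient, so only the name is written.
    static byte[] writeNewLayout(String udfName) {
        byte[] name = udfName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + name.length);
        buf.putInt(name.length);
        buf.put(name);
        return buf.array();
    }

    // "New" server reader: expects a length-prefixed name at offset 0.
    static String readNewLayout(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalStateException("Corrupt payload, bad length: " + length);
        }
        byte[] name = new byte[length];
        buf.get(name);
        return new String(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same layout on both sides: the name round-trips.
        System.out.println(readNewLayout(writeNewLayout("in")));
        // Old writer, new reader: the stale int is misread as the name length.
        System.out.println("in".equals(readNewLayout(writeOldLayout(2, "in"))));
    }
}
```

Depending on the stale value, the new reader either rejects the payload or silently returns garbage, and nothing in the payload itself says which side is out of date.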

I guess we want to keep a certain level of backward compatibility, right? But with this interface, it's hard for us to tell whether a given change is compatible, since any of the UDFs could change and there are so many of them. 

Let me know if it's a valid concern. Do we need to worry about this scenario? 

 

> get_partitions_by_expr() implementation  in HiveMetaStore causes backward incompatibility easily
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19040
>                 URL: https://issues.apache.org/jira/browse/HIVE-19040
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Priority: Major
>
> In the HiveMetaStore implementation of {{public PartitionsByExprResult get_partitions_by_expr(PartitionsByExprRequest req) throws TException}}, an expression is serialized into a byte array on the client side and passed through PartitionsByExprRequest. HMS then deserializes it back into the expression and filters the partitions by it.
> Such a partition filtering expression can contain various UDFs. If one of the UDFs changes between Hive versions, HS2 on the older version will serialize the expression in the old format, which HMS on the newer version won't be able to deserialize. One example is the GenericUDFIn class, which added {{transient}} to the field constantInSet and thereby caused such an incompatibility.
> One approach I'm thinking of is to pass the expression string directly instead of converting the expression object to a byte array.
>  
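
The approach suggested at the end of the description, passing the expression as a string instead of serialized bytes, could look roughly like the sketch below. Everything here is an assumption made for illustration: {{StringFilterSketch}}, {{renderInFilter}} and {{filterPartitions}} are hypothetical names, the grammar only covers a single {{col in (v1,v2,...)}} filter, and none of it reflects the actual HMS API or Hive's expression parser. The point is only that the server interprets the text with its own code, so the client's class layout never crosses the wire.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a text-based partition filter; not the HMS API.
final class StringFilterSketch {

    // Accepts exactly one shape of filter: <column> in (<v1>,<v2>,...)
    private static final Pattern IN_FILTER = Pattern.compile(
            "\\s*(\\w+)\\s+in\\s*\\(([^)]*)\\)\\s*", Pattern.CASE_INSENSITIVE);

    // Client side: render the filter as text rather than as a serialized object.
    static String renderInFilter(String column, List<String> values) {
        return column + " in (" + String.join(",", values) + ")";
    }

    // Server side: parse the text with the server's own code and keep the
    // partition values (for the given partition column) that match.
    static List<String> filterPartitions(String filter, String column,
                                         List<String> partitionValues) {
        Matcher m = IN_FILTER.matcher(filter);
        if (!m.matches() || !m.group(1).equalsIgnoreCase(column)) {
            throw new IllegalArgumentException("Unsupported filter: " + filter);
        }
        Set<String> allowed = new LinkedHashSet<>();
        for (String v : m.group(2).split(",")) {
            allowed.add(v.trim());
        }
        List<String> kept = new ArrayList<>();
        for (String p : partitionValues) {
            if (allowed.contains(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String filter = renderInFilter("x", Arrays.asList("1", "2"));
        System.out.println(filterPartitions(filter, "x", Arrays.asList("1", "2", "3")));
    }
}
```

A text contract moves the compatibility question from UDF class internals to the filter grammar, which is a much smaller surface to keep stable across versions, though the grammar itself then has to be versioned with care.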



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)