Posted to common-issues@hadoop.apache.org by "Vinayakumar B (Jira)" <ji...@apache.org> on 2019/12/18 19:04:00 UTC

[jira] [Comment Edited] (HADOOP-16621) [pb-upgrade] spark-hive doesn't compile against hadoop trunk because of Token's marshalling

    [ https://issues.apache.org/jira/browse/HADOOP-16621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999436#comment-16999436 ] 

Vinayakumar B edited comment on HADOOP-16621 at 12/18/19 7:03 PM:
------------------------------------------------------------------

Sorry for being late.

The following public APIs were introduced by HADOOP-12563 back in 2016, in Hadoop 3.0.0.
{code:java}
public Token(TokenProto tokenPB);

public TokenProto toTokenProto();
{code}
Ideally, no API annotated @Public should have a protobuf type in its signature.
 Right now, these methods break binary compatibility for downstream projects after the protobuf version upgrade, because the superclass of the generated proto classes changed from {{GeneratedMessage}} in protobuf 2.5.0 to {{GeneratedMessageV3}} in protobuf 3.x.

So the only possible options to proceed are:
 # Remove all public methods with protobuf types in their signatures and replace them with helper classes that do the same job, as is done in HDFS' {{PBHelperClient.java}}. This will break compatibility if by any chance these methods are used outside the hadoop-common module (and outside the Hadoop project overall, since all Hadoop components are upgraded together).
 # Mark the methods deprecated and keep the old {{TokenProto}} class, generated with protobuf 2.5.0, committed to the repo. Rename the current {{TokenProto}} to {{TokenProto3}}, along with all its occurrences throughout the project (hopefully TokenProto is not used outside the Hadoop project), and skip shading of the 2.5.0 TokenProto. The deprecated methods and the committed TokenProto class can be removed later.

 

Approach #1 would be an easy and direct change, but it raises a compatibility issue if these methods are used by other projects, which is most unlikely.

[~stevel@apache.org] / [~vinodkv] / [~raviprak] is it okay to remove the above-mentioned methods and replace them with something similar to {{PBHelperClient#convert(Token<?> tok)}} and {{PBHelperClient#convert(TokenProto tok)}}?
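To make approach #1 concrete, here is a minimal, self-contained sketch of the helper-class pattern. It is not the actual Hadoop code: {{TokenHelperSketch}}, its nested {{Token}} and {{TokenProto}} stand-ins, and {{TokenProtoHelper}} are hypothetical names, and the real generated class would come from protobuf rather than being hand-written. The point is only that the token class itself exposes no protobuf type, so all conversion lives in one helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TokenHelperSketch {

    /** Hypothetical stand-in for the protobuf-generated message class. */
    static final class TokenProto {
        final byte[] identifier;
        final byte[] password;
        final String kind;
        final String service;

        TokenProto(byte[] identifier, byte[] password, String kind, String service) {
            this.identifier = identifier.clone();
            this.password = password.clone();
            this.kind = kind;
            this.service = service;
        }
    }

    /** Hypothetical stand-in for Token: no protobuf type appears in its signatures. */
    static final class Token {
        final byte[] identifier;
        final byte[] password;
        final String kind;
        final String service;

        Token(byte[] identifier, byte[] password, String kind, String service) {
            this.identifier = identifier.clone();
            this.password = password.clone();
            this.kind = kind;
            this.service = service;
        }
    }

    /** Hypothetical helper, in the spirit of PBHelperClient#convert. */
    static final class TokenProtoHelper {
        private TokenProtoHelper() {}

        /** Token to wire form. */
        static TokenProto convert(Token tok) {
            return new TokenProto(tok.identifier, tok.password, tok.kind, tok.service);
        }

        /** Wire form back to Token. */
        static Token convert(TokenProto proto) {
            return new Token(proto.identifier, proto.password, proto.kind, proto.service);
        }
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration only.
        Token original = new Token(
                "id".getBytes(StandardCharsets.UTF_8),
                "secret".getBytes(StandardCharsets.UTF_8),
                "example-kind",
                "example-service");
        Token copy = TokenProtoHelper.convert(TokenProtoHelper.convert(original));
        System.out.println("identifier preserved: "
                + Arrays.equals(original.identifier, copy.identifier));
        System.out.println("kind preserved: " + original.kind.equals(copy.kind));
    }
}
```

With this shape, downstream code that only imports the token class never links against a generated protobuf type; only callers of the helper do.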

 

Approach #2 is a workaround that still keeps compatibility, but unnecessary (most likely unused) code will be present in the repo.

Also, #2 is possible only after HADOOP-16596 is in, since that is needed to support both the 2.5.0 and 3.x versions of protobuf together.
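As a rough illustration of approach #2, the sketch below keeps the old signatures as deprecated members next to a path that uses the renamed type. Everything here is hypothetical: {{DeprecatedTokenApiSketch}}, the hand-written {{TokenProto}} and {{TokenProto3}} stand-ins, and the {{toTokenProto3()}} method are assumptions made for illustration, not the real generated classes or an agreed API.

```java
public class DeprecatedTokenApiSketch {

    /** Stand-in for the old TokenProto, generated by protobuf 2.5.0 and committed to the repo. */
    static final class TokenProto {
        final String kind;
        final String service;

        TokenProto(String kind, String service) {
            this.kind = kind;
            this.service = service;
        }
    }

    /** Stand-in for the renamed class generated by protobuf 3.x. */
    static final class TokenProto3 {
        final String kind;
        final String service;

        TokenProto3(String kind, String service) {
            this.kind = kind;
            this.service = service;
        }
    }

    /** Stand-in for Token, reduced to two fields to keep the sketch short. */
    static final class Token {
        private final String kind;
        private final String service;

        Token(String kind, String service) {
            this.kind = kind;
            this.service = service;
        }

        /** Old signature, kept only so existing downstream binaries still link. */
        @Deprecated
        Token(TokenProto legacy) {
            this(legacy.kind, legacy.service);
        }

        /** Old signature, kept only so existing downstream binaries still link. */
        @Deprecated
        TokenProto toTokenProto() {
            return new TokenProto(kind, service);
        }

        /** Hypothetical path for code that uses the 3.x generated type. */
        TokenProto3 toTokenProto3() {
            return new TokenProto3(kind, service);
        }

        String getKind() {
            return kind;
        }

        String getService() {
            return service;
        }
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration only.
        Token token = new Token("example-kind", "example-service");
        Token viaLegacy = new Token(token.toTokenProto());
        System.out.println("legacy round trip keeps kind: "
                + token.getKind().equals(viaLegacy.getKind()));
        System.out.println("3.x form kind: " + token.toTokenProto3().kind);
    }
}
```

The cost noted above is visible here: two nearly identical message classes have to coexist until the deprecated members are dropped.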

This change is mandatory to allow Spark (and other projects that just import the Token classes) to compile and run successfully without needing to explicitly set the protobuf version to match Hadoop's.

 

Please let me know your opinions.


> [pb-upgrade] spark-hive doesn't compile against hadoop trunk because of Token's marshalling
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16621
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: common
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> The move to protobuf 3.x stops Spark building because Token has a method which returns a protobuf, and now it's returning some v3 types.
> If we want to isolate downstream code from protobuf changes, we need to move that marshalling method out of Token and put it in a helper class.


