You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2021/03/28 09:29:00 UTC

[jira] [Comment Edited] (TEZ-4152) Upgrade to protobuf 3.x and take care of relocated protobuf classes

    [ https://issues.apache.org/jira/browse/TEZ-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090713#comment-17090713 ] 

László Bodor edited comment on TEZ-4152 at 3/28/21, 9:28 AM:
-------------------------------------------------------------

UPDATE: feel free to skip this comment, now it seems like I'm proceeding in TEZ-4154
------------------

seems like it's a bit harder than I first thought
my plan was a solution where we don't need to change tez protobuf code when we upgrade to hadoop 3.3, I was thinking of some wrappers, but I bumped into walls everywhere, so far

Here are the current constraints:
1. Hadoop's rpc engine will cast the provided message instance to its current Message class (now it's com.google.protobuf.Message, but from hadoop 3.3, it'll be a org.apache.hadoop.thirdparty.protobuf)
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L237

2. Considering the exception trace above, let's start from TezClient.submitDAGSession, which calls a method which implements a generated interface and expects a specific message subclass: SubmitDAGRequestProto
https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientAMProtocolBlockingPBServerImpl.java#L159
SubmitDAGRequestProto is final

So I need an instance which is SubmitDAGRequestProto but cannot be a subclass of it (it's final) but at the same time I need an abstraction from Message interface (as I want it to be easily pluggable) 

1. if I simply wrap original SubmitDAGRequestProto into a <? extends Message>, which esentially a superclass of SubmitDAGRequestProto, I would solve the hadoop adaptation (as the wrapper can be changed easily for the whole project), but I cannot cast it back to SubmitDAGRequestProto
2. I cannot even use a subclass of SubmitDAGRequestProto for wrapping it, becuase SubmitDAGRequestProto is generated as final by protobuf by design

cc: [~ashutoshc], [~jeagles]
or we can do hadoop 3.3 upgrade with a breaking change, where we simply adapt to its new thirdparty protobuf package
I still think there is a clean, non-breaking way of creating some abstraction layer between hadoop and tez, but I haven't figured out

(I was thinking about shading as always, but in this case shading itself is kind of orthogonal to the problems mentioned above)


was (Author: abstractdog):
UPDATE: feel free to skip this comment, now it seems like I'm proceeding in TEZ-4154
------------------

seems like it's a bit harder than I first thought
my plan was a solution where we don't need to change tez protobuf code when we upgrade to hadoop 3.3, I was thinking of some wrappers, but I bumped into walls everywhere, so far

Here are the current constraints:
1. Hadoop's rpc engine will cast the provided message instance to its current Message class (now it's com.google.protobuf.Message, from hadoop 3.3, it'll be a org.apache.hadoop.thirdparty.protobuf)
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L237

2. Considering the exception trace above, let's start from TezClient.submitDAGSession, which calls a method which implements a generated interface and expects a specific message subclass: SubmitDAGRequestProto
https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientAMProtocolBlockingPBServerImpl.java#L159
SubmitDAGRequestProto is final

So I need an instance which is SubmitDAGRequestProto but cannot be a subclass of it (it's final) but at the same time I need an abstraction from Message interface (as I want it to be easily pluggable) 

1. if I simply wrap original SubmitDAGRequestProto into a <? extends Message>, which esentially a superclass of SubmitDAGRequestProto, I would solve the hadoop adaptation (as the wrapper can be changed easily for the whole project), but I cannot cast it back to SubmitDAGRequestProto
2. I cannot even use a subclass of SubmitDAGRequestProto for wrapping it, becuase SubmitDAGRequestProto is generated as final by protobuf by design

cc: [~ashutoshc], [~jeagles]
or we can do hadoop 3.3 upgrade with a breaking change, where we simply adapt to its new thirdparty protobuf package
I still think there is a clean, non-breaking way of creating some abstraction layer between hadoop and tez, but I haven't figured out

(I was thinking about shading as always, but in this case shading itself is kind of orthogonal to the problems mentioned above)

> Upgrade to protobuf 3.x and take care of relocated protobuf classes
> -------------------------------------------------------------------
>
>                 Key: TEZ-4152
>                 URL: https://issues.apache.org/jira/browse/TEZ-4152
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> Jiras under HADOOP-13363 cover the process of protobuf upgrade and relocation in Hadoop.
> Tez is on protobuf 2.5, while hadoop 3.3 is on protobuf 3.x.
> Tez usually follows hadoop with dependencies, so a hadoop 3.3 upgrade means a protobuf upgrade in tez as well, and an additional relocation-ish step will be needed in tez as e.g. hadoop expects protobuf messages with org.apache.hadoop.thirdparty.protobuf package, even if it's not exposed to public APIs, for example:
> {code}
> java.lang.ClassCastException: org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$SubmitDAGRequestProto cannot be cast to org.apache.hadoop.thirdparty.protobuf.Message
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> 	at com.sun.proxy.$Proxy11.submitDAG(Unknown Source)
> 	at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:706)
> 	at org.apache.tez.client.TezClient.submitDAG(TezClient.java:593)
> 	at org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting(TestMockDAGAppMaster.java:392)
> {code}
> Here is the failing line:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java#L237
> {code}
> final Message theRequest = (Message) args[1];
> {code}
> relocation means full rewrite in binary, so here Message refers to org.apache.hadoop.thirdparty.protobuf.Message, but tez supplies a non-relocated one, which implements com.google.protobuf.Message (hadoop method signature contains Object, so it compiles)
> so even if protobuf 2.5 is fully compatible with 3.x (not checked yet), we will need extra effort on tez side to generate protobuf messages which are compatible with hadoop relocated messages...as these classes are generated from proto files, we cannot and shouldn't hack them by manual java source code manipulation
> I'm thinking of a maven profile based approach which can take care of both protobuf 3 and relocation compatible protobuf objects from tez side



--
This message was sent by Atlassian Jira
(v8.3.4#803005)