Posted to user@flink.apache.org by Andrew Otto <ot...@wikimedia.org> on 2022/06/09 12:29:06 UTC

Flink Shaded dependencies and extending Flink APIs

Hi all,

I'm working on an integration project trying to write some library code
that will allow us at the Wikimedia Foundation to use Flink with our 'Event
Platform <https://wikitech.wikimedia.org/wiki/Event_Platform>'.
Specifically, I'm trying to write a reusable step near the end of a
pipeline that will ensure our JSON events satisfy some criteria before
producing them to Kafka.  Details here
<https://phabricator.wikimedia.org/T310218>.

I'm experimenting with writing my own custom format
<https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#encoding--decoding-formats>
to do this.  But all I really need to do is override
JsonRowDataSerializationSchema's serialize method
<https://github.com/apache/flink/blob/master/flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonRowDataSerializationSchema.java#L90-L101>
and augment and validate the ObjectNode before it is serialized to byte[].

I'm running into an issue where the ObjectNode that is used by Flink here
is the shaded one:
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode,
whereas the WMF code I want to use to augment the ObjectNode
<https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/JsonEventGenerator.java#85>
uses a regular, non-shaded one.  I can't pass the shaded ObjectNode
instance to a function that takes a non-shaded one, and I can't cast the
shaded ObjectNode to the non-shaded type either.
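
Why neither the call nor the cast can work: relocation rewrites the package name, so the JVM treats the two ObjectNode classes as unrelated types even though their code is identical. The following is a minimal, dependency-free sketch of that effect; the Plain and Shaded holders are hypothetical stand-ins for the two packages, not the real Jackson classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simulates the effect of relocation with hypothetical stand-in classes (not Jackson). */
final class ShadingDemo {

    /** Plays the role of the package com.fasterxml.jackson.databind.node. */
    static final class Plain {
        static final class ObjectNode {
            final Map<String, Object> fields = new LinkedHashMap<>();
        }
    }

    /** Plays the role of the relocated org.apache.flink.shaded.jackson2... package. */
    static final class Shaded {
        static final class ObjectNode {
            final Map<String, Object> fields = new LinkedHashMap<>();
        }
    }

    /** Hypothetical analogue of the WMF helper: accepts only the unshaded type. */
    // fieldCount(new Shaded.ObjectNode()) does not compile: wrong parameter type.
    static int fieldCount(Plain.ObjectNode node) {
        return node.fields.size();
    }

    /** Returns false when the runtime rejects the cast with a ClassCastException. */
    static boolean castsToPlain(Object node) {
        try {
            Plain.ObjectNode plain = (Plain.ObjectNode) node;
            return plain != null;
        } catch (ClassCastException e) {
            // Identical fields and methods, different fully qualified name: unrelated types.
            return false;
        }
    }
}
```

Same simple name, same members, different enclosing namespace: that is all it takes for the compiler to reject the call and for the runtime to reject the cast.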

My Q is: is there a way to extend Flink APIs that use shaded dependencies?
I suppose I could copy/paste the whole of the "json" format code that I
need into my project and just make it my own, but this feels quite
obnoxious.

Thank you!
-Andrew Otto
 Wikimedia Foundation

Re: Flink Shaded dependencies and extending Flink APIs

Posted by Andrew Otto <ot...@wikimedia.org>.
Hi all, thanks for the responses.

> Create a module let's say "wikimedia-event-utilities-shaded"
This actually doesn't help me, as wikimedia-event-utilities is also used as
an API by non-Flink stuff, so I don't want to use the shaded ObjectNode in
the API params.

> Another solution is that you can serialize then deserialize the "different"
> ObjectNode
Haha, I thought of this too and then was like...no way, too crazy!

> Both flink-shaded, any relocation pattern and JsonRowDataSerializationSchema
> are Flink internals that users shouldn't use/rely on.
Yeah, in hindsight, I think the right solution is to make my own
SerializationSchema, even if that is mostly copy/pasting the internal Flink
one, rather than extending it.
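
A rough sketch of what "my own SerializationSchema" could look like when it owns the whole serialize path. Because nothing in it is inherited from flink-json, the node type can be the unshaded ObjectNode the WMF utilities expect. To stay dependency-free, the sketch declares a hypothetical SimpleSerializationSchema interface that only mirrors the serialize(T) to byte[] shape of Flink's SerializationSchema, and it injects row conversion, augmentation, validation, and writing as functions. All of those names are assumptions, not Flink or WMF APIs.

```java
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in mirroring only the serialize(T) -> byte[] shape of Flink's SerializationSchema. */
interface SimpleSerializationSchema<T> {
    byte[] serialize(T element);
}

/** Owns the whole serialize path, so N can be the unshaded node type; collaborators are injected. */
final class ValidatingJsonSchema<T, N> implements SimpleSerializationSchema<T> {
    private final Function<T, N> toNode;     // e.g. RowData -> unshaded ObjectNode (assumed)
    private final UnaryOperator<N> augment;  // e.g. fill in required event fields (assumed)
    private final Predicate<N> isValid;      // e.g. check against the event schema (assumed)
    private final Function<N, byte[]> write; // e.g. wraps an unshaded ObjectMapper (assumed)

    ValidatingJsonSchema(Function<T, N> toNode, UnaryOperator<N> augment,
                         Predicate<N> isValid, Function<N, byte[]> write) {
        this.toNode = toNode;
        this.augment = augment;
        this.isValid = isValid;
        this.write = write;
    }

    @Override
    public byte[] serialize(T element) {
        // Convert, then augment, then validate, and only then turn the node into bytes.
        N node = augment.apply(toNode.apply(element));
        if (!isValid.test(node)) {
            throw new IllegalArgumentException("Event failed validation: " + node);
        }
        return write.apply(node);
    }
}
```

Composing the steps this way, rather than subclassing JsonRowDataSerializationSchema, keeps shaded types out of every signature. The cost is re-implementing the RowData-to-JSON conversion that flink-json otherwise provides, which is the copy/paste trade-off described above.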

I have another question around JSON and Flink, but I'll start a new thread
for that.

Thank you!




On Mon, Jun 13, 2022 at 7:17 AM Chesnay Schepler <ch...@apache.org> wrote:

> Can we find a more robust way to support this?
>
> Both flink-shaded, any relocation pattern and
> JsonRowDataSerializationSchema are Flink internals that users shouldn't
> use/rely on.

Re: Flink Shaded dependencies and extending Flink APIs

Posted by Chesnay Schepler <ch...@apache.org>.
Can we find a more robust way to support this?

Both flink-shaded (and any relocation pattern) and
JsonRowDataSerializationSchema are Flink internals that users shouldn't
use or rely on.

On 13/06/2022 12:26, Qingsheng Ren wrote:
> Hi Andrew,
>
> This is indeed a tricky case since Flink doesn't provide non-shaded
> JAR for flink-json. One hacky solution in my mind is like:
>
> 1. Create a module let's say "wikimedia-event-utilities-shaded" that
> relocates Jackson in the same way and uses the same Jackson version as
> flink-shaded-jackson
> 2. Deploy the module to a local or remote Maven repository
> 3. Let your custom format depend on the
> "wikimedia-event-utilities-shaded" module, then all Jackson
> dependencies are relocated in the same way.
>
> Another solution is that you can serialize then deserialize the
> "different" ObjectNode to do the conversion but this sacrifices the
> performance.
>
> Hope this could be helpful!
>
> Best regards,
>
> Qingsheng


Re: Flink Shaded dependencies and extending Flink APIs

Posted by Qingsheng Ren <re...@gmail.com>.
Hi Andrew,

This is indeed a tricky case since Flink doesn't provide a non-shaded
JAR for flink-json. One hacky solution I have in mind:

1. Create a module, let's say "wikimedia-event-utilities-shaded", that
relocates Jackson in the same way and uses the same Jackson version as
flink-shaded-jackson.
2. Deploy the module to a local or remote Maven repository.
3. Let your custom format depend on the
"wikimedia-event-utilities-shaded" module, so that all Jackson
dependencies are relocated in the same way.
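
The plan hinges on both artifacts rewriting class names identically. Relocation is, roughly, a package-prefix rewrite; this sketch models it with a hypothetical relocate helper, using the prefix implied by the class name in Andrew's message (com.fasterxml.jackson becomes org.apache.flink.shaded.jackson2.com.fasterxml.jackson). It is a simplified model only: real shading tools such as the maven-shade-plugin also rewrite every reference inside the bytecode.

```java
/** Hypothetical, simplified model of package relocation (shading) applied to class names. */
final class Relocation {
    private Relocation() {}

    /** Rewrites className if it lies under the package pattern; otherwise returns it unchanged. */
    static String relocate(String className, String pattern, String shadedPattern) {
        // Match on whole package segments, so "com.fasterxml.jacksonx.Foo" is not relocated.
        if (className.equals(pattern) || className.startsWith(pattern + ".")) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className;
    }
}
```

In maven-shade-plugin terms, the two strings correspond to a relocation's <pattern> and <shadedPattern>. If the custom module used a different shadedPattern, or a different Jackson version than flink-shaded-jackson, the classes would still not line up.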

Another solution is that you can serialize then deserialize the
"different" ObjectNode to do the conversion, but this sacrifices
performance.
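
The round trip uses JSON bytes as the only thing both Jackson copies understand. A minimal sketch of its shape follows, with a hypothetical convert helper and encoder/decoder interfaces that allow checked exceptions. The Jackson pairing in the comment assumes both the shaded and the unshaded ObjectMapper are on the classpath.

```java
/** Hypothetical helper: convert between unrelated types via an intermediate byte[]. */
final class RoundTrip {

    /** Serializer that may throw a checked exception. */
    @FunctionalInterface
    interface Encoder<A> {
        byte[] encode(A value) throws Exception;
    }

    /** Deserializer that may throw a checked exception. */
    @FunctionalInterface
    interface Decoder<B> {
        B decode(byte[] bytes) throws Exception;
    }

    private RoundTrip() {}

    // Assumed pairing with Jackson (both copies on the classpath):
    //   JsonNode plain = RoundTrip.convert(shadedNode,
    //       shadedMapper::writeValueAsBytes, plainMapper::readTree);
    static <A, B> B convert(A source, Encoder<A> encoder, Decoder<B> decoder) throws Exception {
        // One full write plus one full parse per call: this is where the performance goes.
        byte[] bytes = encoder.encode(source);
        return decoder.decode(bytes);
    }
}
```

Every record pays for one extra write and one extra parse, which is the performance cost mentioned above.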

Hope this could be helpful!

Best regards,

Qingsheng

On Thu, Jun 9, 2022 at 8:29 PM Andrew Otto <ot...@wikimedia.org> wrote:
>
> Hi all,
>
> I'm working on an integration project trying to write some library code that will allow us at the Wikimedia Foundation to use Flink with our 'Event Platform'.  Specifically, I'm trying to write a reusable step near the end of a pipeline that will ensure our JSON events satisfy some criteria before producing them to Kafka.  Details here.
>
> I'm experimenting with writing my own custom format to do this.  But all I really need to do is override JsonRowDataSerializationSchema's serialize method and augment and validate the ObjectNode before it is serialized to byte[].
>
> I'm running into an issue where the ObjectNode that is used by Flink here is the shaded one: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode, whereas the WMF code I want to use to augment the ObjectNode is using a regular non shaded one.  I can't pass the shaded ObjectNode instance to a function that takes a non shaded one, and I can't cast the shaded ObjectNode to non shaded either.
>
> My Q is: is there a way to extend Flink APIs that use shaded dependencies?  I suppose I could copy/paste the whole of the "json" format code that I need into my project and just make it my own, but this feels quite obnoxious.
>
> Thank you!
> -Andrew Otto
>  Wikimedia Foundation
>
>