Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2022/02/01 14:28:53 UTC

[sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Hello Everyone,

I think it's about time for the next sig-multitenancy meeting:

I created a Doodle poll for next week - please mark your availability by
Friday the 4th.

https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link

I think the rough agenda will be:

* AIP-43 Dag Processor Separation [1] - implementation progress - Mateusz
* AIP-44 Airflow Internal API [2] - voting progress (hopefully) - Jarek
* AIP-45 Remove double DAG parsing [3] - discussion - Ping
* AIP-46 Docker runtime isolation [4] - discussion - Ping
* Also there are some ideas (not yet in AIP form) around optimizing
DagProcessorLoop that might be good to talk about - also Ping.

If there are any more proposals - feel free to ping me.
I also encourage everyone to comment on the AIP-45/46 proposals from Ping
before the meeting.

[1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation
[2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
[3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run
[4] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing

J.

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
BTW. I re-read your comment in the AIP and yeah ... I think I completely
misunderstood it :)

On Wed, Feb 16, 2022 at 6:08 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Just a reminder - meeting in ~ 50 minutes :)
>
> On Wed, Feb 16, 2022 at 2:34 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Happy to hear if others have experience with in-process calls (what I
>> really want is to do some benchmarking to see how much overhead each
>> option involves). I'd say that the "coarseness" of the calls (with the
>> possible exception of Connection/Variable retrieval etc.) means that
>> serialization/deserialization will have very little impact on performance
>> (but without actually checking it, it's hard to say for sure). Another
>> option, if inter-process communication turns into a problem (and I
>> saw people doing it in C++): people did "rip" some parts out of Thrift to
>> leave only the serialization/deserialization. But in our case, if we find
>> that either the need for a separate process or the communication involves
>> a lot of overhead, we could come back to the idea of delegating the calls
>> via decorators.
>>
>> On Wed, Feb 16, 2022 at 2:22 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> I looked at that too - and let me leave that as an option to explore in
>>> the first step. I will make a note.
>>>
>>> From what I checked - none of the current "ready-to-use" gRPC solutions
>>> have such an "in-process" option. I believe the "RPC framework re-use" for
>>> serialization/deserialization/transport might save a LOT of headache.
>>>
>>> However, Apache Thrift supports a "shared-memory" transport. I still
>>> think it requires a separate process (to be confirmed).
>>> gRPC supports local TCP and Unix sockets only. The in-memory
>>> option is not there (though people have asked for it:
>>> https://github.com/grpc/grpc/issues/19959).
>>>
>>> J.
>>>
>>>
>>> On Wed, Feb 16, 2022 at 2:13 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> That wasn't actually quite what I had in mind :)
>>>>
>>>> I was thinking that we _wouldn't_ go cross process at all, but in the
>>>> "local"/direct mode we will as-directly-as-possible call the handler code.
>>>> So for local/no-isolation we would still use the handler for the RPC, but
>>>> there it's just not "remote".
>>>>
>>>> -ash
>>>>
>>>> On Wed, Feb 16 2022 at 13:01:11 +0100, Jarek Potiuk <ja...@potiuk.com>
>>>> wrote:
>>>>
>>>> Hey Everyone,
>>>>
>>>> Based on the feedback, I updated AIP-44
>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>> - the "implementation notes" with an improved approach.
>>>>
>>>> Ash had a good suggestion (which I really like): instead of
>>>> inventing our own decorators and a different way of handling the internal
>>>> and external communication for the "coarse" functions that require the
>>>> database, we could approach it differently - namely, we could always use
>>>> RPC, no matter whether we are in DB isolation mode or "no isolation" mode.
>>>> Of course, in the "no isolation" mode, the communication should have
>>>> very low overhead (local TCP or sockets, no authorization). I looked at
>>>> existing RPC implementations we could use for that and narrowed down
>>>> the potential choice of technologies to gRPC and Apache Thrift.
>>>>
>>>> This approach has multiple advantages:
>>>>
>>>> * we can leverage existing RPC implementations (Thrift and gRPC are
>>>> both mature and have integration with HTTPS, various authentication options
>>>> and can be also run using local sockets)
>>>> * the code will be much simpler to maintain - we will use existing
>>>> serialization mechanisms from those protocols
>>>> * no custom code for communication needed - both Thrift and gRPC have
>>>> all that is needed for scalable, robust communication
>>>>
>>>> I think this way we will be able to implement a more robust and
>>>> maintainable solution much faster.
>>>>
>>>> I also reached out to Apache Beam (they have support for both gRPC and
>>>> Thrift and are in the process of transitioning from Thrift to gRPC as the
>>>> primary protocol), and I am sure they have done a lot of analysis that can
>>>> help us make the final decision.
>>>>
>>>> This approach changes only the implementation details of AIP-44 -
>>>> all the rest stays the same: the approach and the deployment options
>>>> remain untouched by this change.
>>>>
>>>> If you have any comments on that - feel free. I will also discuss it
>>>> today at the meeting, and if there is general consensus that the
>>>> direction is right, I would love to start voting on AIP-44, ideally
>>>> tomorrow, so that next week we can start implementing it. I am not sure
>>>> if we want to make a final decision about gRPC/Thrift (maybe there are
>>>> people who have good experience with both and can share it here?).
>>>>
>>>> I think a more detailed POC and benchmarking might be the first step of
>>>> the AIP, where we make the final choice based on an attempt to implement
>>>> a POC for both - but I am also happy to listen to those who have more
>>>> experience with both (and maybe Beam's experience will help with that).
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 15, 2022 at 1:49 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> The meeting is tomorrow :) Feel free to join. I will also record it
>>>>> and publish minutes!
>>>>>
>>>>> On Tue, Feb 15, 2022 at 12:31 PM Giorgio Zoppi <giorgio.zoppi@gmail.com> wrote:
>>>>> >
>>>>> > Hello Everyone,
>>>>> > is there any follow-up to this meeting? I would like to participate
>>>>> > if it's possible.
>>>>> > Best Regards,
>>>>> > Giorgio
>>>>> >
>>>>> > --
>>>>> > Life is a chess game - Anonymous.
>>>>>
>>>>

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
Just a reminder - meeting in ~ 50 minutes :)


Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
Happy to hear if others have experience with in-process calls (what I
really want is to do some benchmarking to see how much overhead each
option involves). I'd say that the "coarseness" of the calls (with the
possible exception of Connection/Variable retrieval etc.) means that
serialization/deserialization will have very little impact on performance
(but without actually checking it, it's hard to say for sure). Another
option, if inter-process communication turns into a problem (and I
saw people doing it in C++): people did "rip" some parts out of Thrift to
leave only the serialization/deserialization. But in our case, if we find
that either the need for a separate process or the communication involves
a lot of overhead, we could come back to the idea of delegating the calls
via decorators.
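
A rough sketch of the kind of micro-benchmark mentioned above. It is only
illustrative: `get_connection`, `direct_call`, `serialized_call` and `measure`
are hypothetical names, and JSON from the standard library stands in for
gRPC/Thrift serialization, so the numbers would only hint at the relative cost
of a serialize/deserialize round trip. Transport cost is not measured at all.

```python
import json
import timeit


def get_connection(conn_id: str) -> dict:
    # Hypothetical "coarse" call; a real one would read from the database.
    return {"conn_id": conn_id, "host": "localhost", "port": 5432}


def direct_call() -> dict:
    # In-process call: no serialization involved.
    return get_connection("my_db")


def serialized_call() -> dict:
    # Same call, but request and response go through a JSON round trip,
    # mimicking what an RPC layer would do before and after the handler.
    request = json.dumps({"conn_id": "my_db"})
    response = json.dumps(get_connection(**json.loads(request)))
    return json.loads(response)


def measure(repeats: int = 10_000) -> dict:
    # Total wall-clock seconds for `repeats` invocations of each variant.
    return {
        "direct": timeit.timeit(direct_call, number=repeats),
        "serialized": timeit.timeit(serialized_call, number=repeats),
    }


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f}s")
```

Comparing the two totals for a coarse call versus a fine-grained one (such as
a single Variable lookup) would show whether the serialization share matters.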


Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
I looked at that too - and let me leave that as an option to explore in the
first step. I will make a note.

From what I checked - none of the current "ready-to-use" gRPC solutions
have such an "in-process" option. I believe the "RPC framework re-use" for
serialization/deserialization/transport might save a LOT of headache.

However, Apache Thrift supports a "shared-memory" transport. I still think
it requires a separate process (to be confirmed).
gRPC supports local TCP and Unix sockets only. The in-memory
option is not there (though people have asked for it:
https://github.com/grpc/grpc/issues/19959).

J.



Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Ash Berlin-Taylor <as...@apache.org>.
That wasn't actually quite what I had in mind :)

I was thinking that we _wouldn't_ go cross process at all, but in the 
"local"/direct mode we will as-directly-as-possible call the handler 
code. So for local/no-isolation we would still use the handler for the 
RPC, but there it's just not "remote".

-ash
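
A minimal sketch of the "local"/direct idea above, under stated assumptions:
`HANDLERS`, `handler`, `get_variable`, `LocalClient`, `RemoteClient` and
`serve` are illustrative names, not Airflow, gRPC or Thrift APIs, and JSON
stands in for whatever serialization would eventually be chosen. In local
mode the client calls the handler in-process; in remote mode the same handler
is reached through a transport.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping method names to handler functions.
HANDLERS: Dict[str, Callable[..., Any]] = {}


def handler(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as an internal-API handler."""
    HANDLERS[func.__name__] = func
    return func


@handler
def get_variable(key: str) -> str:
    # Stand-in for a database-backed lookup.
    return {"env": "prod"}.get(key, "")


class LocalClient:
    """No-isolation mode: call the handler directly, in the same process."""

    def call(self, method: str, **params: Any) -> Any:
        return HANDLERS[method](**params)


class RemoteClient:
    """Isolation mode: reach the same handler through a transport."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def call(self, method: str, **params: Any) -> Any:
        request = json.dumps({"method": method, "params": params}).encode()
        return json.loads(self._transport(request))["result"]


def serve(request: bytes) -> bytes:
    """Server side: decode the request, dispatch to the handler, encode."""
    message = json.loads(request)
    result = HANDLERS[message["method"]](**message["params"])
    return json.dumps({"result": result}).encode()
```

Both clients expose the same `call` interface, so calling code would not need
to know which mode is active: `LocalClient().call("get_variable", key="env")`
and `RemoteClient(serve).call("get_variable", key="env")` return the same
value here, with `serve` passed in place of a real network transport.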

On Wed, Feb 16 2022 at 13:01:11 +0100, Jarek Potiuk <ja...@potiuk.com> 
wrote:
> Hey Everyone,
> 
> Based on the feedback, I updated AIP-44
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API>
> - the "implementation notes" with an improved approach.
> 
> Ash had a good suggestion (which I really like): instead of
> inventing our own decorators and a different way of handling the
> internal and external communication for the "coarse" functions that
> require the database, we could approach it differently - namely, we
> could always use RPC, no matter whether we are in DB isolation mode or
> "no isolation" mode. Of course, in the "no isolation" mode,
> the communication should have very low overhead (local TCP or
> sockets, no authorization). I looked at existing RPC implementations
> we could use for that and narrowed down the potential choice of
> technologies to gRPC and Apache Thrift.
> 
> This approach has multiple advantages:
> 
> * we can leverage existing RPC implementations (Thrift and gRPC are 
> both mature and have integration with HTTPS, various authentication 
> options and can be also run using local sockets)
> * the code will be much simpler to maintain - we will use existing 
> serialization mechanisms from those protocols
> * no custom code for communication needed - both Thrift and gRPC have 
> all that is needed for scalable, robust communication
> 
> I think this way we will be able to implement a more robust and 
> maintainable solution much faster.
> 
> I also reached out to Apache Beam (they have support for both gRPC
> and Thrift and are in the process of transitioning from Thrift to
> gRPC as the primary protocol), and I am sure they have done a lot of
> analysis that can help us make the final decision.
> 
> This approach changes only the implementation details of AIP-44 -
> all the rest stays the same: the approach and the deployment options
> remain untouched by this change.
> 
> If you have any comments on that - feel free. I will also discuss it
> today at the meeting, and if there is general consensus that the
> direction is right, I would love to start voting on AIP-44, ideally
> tomorrow, so that next week we can start implementing it. I am not
> sure if we want to make a final decision about gRPC/Thrift (maybe
> there are people who have good experience with both and can share it
> here?).
> 
> I think a more detailed POC and benchmarking might be the first step of
> the AIP, where we make the final choice based on an attempt to
> implement a POC for both - but I am also happy to listen to those who
> have more experience with both (and maybe Beam's experience will help
> with that).
> 
> J.
> 
> 
> 
> 
> On Tue, Feb 15, 2022 at 1:49 PM Jarek Potiuk <jarek@potiuk.com> wrote:
>> The meeting is tomorrow :)/ Feel free to join I will also record it
>>  and publish minutes!
>> 
>>  On Tue, Feb 15, 2022 at 12:31 PM Giorgio Zoppi <giorgio.zoppi@gmail.com> wrote:
>>  >
>>  > Hello Everyone,
>>  > is there any follow up of this meeting? I would like to 
>> participate if it's possible.
>>  > Best Regards,
>>  > Giorgio
>>  >
>>  > On Tue, Feb 1, 2022 at 3:29 PM Jarek Potiuk <jarek@potiuk.com> wrote:
>>  >>
>>  >> Hello Everyone,
>>  >>
>>  >>  I think it's about the time for the next sig-multitenancy 
>> meeting :
>>  >>
>>  >> I created a doodle poll for next week - please mark your 
>> availability till Friday the 4th.
>>  >>
>>  >> 
>> <https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link>
>>  >>
>>  >> I think what the rough agenda will be:
>>  >>
>>  >> * AIP-43 Dag Processor Separation [1] - implementation progress 
>> - Mateusz
>>  >> * AIP-44 Airflow Internal API [2] - voting progress (hopefully) 
>> -  Jarek
>>  >> * AIP-45 Remove double DAG parsing [3] -  discussion - Ping
>>  >> * AIP-46 Docker runtime isolation [4] - discussion - Ping
>>  >> * Also there are some ideas (not yet in AIP form) around 
>> optimizing DagProcessorLoop that might be good to talk about - also 
>> Ping.
>>  >>
>>  >> If there are any more proposals - feel free to ping me.
>>  >> I also encourage everyone to comment the AIP-45/46 proposals 
>> from Ping before the meeting.
>>  >>
>>  >> [1] 
>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation>
>>  >> [2] 
>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API>
>>  >> [3] 
>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run>
>>  >> [4] 
>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing>
>>  >>
>>  >> J.
>>  >>
>>  >>
>>  >
>>  >
>>  > --
>>  > Life is a chess game - Anonymous.


Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hey Everyone,

Based on the feedback, I updated AIP-44
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
- the "implementation notes" now describe an improved approach.

Ash had a good suggestion (which I really like): instead of inventing
our own decorators and a different way of handling internal and external
communication for the "coarse" functions that require the database, we
could approach it differently - namely, we could always use RPC, no matter
whether we are in DB isolation mode or "no isolation" mode. Of course, in
the "no isolation" mode the communication should have very low overhead
(local TCP or sockets, no authorization). I looked at existing RPC
implementations we could use for that and narrowed down the potential
choice of technologies to gRPC and Apache Thrift.

This approach has multiple advantages:

* we can leverage existing RPC implementations (Thrift and gRPC are both
mature, integrate with HTTPS and various authentication options, and
can also be run using local sockets)
* the code will be much simpler to maintain - we will use existing
serialization mechanisms from those protocols
* no custom code for communication needed - both Thrift and gRPC have all
that is needed for scalable, robust communication

I think this way we will be able to implement a more robust and
maintainable solution much faster.
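
To make the "always use RPC" idea a bit more concrete, here is a minimal
sketch. It is only an illustration, not the AIP-44 implementation: it uses
Python's standard-library xmlrpc purely as a stand-in for gRPC or Thrift
(so that it runs without extra dependencies), and the names get_task_state,
_FAKE_DB and start_local_internal_api are hypothetical. The caller always
goes through the RPC client; in "no isolation" mode the server would simply
be a local, loopback-only endpoint without authorization.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for the metadata database.
_FAKE_DB = {("example_dag", "extract"): "success"}


def get_task_state(dag_id, task_id):
    """Hypothetical "coarse" function that would normally query the database."""
    return _FAKE_DB.get((dag_id, task_id), "unknown")


def start_local_internal_api():
    """Start a loopback-only RPC server ("no isolation" mode: no auth, local TCP)."""
    # Port 0 asks the OS to pick a free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_task_state)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_local_internal_api()
host, port = server.server_address

# The calling code looks the same in both modes: it only talks to the RPC
# client. In DB isolation mode the URL would point at a remote, authenticated
# endpoint instead of the local one.
client = ServerProxy(f"http://{host}:{port}")
state = client.get_task_state("example_dag", "extract")
print(state)

server.shutdown()
server.server_close()
```

With gRPC or Thrift the server and client would be generated from an
interface definition instead of registered by hand, but the shape of the
calling code would stay similar.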

I also reached out to Apache Beam (they have support for both gRPC and
Thrift and are in the process of transitioning from Thrift to gRPC as
primary protocol) and I am sure they have done a lot of analysis that can
help us to make the final decision.

This approach changes only the implementation details of AIP-44 -
everything else, including the overall approach and the deployment options,
remains untouched by this change.

If you have any comments on that - feel free. I will also discuss it today
at the meeting and if there is general consensus that the direction is
right I would love to start voting on AIP-44 ideally tomorrow - so that
next week we can start implementing it. I am not sure if we want to make a
final decision about gRPC/Thrift (maybe there are people who have good
experience with both and can share it here?).

I think a more detailed POC and benchmarking might be the first step of the
AIP - where we make the final choice based on an attempt to implement a POC
for both - but I am also happy to listen to those who have more experience
with both (and maybe Beam's experience will help with that).

J.




On Tue, Feb 15, 2022 at 1:49 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> The meeting is tomorrow :). Feel free to join. I will also record it
> and publish minutes!
>
> On Tue, Feb 15, 2022 at 12:31 PM Giorgio Zoppi <gi...@gmail.com>
> wrote:
> >
> > Hello Everyone,
> > is there any follow up of this meeting? I would like to participate if
> it's possible.
> > Best Regards,
> > Giorgio
> >
> > On Tue, Feb 1, 2022 at 3:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> Hello Everyone,
> >>
> >>  I think it's about the time for the next sig-multitenancy meeting :
> >>
> >> I created a doodle poll for next week - please mark your availability
> till Friday the 4th.
> >>
> >>
> https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link
> >>
> >> I think what the rough agenda will be:
> >>
> >> * AIP-43 Dag Processor Separation [1] - implementation progress -
> Mateusz
> >> * AIP-44 Airflow Internal API [2] - voting progress (hopefully) -  Jarek
> >> * AIP-45 Remove double DAG parsing [3] -  discussion - Ping
> >> * AIP-46 Docker runtime isolation [4] - discussion - Ping
> >> * Also there are some ideas (not yet in AIP form) around optimizing
> DagProcessorLoop that might be good to talk about - also Ping.
> >>
> >> If there are any more proposals - feel free to ping me.
> >> I also encourage everyone to comment the AIP-45/46 proposals from Ping
> before the meeting.
> >>
> >> [1]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation
> >> [2]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
> >> [3]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run
> >> [4]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing
> >>
> >> J.
> >>
> >>
> >
> >
> > --
> > Life is a chess game - Anonymous.
>

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
The meeting is tomorrow :). Feel free to join. I will also record it
and publish minutes!

On Tue, Feb 15, 2022 at 12:31 PM Giorgio Zoppi <gi...@gmail.com> wrote:
>
> Hello Everyone,
> is there any follow up of this meeting? I would like to participate if it's possible.
> Best Regards,
> Giorgio
>
> On Tue, Feb 1, 2022 at 3:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>> Hello Everyone,
>>
>>  I think it's about the time for the next sig-multitenancy meeting :
>>
>> I created a doodle poll for next week - please mark your availability till Friday the 4th.
>>
>> https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link
>>
>> I think what the rough agenda will be:
>>
>> * AIP-43 Dag Processor Separation [1] - implementation progress - Mateusz
>> * AIP-44 Airflow Internal API [2] - voting progress (hopefully) -  Jarek
>> * AIP-45 Remove double DAG parsing [3] -  discussion - Ping
>> * AIP-46 Docker runtime isolation [4] - discussion - Ping
>> * Also there are some ideas (not yet in AIP form) around optimizing DagProcessorLoop that might be good to talk about - also Ping.
>>
>> If there are any more proposals - feel free to ping me.
>> I also encourage everyone to comment the AIP-45/46 proposals from Ping before the meeting.
>>
>> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation
>> [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>> [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run
>> [4] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing
>>
>> J.
>>
>>
>
>
> --
> Life is a chess game - Anonymous.

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Giorgio Zoppi <gi...@gmail.com>.
Hello Everyone,
is there any follow-up to this meeting? I would like to participate if it's
possible.
Best Regards,
Giorgio

On Tue, Feb 1, 2022 at 3:29 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hello Everyone,
>
>  I think it's about the time for the next sig-multitenancy meeting :
>
> I created a doodle poll for next week - please mark your availability till
> Friday the 4th.
>
> https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link
>
> I think what the rough agenda will be:
>
> * AIP-43 Dag Processor Separation [1] - implementation progress - Mateusz
> * AIP-44 Airflow Internal API [2] - voting progress (hopefully) -  Jarek
> * AIP-45 Remove double DAG parsing [3] -  discussion - Ping
> * AIP-46 Docker runtime isolation [4] - discussion - Ping
> * Also there are some ideas (not yet in AIP form) around optimizing
> DagProcessorLoop that might be good to talk about - also Ping.
>
> If there are any more proposals - feel free to ping me.
> I also encourage everyone to comment the AIP-45/46 proposals from Ping
> before the meeting.
>
> [1]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation
>
> [2]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
> [3]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run
> [4]
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing
>
> J.
>
>
>

-- 
Life is a chess game - Anonymous.

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hello Everyone,

It was not easy but I think we have a good (enough) time for everyone:

Wed, Feb 16, 7 PM CET (10 AM PST).

https://calendar.google.com/event?action=TEMPLATE&tmeid=M204cXByZmppOHMwY2Q0ZWloazRmODFia3AgcG90aXVrLmFwYWNoZS5vcmdAbQ&tmsrc=potiuk.apache.org%40gmail.com

Rough agenda for now:


* AIP-43 Dag Processor Separation [1] - implementation progress - Mateusz
* AIP-44 Airflow Internal API [2] - voting ready? (hopefully) - Jarek
* AIP-45 Remove double DAG parsing [3] - discussion - Ping
* AIP-46 Docker runtime isolation [4] - discussion - Ping
* Also there are some ideas (not yet in AIP form) around optimizing
DagProcessorLoop that might be good to talk about - also Ping.


If you have more to talk about - feel free to reach out to me.

J.


On Mon, Feb 7, 2022 at 10:28 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hello everyone,
>
> It seems really difficult to find a "best time" for all the important
> stakeholders. This week it seems impossible to get people from the
> different groups together. And I think the biggest value of such a meeting
> is that we can ask questions and reach conclusions faster.
>
> Let me try next week - I, for one, am more available then, so I prepared a
> new doodle for it.
>
> Can you please mark your availability there:
>
> https://doodle.com/poll/q2ypvd9h5wfh7us2?utm_source=poll&utm_medium=link
>
> J.
>
>
> On Tue, Feb 1, 2022 at 8:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Sure - anyone can join - just sign up in doodle with the available slots
>> and I will announce the time for the meeting time on Friday.
>>
>> J.
>>
>> On Tue, Feb 1, 2022 at 6:47 PM Giorgio Zoppi <gi...@gmail.com>
>> wrote:
>>
>>> Hello,
>>> can i join the meeting? I am reviewing airflow in depth. I am interested
>>> in helping.
>>> Best Regards,
>>> Giorgio.
>>>
>>>
>>>
>>>

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hello everyone,

It seems really difficult to find a "best time" for all the important
stakeholders. This week it seems impossible to get people from the
different groups together. And I think the biggest value of such a meeting
is that we can ask questions and reach conclusions faster.

Let me try next week - I, for one, am more available then, so I prepared a
new doodle for it.

Can you please mark your availability there:

https://doodle.com/poll/q2ypvd9h5wfh7us2?utm_source=poll&utm_medium=link

J.


On Tue, Feb 1, 2022 at 8:02 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Sure - anyone can join - just sign up in doodle with the available slots
> and I will announce the meeting time on Friday.
>
> J.
>
> On Tue, Feb 1, 2022 at 6:47 PM Giorgio Zoppi <gi...@gmail.com>
> wrote:
>
>> Hello,
>> can i join the meeting? I am reviewing airflow in depth. I am interested
>> in helping.
>> Best Regards,
>> Giorgio.
>>
>>
>>
>>

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Jarek Potiuk <ja...@potiuk.com>.
Sure - anyone can join - just sign up in doodle with the available slots
and I will announce the meeting time on Friday.

J.

On Tue, Feb 1, 2022 at 6:47 PM Giorgio Zoppi <gi...@gmail.com>
wrote:

> Hello,
> can i join the meeting? I am reviewing airflow in depth. I am interested
> in helping.
> Best Regards,
> Giorgio.
>
>
>
>

Re: [sig-multitenancy]: Meeting for multi-tenancy state and AIP-45/AIP-46 introductory discussions

Posted by Giorgio Zoppi <gi...@gmail.com>.
Hello,
can I join the meeting? I am reviewing Airflow in depth. I am interested
in helping.
Best Regards,
Giorgio.