You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2022/08/04 08:41:47 UTC

[VOTE] AIP-44 - Airflow Internal API

Hey everyone,

I would like to cast a vote for "AIP-44 - Airflow Internal API".

The AIP-44 is here:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API

Discussion thread:
https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3

The voting will last for 5 days (until 9th of August 2022 11:00 CEST), and
until at least 3 binding votes have been cast.

Please vote accordingly:

[ ] + 1 approve
[ ] + 0 no opinion
[ ] - 1 disapprove with the reason

Only votes from PMC members are binding, but members of the community are
encouraged to check the AIP and vote with "(non-binding)".

----

Just a summary of where we are:

It's been long in the making, but I think it might be a great step-forward
to our long-term multi-tenancy goal. I believe the proposal I have is quite
well thought out and discussed a lot in the past:

* we have a working POC for implementation used for performance testing:
https://github.com/apache/airflow/pull/25094
* it is based on on industry-standard open-source gRPC (which is already
our dependency) which fits better the RPC "model" we need than our public
REST API.
* it has moderate performance impact and rather good
maintainability features (very little impact on regular development effort)
* it is fully backwards compatible - the new DB isolation will be an
optional feature
* has a solid plan for full test coverage in our CI
* has a backing and plans for more extensive complete testing in "real"
environment with Google Composer team support
* allows for further extensions as part of AIP-1 (I am planning to
re-establish sig-multitenancy effort for follow up AIPs once this one is
well in progress).


J.

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jeambrun Pierre <pi...@gmail.com>.
I feel like this is a big step forward.

It looks very promising.

+1 (non-binding)

Pierre Jeambrun

Le mer. 10 août 2022 à 19:50, Ash Berlin-Taylor <as...@apache.org> a écrit :

> Sorry to weigh in at the last minute, but I'm wary of gRPC over just JSON,
> so -1 to that specific choice. Everything else I'm happy with.
>
> I (or Andrew G) will follow up with more details shortly.
>
> -ash
>
> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <ja...@potiuk.com>
> wrote:
>
> Oh yeah :)
>
> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>
>> ah, good call.
>>
>> I guess the email template can be updated:
>>
>> Only votes from PMC members are binding, but members of the community are
>>> encouraged to check the AIP and vote with "(non-binding)".
>>
>>
>>
>> +1 (binding)
>>
>> Thanks,
>>
>> Ping
>>
>>
>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>> are binding. See
>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>> :D
>>>
>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Thanks,
>>>>
>>>> Ping
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>
>>>>> The AIP-44 is here:
>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>
>>>>> Discussion thread:
>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>
>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>
>>>>> Please vote accordingly:
>>>>>
>>>>> [ ] + 1 approve
>>>>> [ ] + 0 no opinion
>>>>> [ ] - 1 disapprove with the reason
>>>>>
>>>>> Only votes from PMC members are binding, but members of the community
>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>
>>>>> ----
>>>>>
>>>>> Just a summary of where we are:
>>>>>
>>>>> It's been long in the making, but I think it might be a great
>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>
>>>>> * we have a working POC for implementation used for performance
>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>> public REST API.
>>>>> * it has moderate performance impact and rather good
>>>>> maintainability features (very little impact on regular development effort)
>>>>> * it is fully backwards compatible - the new DB isolation will be an
>>>>> optional feature
>>>>> * has a solid plan for full test coverage in our CI
>>>>> * has a backing and plans for more extensive complete testing in
>>>>> "real" environment with Google Composer team support
>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>> well in progress).
>>>>>
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>>
>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by "Ferruzzi, Dennis" <fe...@amazon.com.INVALID>.
I know this is past the deadline, but I wanted to wait to see the conversation play out to make a more informed call.

I'd vote a (non-binding) +1.


________________________________
From: Abhishek Bhakat <ab...@astronomer.io.INVALID>
Sent: Thursday, August 11, 2022 2:06 AM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL][VOTE] AIP-44 - Airflow Internal API


CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.


+1 (non-binding) for gRPC only.

On 11-Aug-2022 at 8:42:52 AM, Ash Berlin-Taylor <as...@apache.org>> wrote:
+1 (binding) now with that change. Thank you very much.

I'm also okay with us to proceed with only gRPC for now -- with this architectural change it's much easier to replace it in future if we want/need to, and for others (such as service providers like Google/Amazon/Astronomer) to experiment with custom API implementations.

I'm not able to point at any examples, as the ones I know of personally aren't open source; they were all written for companies projects/products.

I'm am sorry that we missed it -- speaking personally I've had _a lot_ going on at home and I've barely had time to "look around" at what else is going on in Airflow over the last few months. I'm only now just having the head space to actually think in detail about other things that what is right in front of me, hence why it came now.

-ash

On Thu, Aug 11 2022 at 09:27:39 +02:00:00, Jarek Potiuk <ja...@potiuk.com>> wrote:
I made the updates in https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API to reflect the above. As I suspected, there were really very few changes needed, to make it "GRPC/JSON OpenAPI" agnostic. Ir does not change the "gist" and "purpose" of the AIP and we can easily turn it into implementation detail after extended POC is complete.

On Thu, Aug 11, 2022 at 9:08 AM Jarek Potiuk <ja...@potiuk.com>> wrote:
Hey Ash, Andrew,

TL;DR; I slept over it to try to understand what just happened, and I have a proposal.

* I am happy to extend my POC with the pure JSon/Open API approach - providing that Andrew/Ash point me to some good examples where the RPC style API has been implemented. If you could point me in the right direction, I am a maverick kind of person and just find my ways there.
* I propose we conclude the vote (with added reservation in it, that final choice of the transport will be based on the extended POC) as planned.
* I am rather curious to try the things that both of you mentioned you are concerned about (threading/multiprocessing/load balancing) and see how they compare in both approaches - before making final decision and doing first, founding PR

More comments (personal feeling):

I really treat your comments - (now that they finally came), seriously. And value your expert knowledge and experience. And I personally expect the same in return. I always respond to any AIPs/discussions when I am given enough time and opportunity to comment, I do - In due time. This time it did not happen  - not sure why. I personally feel this is a little disrespectful to my work, but I do not hold any grudges. I am happy to put that completely aside - now that I have your attention and can use your experience, the goal is that we do not make any personal complaints or escalations here, we are all here to make Airflow a better product. Community is the most important, respect for each other's work is an important part of it. And I hope this can improve in the future AIP discussions.

But going back to the merit of the discussion:

Just to let everyone know I am not too "vested" in gRPC . I spend a little time on implementing the POC with it and after doing the POC and reading testimonies of people I feel this is the right choice. But for this AIP. I made deliberate decisions in the way that the transport can be swapped out if we find reasons we should.

I am reasonably happy with the way proposed by Ash (In fact I am going to update the AIP with it in a moment). For me the way how we actually implement the "if"  is an implementation detail that will be worked out on the first PR. The way proposed is good for me, though I would rather experiment a bit with decorators and see if we can make it a bit nicer - but  this is not a dealbreaker at all for me. One concern I have is that we will have another abstraction layer (possibly needless) and that we will have to again repeat all the methods signatures and parameters and keep them in sync (they will now be repeated in 4 or 5 places). But again - this is something that can be likely done in the first "founding" PR we are going to iterate on and work out the best balance between duplication and flexibility/maintainability. And we can always update the AIP with this implementation detail later - very much like it happend with a number of AIPs before - including AIP-39, AIP-40, AIP-42 - all of them went through similar rounds of updates and clarifications as implementation was progressing. It's hard to work out all the details "on paper" or even in "POC". Also I would REALLY love to tap in the experience of people like the both of you, but it seems that the only way to get some good and serious feedback is to call a vote (or make a PR with the intention of merging it).

But I even can go further than that - I think independently from voting whether the whole AIP-44 is a good or bad idea. I think there is no doubt we need it, and the "general scope" and approach seems to already reach general consensus, so if we can just continue and complete the vote. I am happy to continue running the POC on using OpenAPI spec and gRPC in parallel and see how they compare. I am always eager to learn and try other things, and if you have valid concerns I am happy to address them by trying out. I would personally like to compare both from the development "friction" point of view, performance, as well as doing some tests trying to address and test the operational (process/thread/load balancing) concerns you both have and see how they can be solved in both and compare. I think there is nothing like "show me the code" and performing actual working POC.

And I am super happy to continue with the POC and extend it with a pure JSON/OpenAPI based proposal after voting completes and make the final decision during the first founding PR. And we can even arrange extra votes or lazy consensus before the first PR lands - after seeing all the "ins/outs".
The Founding PR is still quite a bit away - I do not want to make any commits before we branch-off the 2.4 - and I don't even want to take too much of your time for that other than discussion and raising concerns and commenting on my findings. I am happy to do all the ground-work here.

That's all I ask for. Just treating the work I do seriously.

So Ash, Andrew

Can you please point me to some examples where RPC-API like ours has been implemented with Open API/JSON? I am curious to learn from those and turn them into POC.

And I propose - let's just continue the vote as planned. We already have 3 binding votes, and more +1s than -1s and the time has already passed, but I am happy to let it run till the end of day tomorrow to see if my proposal above will be good to conclude the vote with more consent from all the people involved.

J.




On Thu, Aug 11, 2022 at 7:49 AM Eugen Kosteev <eu...@kosteev.com>> wrote:
+1 (non-binding)

On Thu, Aug 11, 2022 at 12:08 AM Ash Berlin-Taylor <as...@apache.org>> wrote:
So my concerns (in addition to the ones Andrew already pointed out) with gRPC: it uses threads! Threads + python always makes me nervous. Doubly so when you then couple that with DB access via sqlalchemy - we're heading down paths less well travelled if we do this.

From https://github.com/grpc/grpc/blob/master/doc/fork_support.md

> gRPC Python wraps gRPC core, which uses multithreading for performance, and hence doesn't support fork(). Historically, we didn't support forking in gRPC, but some users seemed to be doing fine until their code started to break on version 1.6. This was likely caused by the addition of background c-threads and a background Python thread

And there's https://github.com/grpc/grpc/issues/16001 which may or may not be a problem for us, I'm unclear:

> The main constraint that must be satisified is that your process must only invoke the fork() syscall before you create your gRPC server(s).


But anyway, the only change I'd like to see is to make the internal API more pluggable.

So instead of something like this:

    def process_file(
        self,
        file_path: str,
        callback_requests: List[CallbackRequest],
        pickle_dags: bool = False,
    ) -> Tuple[int, int]:
        if self.use_grpc:
            return self.process_file_grpc(
                file_path=file_path, callback_requests=callback_requests, pickle_dags=pickle_dags
            )
        return self.process_file_db(
            file_path=file_path, callback_requests=callback_requests, pickle_dags=pickle_dags
        )

I'd very much like us to have it be something like this:

    def process_file(
        self,
        file_path: str,
        callback_requests: List[CallbackRequest],
        pickle_dags: bool = False,
    ) -> Tuple[int, int]:
        if settings.DATABASE_ACCESS_ISOLATION:
            return InternalAPIClient.dagbag_process_file(
                file_path=file_path, callback_requests=callback_requests, pickle_dags=pickle_dags
            )
        return self.process_file_db(
            file_path=file_path, callback_requests=callback_requests, pickle_dags=pickle_dags
        )

i.e. all API access is "marshalled" (perhaps the wrong word) via a single pluggable (and eventually configurable/replaceable) API client. Additionally (though less important) `self.use_grpc` is changed to a property on the `airflow.settings` module (as it is a global setting, not an attribute/property of any single instance.)

(In my mind the API client would include the from_protobuff/to_protobuff methods that you added to TaskInstance on your POC PR. Or the from_protbuff could live in the FileProcessorServiceServicer class in your example, but that probably doesn't scale when multiple services would take a TI/SimpleTI. I guess they could still also live on TaskInstance et al and just not be used - but that doesn't feel as clean to me is all)

Thoughts? It's not too big change to encapsulate things like this I hope?

Sorry again that we didn't look at the recent work on this AIP sooner.

-ash



On Wed, Aug 10 2022 at 13:51:02 -06:00:00, Andrew Godwin <an...@astronomer.io.INVALID> wrote:
I also wish we'd had this discussion before!

I've converted several million lines of code into API-driven services over the past decade so I have my ways and I'm set in them, I guess :)

Notice that I didn't say "use REST" - I don't think REST maps well to RPC style codebases, and agree the whole method thing is confusing. I just mean "ship JSON in a POST and receive JSON in the response".

As you said before though, the line where you draw the abstraction is what matters more than the transport layer, and "fat endpoints" (doing transactions and multiple calls on the API server etc.) is, I agree, the way this has to go, so it's not like this is something I think is totally wrong, just that I've repeatedly tried gRPC on projects like this and been disappointed with what it actually takes to ship changes and prototype things quickly. Nothing beats the ability to just throw curl at a problem - or maybe that's just me.

Anyway, I'll leave you to it - I have my own ideas in this area I'll be noodling on, but it's a bit of a different take and aimed more at execution as a whole, so I'll come back and discuss them if they're successful.

Andrew

On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <ja...@potiuk.com>> wrote:
1st of all - I wish we had this discussion before :)

> The autogenerated code also is available from OpenAPI specs, of course, and the request/response semantics in the same thread are precisely what make load balancing these calls a little harder, and horizontal scaling to multiple threads comes with every HTTP server, but I digress - clearly you've made a choice here to write an RPC layer rather than a lightweight API layer, and go with the more RPC-style features, and I get that.

OpenAPI is REST not RPC and it does not map well to RPC style calls. I tried to explain in detail in the AIP (and in earlier discussions). And the choice is basically made for us because of our expectation to have minimal impact on the existing code (and ability to switch off the remote part). We really do not want to introduce new API calls.  We want to make sure that existing "DB transactions" (i.e. coarse grained calls) are remotely executed. So we are not really talking about lightweight API almost by definition. Indeed, Open API also maps a definition described in a declarative way to python code.  but it has this non-nice part that OpenAPI/REST, it is supposed to be run on some resources. We have no resources to run it on - every single method we call is doing something. Even from the REST/OpenAPI semantics I'd have a really hard time to decide whether it should be GET, POST or PUT or DELETE. In most cases this will be a combination of those 4 on several different resources. All the "nice parts" of Open API (Swagger UI etc.) become next to useless if you try to map such remote procedures we have, to REST calls.


> I still think it's choosing something that's more complex to maintain over a simpler, more accessible option, but since I won't be the one writing it, that's not really for me to worry about. I am very curious to see how this evolves as the work towards multitenancy really kicks in and all the API schemas need to change to add namespacing!

The gRPC (proto) is designed from ground-up with maintainability in mind. The ability to evolve the API, add new parameters etc. is built-in the protobuf definition. From the user perspective it's actually easier to use than OpenAPI when it comes to remote method calls, because you basically - call a method.

And also coming back to monitoring - literally every monitoring platform supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog, you name it. In our case.

Also load-balancing:

Regardless of the choice we talk about HTTP request/response happening for 1 call. This is the boundary. Each of the calls we have will be a separate transaction, separate call, not connected to any other call. The server handling it will be stateless (state will be stored in a DB when each call completes). I deliberately put the "boundary" of each of the remotely callable methods, to be a complete DB transaction to achieve it.

So It really does not change whether we use gRPC or REST/JSON. REST/JSON vs. gRPC is just the content of the message, but this is the very same HTTP call, with same authentication added on top, same headers - just how the message is encoded is different. The same tools for load balancing works in the same way regardless if we use gRPC or REST/JSON. This is really a higher layer than the one involved in load balancing.





On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin <an...@astronomer.io.invalid> wrote:
Well, binary representation serialization performance is worse for protobuf than for JSON in my experience (in Python specifically) - unless you mean size-on-the-wire, which I can agree with but tend to not worry about very much since it's rarely a bottleneck.

The autogenerated code also is available from OpenAPI specs, of course, and the request/response semantics in the same thread are precisely what make load balancing these calls a little harder, and horizontal scaling to multiple threads comes with every HTTP server, but I digress - clearly you've made a choice here to write an RPC layer rather than a lightweight API layer, and go with the more RPC-style features, and I get that.

I still think it's choosing something that's more complex to maintain over a simpler, more accessible option, but since I won't be the one writing it, that's not really for me to worry about. I am very curious to see how this evolves as the work towards multitenancy really kicks in and all the API schemas need to change to add namespacing!

Andrew

On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com>> wrote:
Sure I can explain - the main reasons are:

- 1) binary representation performance - impact of this is rather limited because our API calls are doing rather heavy processing compared to the data being transmitted. But I believe it's not negligible.
- 2) automated tools to automatically generate strongly typed Python code (that's the ProtoBuf part). The strongly typed Python code is what convinced me (see my POC). The tooling we got for that is excellent. Far more superior than dealing with json-encoded data even with schema.
- 2) built-in "remote procedure" layer - where we have request/response semantics optimisations (for multiple calls over the same chanel) and exception handling done for us (This is basically what "Remote Procedure" interface provide us)
- 3) built-in server that efficiently distributes the method called from multiple client into a multi-threaded/multi-threaded execution (all individual calls are stateless so multi-processing can be added on top regardless from the "transport" chosen).

BTW. If you look at my POC code, there is nothing that strongly "binds" us to gRPC. The nice thing is that once it is implemented, it can be swapped out very easily. The only Proto/gRPC code that "leaks" to "generic" Airflow code is mapping of some (most complex) parameters to Proto . And this is only for most complex cases - literally only few of our types require custom serialisation - most of the mapping is handled automatically by generated protobuf code. And we can easily put the "complex" mapping in a separate package. Plus there is an "if" statement for each of the ~ 30 or so methods that we will have to turn into remotely-callable. We can even (as I proposed it as an option) add a little python magic and add a simple decorator to handle the "ifs". Then the decorator "if" can be swapped with some other "remote call" implementation.

The actual bulk of the implementation is to make sure that all the places are covered (that's the testing harness).



On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin <an...@astronomer.io.invalid> wrote:
Hi Jarek,

Apologies - I was as not involved as I wanted to be in the AIP-44 process, and obviously my vote is non-binding anyway - but having done a lot of Python API development over the past decade or so I wanted to know why the decision was made to go with gRPC over just plain HTTP+JSON (with a schema, of course).

The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with - but does not go into the option of using a standard Python HTTP server with JSON schema enforcement, such as FastAPI. In my experience, the tools for load balancing, debugging, testing and monitoring JSON/HTTP are superior and more numerous than those for gRPC, and in addition the asynchronous support for gRPC servers is lacking compared to their plain HTTP counterparts, and the fact that you can interact and play with the APIs in prototyping stages without having to handle obtaining correct protobuf versions for the Airflow version you're using.

I wouldn't go so far as to suggest a veto, but I do want to see the AIP address why gRPC would win over this option. Apologies again for the late realisation that gRPC got chosen and was being voted on - it's been a very busy summer.

Thanks,
Andrew

On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com>> wrote:
Just let me express my rather strong dissatisfaction with the way this "last minute" is raised.

It is very late to come up with such a statement - not that it comes at all, but when it comes when everyone had a chance to take a look and comment, including taking a look at the POC and result of checks. This has never been raised even 4 months ago where the only choices were Thrift and gRPc).

I REALLY hope the arguments are very strong and backed by real examples and data why it is a bad choice rather than opinions.

J,.

On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>> wrote:
Sorry to weigh in at the last minute, but I'm wary of gRPC over just JSON, so -1 to that specific choice. Everything else I'm happy with.

I (or Andrew G) will follow up with more details shortly.

-ash

On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <ja...@potiuk.com>> wrote:
Oh yeah :)

On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu>> wrote:
ah, good call.

I guess the email template can be updated:

Only votes from PMC members are binding, but members of the community are encouraged to check the AIP and vote with "(non-binding)".


+1 (binding)

Thanks,

Ping


On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>> wrote:
Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes are binding. See https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy :D

On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu>> wrote:
+1 (non-binding)

Thanks,

Ping


On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>> wrote:
Hey everyone,

I would like to cast a vote for "AIP-44 - Airflow Internal API".

The AIP-44 is here: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API

Discussion thread: https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3

The voting will last for 5 days (until 9th of August 2022 11:00 CEST), and until at least 3 binding votes have been cast.

Please vote accordingly:

[ ] + 1 approve
[ ] + 0 no opinion
[ ] - 1 disapprove with the reason

Only votes from PMC members are binding, but members of the community are encouraged to check the AIP and vote with "(non-binding)".

----

Just a summary of where we are:

It's been long in the making, but I think it might be a great step-forward to our long-term multi-tenancy goal. I believe the proposal I have is quite well thought out and discussed a lot in the past:

* we have a working POC for implementation used for performance testing:  https://github.com/apache/airflow/pull/25094
* it is based on on industry-standard open-source gRPC (which is already our dependency) which fits better the RPC "model" we need than our public REST API.
* it has moderate performance impact and rather good maintainability features (very little impact on regular development effort)
* it is fully backwards compatible - the new DB isolation will be an optional feature
* has a solid plan for full test coverage in our CI
* has a backing and plans for more extensive complete testing in "real" environment with Google Composer team support
* allows for further extensions as part of AIP-1 (I am planning to re-establish sig-multitenancy effort for follow up AIPs once this one is well in progress).


J.





--
Eugene

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Abhishek Bhakat <ab...@astronomer.io.INVALID>.
 +1 (non-binding) for gRPC only.

On 11-Aug-2022 at 8:42:52 AM, Ash Berlin-Taylor <as...@apache.org> wrote:

> +1 (binding) now with that change. Thank you very much.
>
> I'm also okay with us to proceed with only gRPC for now -- with this
> architectural change it's much easier to replace it in future if we
> want/need to, and for others (such as service providers like
> Google/Amazon/Astronomer) to experiment with custom API implementations.
>
> I'm not able to point at any examples, as the ones I know of personally
> aren't open source; they were all written for companies projects/products.
>
> I'm am sorry that we missed it -- speaking personally I've had _a lot_
> going on at home and I've barely had time to "look around" at what else is
> going on in Airflow over the last few months. I'm only now just having the
> head space to actually think in detail about other things that what is
> right in front of me, hence why it came now.
>
> -ash
>
> On Thu, Aug 11 2022 at 09:27:39 +02:00:00, Jarek Potiuk <ja...@potiuk.com>
> wrote:
>
> I made the updates in
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
> to reflect the above. As I suspected, there were really very few changes
> needed, to make it "GRPC/JSON OpenAPI" agnostic. Ir does not change the
> "gist" and "purpose" of the AIP and we can easily turn it into
> implementation detail after extended POC is complete.
>
> On Thu, Aug 11, 2022 at 9:08 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Hey Ash, Andrew,
>>
>> TL;DR; I slept over it to try to understand what just happened, and I
>> have a proposal.
>>
>> * I am happy to extend my POC with the pure JSon/Open API approach -
>> providing that Andrew/Ash point me to some good examples where the RPC
>> style API has been implemented. If you could point me in the right
>> direction, I am a maverick kind of person and just find my ways there.
>> * I propose we conclude the vote (with added reservation in it, that
>> final choice of the transport will be based on the extended POC) as planned.
>> * I am rather curious to try the things that both of you mentioned you
>> are concerned about (threading/multiprocessing/load balancing) and see how
>> they compare in both approaches - before making final decision and doing
>> first, founding PR
>>
>> More comments (personal feeling):
>>
>> I really treat your comments - (now that they finally came), seriously.
>> And value your expert knowledge and experience. And I personally expect the
>> same in return. I always respond to any AIPs/discussions when I am given
>> enough time and opportunity to comment, I do - In due time. This time it
>> did not happen  - not sure why. I personally feel this is a little
>> disrespectful to my work, but I do not hold any grudges. I am happy to put
>> that completely aside - now that I have your attention and can use
>> your experience, the goal is that we do not make any personal complaints or
>> escalations here, we are all here to make Airflow a better product.
>> Community is the most important, respect for each other's work is an
>> important part of it. And I hope this can improve in the future AIP
>> discussions.
>>
>> But going back to the merit of the discussion:
>>
>> Just to let everyone know I am not too "vested" in gRPC . I spend a
>> little time on implementing the POC with it and after doing the POC and
>> reading testimonies of people I feel this is the right choice. But for this
>> AIP. I made deliberate decisions in the way that the transport can be
>> swapped out if we find reasons we should.
>>
>> I am reasonably happy with the way proposed by Ash (In fact I am going to
>> update the AIP with it in a moment). For me the way how we actually
>> implement the "if"  is an implementation detail that will be worked out on
>> the first PR. The way proposed is good for me, though I would rather
>> experiment a bit with decorators and see if we can make it a bit nicer -
>> but  this is not a dealbreaker at all for me. One concern I have is that we
>> will have another abstraction layer (possibly needless) and that we will
>> have to again repeat all the methods signatures and parameters and keep
>> them in sync (they will now be repeated in 4 or 5 places). But again - this
>> is something that can be likely done in the first "founding" PR we are
>> going to iterate on and work out the best balance between duplication and
>> flexibility/maintainability. And we can always update the AIP with this
>> implementation detail later - very much like it happend with a number of
>> AIPs before - including AIP-39, AIP-40, AIP-42 - all of them went through
>> similar rounds of updates and clarifications as implementation was
>> progressing. It's hard to work out all the details "on paper" or even in
>> "POC". Also I would REALLY love to tap in the experience of people like the
>> both of you, but it seems that the only way to get some good and serious
>> feedback is to call a vote (or make a PR with the intention of merging it).
>>
>> But I even can go further than that - I think independently from voting
>> whether the whole AIP-44 is a good or bad idea. I think there is no doubt
>> we need it, and the "general scope" and approach seems to already reach
>> general consensus, so if we can just continue and complete the vote. I am
>> happy to continue running the POC on using OpenAPI spec and gRPC in
>> parallel and see how they compare. I am always eager to learn and try other
>> things, and if you have valid concerns I am happy to address them by trying
>> out. I would personally like to compare both from the development
>> "friction" point of view, performance, as well as doing some tests trying
>> to address and test the operational (process/thread/load balancing)
>> concerns you both have and see how they can be solved in both and compare.
>> I think there is nothing like "show me the code" and performing
>> actual working POC.
>>
>> And I am super happy to continue with the POC and extend it with a pure
>> JSON/OpenAPI based proposal after voting completes and make the final
>> decision during the first founding PR. And we can even arrange extra votes
>> or lazy consensus before the first PR lands - after seeing all the
>> "ins/outs".
>> The Founding PR is still quite a bit away - I do not want to make any
>> commits before we branch-off the 2.4 - and I don't even want to take too
>> much of your time for that other than discussion and raising concerns and
>> commenting on my findings. I am happy to do all the ground-work here.
>>
>> That's all I ask for. Just treating the work I do seriously.
>>
>> So Ash, Andrew
>>
>> Can you please point me to some examples where RPC-API like ours has been
>> implemented with Open API/JSON? I am curious to learn from those and turn
>> them into POC.
>>
>> And I propose - let's just continue the vote as planned. We already have
>> 3 binding votes, and more +1s than -1s and the time has already passed, but
>> I am happy to let it run till the end of day tomorrow to see if my proposal
>> above will be good to conclude the vote with more consent from all the
>> people involved.
>>
>> J.
>>
>>
>>
>>
>> On Thu, Aug 11, 2022 at 7:49 AM Eugen Kosteev <eu...@kosteev.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Thu, Aug 11, 2022 at 12:08 AM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> So my concerns (in addition to the ones Andrew already pointed out)
>>>> with gRPC: it uses threads! Threads + python always makes me nervous.
>>>> Doubly so when you then couple that with DB access via sqlalchemy - we're
>>>> heading down paths less well travelled if we do this.
>>>>
>>>> From https://github.com/grpc/grpc/blob/master/doc/fork_support.md
>>>>
>>>> > gRPC Python wraps gRPC core, which uses multithreading for
>>>> performance, and hence doesn't support fork(). Historically, we didn't
>>>> support forking in gRPC, but some users seemed to be doing fine until their
>>>> code started to break on version 1.6. This was likely caused by the
>>>> addition of background c-threads and a background Python thread
>>>>
>>>> And there's https://github.com/grpc/grpc/issues/16001 which may or may
>>>> not be a problem for us, I'm unclear:
>>>>
>>>> > The main constraint that must be satisified is that your process must
>>>> only invoke the fork() syscall *before* you create your gRPC server(s).
>>>>
>>>>
>>>> But anyway, the only change I'd like to see is to make the internal API
>>>> more pluggable.
>>>>
>>>> So instead of something like this:
>>>>
>>>>     def process_file(
>>>>         self,
>>>>         file_path: str,
>>>>         callback_requests: List[CallbackRequest],
>>>>         pickle_dags: bool = False,
>>>>     ) -> Tuple[int, int]:
>>>>         if self.use_grpc:
>>>>             return self.process_file_grpc(
>>>>                 file_path=file_path,
>>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>>             )
>>>>         return self.process_file_db(
>>>>             file_path=file_path, callback_requests=callback_requests,
>>>> pickle_dags=pickle_dags
>>>>         )
>>>>
>>>> I'd very much like us to have it be something like this:
>>>>
>>>>     def process_file(
>>>>         self,
>>>>         file_path: str,
>>>>         callback_requests: List[CallbackRequest],
>>>>         pickle_dags: bool = False,
>>>>     ) -> Tuple[int, int]:
>>>>         if settings.DATABASE_ACCESS_ISOLATION:
>>>>             return InternalAPIClient.dagbag_process_file(
>>>>                 file_path=file_path,
>>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>>             )
>>>>         return self.process_file_db(
>>>>             file_path=file_path, callback_requests=callback_requests,
>>>> pickle_dags=pickle_dags
>>>>         )
>>>>
>>>> i.e. all API access is "marshalled" (perhaps the wrong word) via a
>>>> single pluggable (and eventually configurable/replaceable) API client.
>>>> Additionally (though less important) `self.use_grpc` is changed to a
>>>> property on the `airflow.settings` module (as it is a global setting, not
>>>> an attribute/property of any single instance.)
>>>>
>>>> (In my mind the API client would include the
>>>> from_protobuff/to_protobuff methods that you added to TaskInstance on your
>>>> POC PR. Or the from_protbuff could live in the FileProcessorServiceServicer
>>>> class in your example, but that probably doesn't scale when multiple
>>>> services would take a TI/SimpleTI. I guess they could still also live on
>>>> TaskInstance et al and just not be used - but that doesn't feel as clean to
>>>> me is all)
>>>>
>>>> Thoughts? It's not too big change to encapsulate things like this I
>>>> hope?
>>>>
>>>> Sorry again that we didn't look at the recent work on this AIP sooner.
>>>>
>>>> -ash
>>>>
>>>>
>>>>
>>>> On Wed, Aug 10 2022 at 13:51:02 -06:00:00, Andrew Godwin
>>>> <an...@astronomer.io.INVALID> wrote:
>>>>
>>>> I also wish we'd had this discussion before!
>>>>
>>>> I've converted several million lines of code into API-driven
>>>> services over the past decade so I have my ways and I'm set in them, I
>>>> guess :)
>>>>
>>>> Notice that I didn't say "use REST" - I don't think REST maps well to
>>>> RPC style codebases, and agree the whole method thing is confusing. I just
>>>> mean "ship JSON in a POST and receive JSON in the response".
>>>>
>>>> As you said before though, the line where you draw the abstraction is
>>>> what matters more than the transport layer, and "fat endpoints" (doing
>>>> transactions and multiple calls on the API server etc.) is, I agree, the
>>>> way this has to go, so it's not like this is something I think is totally
>>>> wrong, just that I've repeatedly tried gRPC on projects like this and been
>>>> disappointed with what it actually takes to ship changes and
>>>> prototype things quickly. Nothing beats the ability to just throw curl at a
>>>> problem - or maybe that's just me.
>>>>
>>>> Anyway, I'll leave you to it - I have my own ideas in this area I'll be
>>>> noodling on, but it's a bit of a different take and aimed more at execution
>>>> as a whole, so I'll come back and discuss them if they're successful.
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> 1st of all - I wish we had this discussion before :)
>>>>>
>>>>> > The autogenerated code also is available from OpenAPI specs, of
>>>>> course, and the request/response semantics in the same thread are precisely
>>>>> what make load balancing these calls a little harder, and horizontal
>>>>> scaling to multiple threads comes with every HTTP server, but I digress -
>>>>> clearly you've made a choice here to write an RPC layer rather than a
>>>>> lightweight API layer, and go with the more RPC-style features, and I get
>>>>> that.
>>>>>
>>>>> OpenAPI is REST not RPC and it does not map well to RPC style calls. I
>>>>> tried to explain in detail in the AIP (and in earlier discussions). And the
>>>>> choice is basically made for us because of our expectation to have
>>>>> minimal impact on the existing code (and ability to switch off the remote
>>>>> part). We really do not want to introduce new API calls.  We want to make
>>>>> sure that existing "DB transactions" (i.e. coarse grained calls) are
>>>>> remotely executed. So we are not really talking about lightweight API
>>>>> almost by definition. Indeed, Open API also maps a definition described in
>>>>> a declarative way to python code.  but it has this non-nice part that
>>>>> OpenAPI/REST, it is supposed to be run on some resources. We have no
>>>>> resources to run it on - every single method we call is doing something.
>>>>> Even from the REST/OpenAPI semantics I'd have a really hard time to decide
>>>>> whether it should be GET, POST or PUT or DELETE. In most cases this will be
>>>>> a combination of those 4 on several different resources. All the "nice
>>>>> parts" of Open API (Swagger UI etc.) become next to useless if you try to
>>>>> map such remote procedures we have, to REST calls.
>>>>>
>>>>>
>>>>> > I still think it's choosing something that's more complex to
>>>>> maintain over a simpler, more accessible option, but since I won't be the
>>>>> one writing it, that's not really for me to worry about. I am very curious
>>>>> to see how this evolves as the work towards multitenancy really kicks in
>>>>> and all the API schemas need to change to add namespacing!
>>>>>
>>>>> The gRPC (proto) is designed from ground-up with maintainability in
>>>>> mind. The ability to evolve the API, add new parameters etc. is built-in
>>>>> the protobuf definition. From the user perspective it's actually easier to
>>>>> use than OpenAPI when it comes to remote method calls, because you
>>>>> basically - call a method.
>>>>>
>>>>> And also coming back to monitoring - literally every monitoring
>>>>> platform supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog,
>>>>> you name it. In our case.
>>>>>
>>>>> Also load-balancing:
>>>>>
>>>>> Regardless of the choice we talk about HTTP request/response happening
>>>>> for 1 call. This is the boundary. Each of the calls we have will be a
>>>>> separate transaction, separate call, not connected to any other call. The
>>>>> server handling it will be stateless (state will be stored in a DB when
>>>>> each call completes). I deliberately put the "boundary" of each of
>>>>> the remotely callable methods, to be a complete DB transaction to achieve
>>>>> it.
>>>>>
>>>>> So It really does not change whether we use gRPC or REST/JSON.
>>>>> REST/JSON vs. gRPC is just the content of the message, but this is the very
>>>>> same HTTP call, with same authentication added on top, same headers - just
>>>>> how the message is encoded is different. The same tools for load balancing
>>>>> works in the same way regardless if we use gRPC or REST/JSON. This is
>>>>> really a higher layer than the one involved in load balancing.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin
>>>>> <an...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> Well, binary representation serialization performance is worse for
>>>>>> protobuf than for JSON in my experience (in Python specifically) - unless
>>>>>> you mean size-on-the-wire, which I can agree with but tend to not worry
>>>>>> about very much since it's rarely a bottleneck.
>>>>>>
>>>>>> The autogenerated code also is available from OpenAPI specs, of
>>>>>> course, and the request/response semantics in the same thread are precisely
>>>>>> what make load balancing these calls a little harder, and horizontal
>>>>>> scaling to multiple threads comes with every HTTP server, but I digress -
>>>>>> clearly you've made a choice here to write an RPC layer rather than a
>>>>>> lightweight API layer, and go with the more RPC-style features, and I get
>>>>>> that.
>>>>>>
>>>>>> I still think it's choosing something that's more complex to maintain
>>>>>> over a simpler, more accessible option, but since I won't be the one
>>>>>> writing it, that's not really for me to worry about. I am very curious to
>>>>>> see how this evolves as the work towards multitenancy really kicks in and
>>>>>> all the API schemas need to change to add namespacing!
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sure I can explain - the main reasons are:
>>>>>>>
>>>>>>> - 1) binary representation performance - impact of this is
>>>>>>> rather limited because our API calls are doing rather heavy processing
>>>>>>> compared to the data being transmitted. But I believe it's not negligible.
>>>>>>> - 2) automated tools to automatically generate strongly typed Python
>>>>>>> code (that's the ProtoBuf part). The strongly typed Python code is what
>>>>>>> convinced me (see my POC). The tooling we got for that is excellent. Far
>>>>>>> more superior than dealing with json-encoded data even with schema.
>>>>>>> - 2) built-in "remote procedure" layer - where we have
>>>>>>> request/response semantics optimisations (for multiple calls over the same
>>>>>>> chanel) and exception handling done for us (This is basically what "Remote
>>>>>>> Procedure" interface provide us)
>>>>>>> - 3) built-in server that efficiently distributes the method called
>>>>>>> from multiple client into a multi-threaded/multi-threaded execution (all
>>>>>>> individual calls are stateless so multi-processing can be added on top
>>>>>>> regardless from the "transport" chosen).
>>>>>>>
>>>>>>> BTW. If you look at my POC code, there is nothing that strongly
>>>>>>> "binds" us to gRPC. The nice thing is that once it is implemented, it can
>>>>>>> be swapped out very easily. The only Proto/gRPC code that "leaks" to
>>>>>>> "generic" Airflow code is mapping of some (most complex) parameters to
>>>>>>> Proto . And this is only for most complex cases - literally only few of our
>>>>>>> types require custom serialisation - most of the mapping is handled
>>>>>>> automatically by generated protobuf code. And we can easily put
>>>>>>> the "complex" mapping in a separate package. Plus there is an "if"
>>>>>>> statement for each of the ~ 30 or so methods that we will have to turn into
>>>>>>> remotely-callable. We can even (as I proposed it as an option) add a little
>>>>>>> python magic and add a simple decorator to handle the "ifs". Then the
>>>>>>> decorator "if" can be swapped with some other "remote call" implementation.
>>>>>>>
>>>>>>> The actual bulk of the implementation is to make sure that all the
>>>>>>> places are covered (that's the testing harness).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
>>>>>>> <an...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> Hi Jarek,
>>>>>>>>
>>>>>>>> Apologies - I was as not involved as I wanted to be in the AIP-44
>>>>>>>> process, and obviously my vote is non-binding anyway - but having done a
>>>>>>>> lot of Python API development over the past decade or so I wanted to know
>>>>>>>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>>>>>>>> schema, of course).
>>>>>>>>
>>>>>>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I
>>>>>>>> agree with - but does not go into the option of using a standard Python
>>>>>>>> HTTP server with JSON schema enforcement, such as FastAPI. In my
>>>>>>>> experience, the tools for load balancing, debugging, testing and monitoring
>>>>>>>> JSON/HTTP are superior and more numerous than those for gRPC, and in
>>>>>>>> addition the asynchronous support for gRPC servers is lacking compared to
>>>>>>>> their plain HTTP counterparts, and the fact that you can interact and play
>>>>>>>> with the APIs in prototyping stages without having to handle obtaining
>>>>>>>> correct protobuf versions for the Airflow version you're using.
>>>>>>>>
>>>>>>>> I wouldn't go so far as to suggest a veto, but I do want to see the
>>>>>>>> AIP address why gRPC would win over this option. Apologies again for the
>>>>>>>> late realisation that gRPC got chosen and was being voted on - it's been a
>>>>>>>> very busy summer.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Just let me express my rather strong dissatisfaction with the way
>>>>>>>>> this "last minute" is raised.
>>>>>>>>>
>>>>>>>>> It is very late to come up with such a statement - not that it
>>>>>>>>> comes at all, but when it comes when everyone had a chance to take a look
>>>>>>>>> and comment, including taking a look at the POC and result of checks. This
>>>>>>>>> has never been raised even 4 months ago where the only choices were Thrift
>>>>>>>>> and gRPc).
>>>>>>>>>
>>>>>>>>> I REALLY hope the arguments are very strong and backed by real
>>>>>>>>> examples and data why it is a bad choice rather than opinions.
>>>>>>>>>
>>>>>>>>> J,.
>>>>>>>>>
>>>>>>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over
>>>>>>>>>> just JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>>>>>>>
>>>>>>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>>>>>>
>>>>>>>>>> -ash
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>>>>>>>> jarek@potiuk.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Oh yeah :)
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> ah, good call.
>>>>>>>>>>>
>>>>>>>>>>> I guess the email template can be updated:
>>>>>>>>>>>
>>>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>>>> community are encouraged to check the AIP and vote with "(non-binding)".
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Ping
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's
>>>>>>>>>>>> votes are binding. See
>>>>>>>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>>>>>>>> :D
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ping
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal
>>>>>>>>>>>>>> API".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The AIP-44 is here:
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Discussion thread:
>>>>>>>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The voting will last for 5 days (until 9th of August 2022
>>>>>>>>>>>>>> 11:00 CEST), and until at least 3 binding votes have been
>>>>>>>>>>>>>> cast.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please vote accordingly:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [ ] + 1 approve
>>>>>>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>>>>>>>> "(non-binding)".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just a summary of where we are:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * we have a working POC for implementation used for
>>>>>>>>>>>>>> performance testing:
>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/25094
>>>>>>>>>>>>>> * it is based on on industry-standard open-source gRPC (which
>>>>>>>>>>>>>> is already our dependency) which fits better the RPC "model" we need than
>>>>>>>>>>>>>> our public REST API.
>>>>>>>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>>>>>>>> * it is fully backwards compatible - the new DB isolation
>>>>>>>>>>>>>> will be an optional feature
>>>>>>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>>>>>>> * has a backing and plans for more extensive complete testing
>>>>>>>>>>>>>> in "real" environment with Google Composer team support
>>>>>>>>>>>>>> * allows for further extensions as part of AIP-1 (I am
>>>>>>>>>>>>>> planning to re-establish sig-multitenancy effort for follow up AIPs once
>>>>>>>>>>>>>> this one is well in progress).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> J.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>
>>> --
>>> Eugene
>>>
>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Ash Berlin-Taylor <as...@apache.org>.
+1 (binding) now with that change. Thank you very much.

I'm also okay with us to proceed with only gRPC for now -- with this 
architectural change it's much easier to replace it in future if we 
want/need to, and for others (such as service providers like 
Google/Amazon/Astronomer) to experiment with custom API implementations.

I'm not able to point at any examples, as the ones I know of personally 
aren't open source; they were all written for companies 
projects/products.

I'm am sorry that we missed it -- speaking personally I've had _a lot_ 
going on at home and I've barely had time to "look around" at what else 
is going on in Airflow over the last few months. I'm only now just 
having the head space to actually think in detail about other things 
that what is right in front of me, hence why it came now.

-ash

On Thu, Aug 11 2022 at 09:27:39 +02:00:00, Jarek Potiuk 
<ja...@potiuk.com> wrote:
> I made the updates in 
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API> 
> to reflect the above. As I suspected, there were really very few 
> changes needed, to make it "GRPC/JSON OpenAPI" agnostic. Ir does not 
> change the "gist" and "purpose" of the AIP and we can easily turn it 
> into implementation detail after extended POC is complete.
> 
> On Thu, Aug 11, 2022 at 9:08 AM Jarek Potiuk <jarek@potiuk.com 
> <ma...@potiuk.com>> wrote:
>> Hey Ash, Andrew,
>> 
>> TL;DR; I slept over it to try to understand what just happened, and 
>> I have a proposal.
>> 
>> * I am happy to extend my POC with the pure JSon/Open API approach - 
>> providing that Andrew/Ash point me to some good examples where the 
>> RPC style API has been implemented. If you could point me in the 
>> right direction, I am a maverick kind of person and just find my 
>> ways there.
>> * I propose we conclude the vote (with added reservation in it, that 
>> final choice of the transport will be based on the extended POC) as 
>> planned.
>> * I am rather curious to try the things that both of you mentioned 
>> you are concerned about (threading/multiprocessing/load balancing) 
>> and see how they compare in both approaches - before making final 
>> decision and doing first, founding PR
>> 
>> More comments (personal feeling):
>> 
>> I really treat your comments - (now that they finally came), 
>> seriously. And value your expert knowledge and experience. And I 
>> personally expect the same in return. I always respond to any 
>> AIPs/discussions when I am given enough time and opportunity to 
>> comment, I do - In due time. This time it did not happen  - not sure 
>> why. I personally feel this is a little disrespectful to my work, 
>> but I do not hold any grudges. I am happy to put that completely 
>> aside - now that I have your attention and can use your experience, 
>> the goal is that we do not make any personal complaints or 
>> escalations here, we are all here to make Airflow a better product. 
>> Community is the most important, respect for each other's work is an 
>> important part of it. And I hope this can improve in the future AIP 
>> discussions.
>> 
>> But going back to the merit of the discussion:
>> 
>> Just to let everyone know I am not too "vested" in gRPC . I spend a 
>> little time on implementing the POC with it and after doing the POC 
>> and reading testimonies of people I feel this is the right choice. 
>> But for this AIP. I made deliberate decisions in the way that the 
>> transport can be swapped out if we find reasons we should.
>> 
>> I am reasonably happy with the way proposed by Ash (In fact I am 
>> going to update the AIP with it in a moment). For me the way how we 
>> actually implement the "if"  is an implementation detail that will 
>> be worked out on the first PR. The way proposed is good for me, 
>> though I would rather experiment a bit with decorators and see if we 
>> can make it a bit nicer - but  this is not a dealbreaker at all for 
>> me. One concern I have is that we will have another abstraction 
>> layer (possibly needless) and that we will have to again repeat all 
>> the methods signatures and parameters and keep them in sync (they 
>> will now be repeated in 4 or 5 places). But again - this is 
>> something that can be likely done in the first "founding" PR we are 
>> going to iterate on and work out the best balance between 
>> duplication and flexibility/maintainability. And we can always 
>> update the AIP with this implementation detail later - very much 
>> like it happend with a number of AIPs before - including AIP-39, 
>> AIP-40, AIP-42 - all of them went through similar rounds of updates 
>> and clarifications as implementation was progressing. It's hard to 
>> work out all the details "on paper" or even in "POC". Also I would 
>> REALLY love to tap in the experience of people like the both of you, 
>> but it seems that the only way to get some good and serious feedback 
>> is to call a vote (or make a PR with the intention of merging it).
>> 
>> But I even can go further than that - I think independently from 
>> voting whether the whole AIP-44 is a good or bad idea. I think there 
>> is no doubt we need it, and the "general scope" and approach seems 
>> to already reach general consensus, so if we can just continue and 
>> complete the vote. I am happy to continue running the POC on using 
>> OpenAPI spec and gRPC in parallel and see how they compare. I am 
>> always eager to learn and try other things, and if you have valid 
>> concerns I am happy to address them by trying out. I would 
>> personally like to compare both from the development "friction" 
>> point of view, performance, as well as doing some tests trying to 
>> address and test the operational (process/thread/load balancing) 
>> concerns you both have and see how they can be solved in both and 
>> compare. I think there is nothing like "show me the code" and 
>> performing actual working POC.
>> 
>> And I am super happy to continue with the POC and extend it with a 
>> pure JSON/OpenAPI based proposal after voting completes and make the 
>> final decision during the first founding PR. And we can even arrange 
>> extra votes or lazy consensus before the first PR lands - after 
>> seeing all the "ins/outs".
>> The Founding PR is still quite a bit away - I do not want to make 
>> any commits before we branch-off the 2.4 - and I don't even want to 
>> take too much of your time for that other than discussion and 
>> raising concerns and commenting on my findings. I am happy to do all 
>> the ground-work here.
>> 
>> That's all I ask for. Just treating the work I do seriously.
>> 
>> So Ash, Andrew
>> 
>> Can you please point me to some examples where RPC-API like ours has 
>> been implemented with Open API/JSON? I am curious to learn from 
>> those and turn them into POC.
>> 
>> And I propose - let's just continue the vote as planned. We already 
>> have 3 binding votes, and more +1s than -1s and the time has already 
>> passed, but I am happy to let it run till the end of day tomorrow to 
>> see if my proposal above will be good to conclude the vote with more 
>> consent from all the people involved.
>> 
>> J.
>> 
>> 
>> 
>> 
>> On Thu, Aug 11, 2022 at 7:49 AM Eugen Kosteev <eugen@kosteev.com 
>> <ma...@kosteev.com>> wrote:
>>> +1 (non-binding)
>>> 
>>> On Thu, Aug 11, 2022 at 12:08 AM Ash Berlin-Taylor <ash@apache.org 
>>> <ma...@apache.org>> wrote:
>>>> So my concerns (in addition to the ones Andrew already pointed 
>>>> out) with gRPC: it uses threads! Threads + python always makes me 
>>>> nervous. Doubly so when you then couple that with DB access via 
>>>> sqlalchemy - we're heading down paths less well travelled if we do 
>>>> this.
>>>> 
>>>> From <https://github.com/grpc/grpc/blob/master/doc/fork_support.md>
>>>> 
>>>> > gRPC Python wraps gRPC core, which uses multithreading for 
>>>> performance, and hence doesn't support fork(). Historically, we 
>>>> didn't support forking in gRPC, but some users seemed to be doing 
>>>> fine until their code started to break on version 1.6. This was 
>>>> likely caused by the addition of background c-threads and a 
>>>> background Python thread
>>>> 
>>>> And there's <https://github.com/grpc/grpc/issues/16001> which may 
>>>> or may not be a problem for us, I'm unclear:
>>>> 
>>>> > The main constraint that must be satisified is that your process 
>>>> must only invoke the fork() syscall /before/ you create your gRPC 
>>>> server(s).
>>>> 
>>>> 
>>>> But anyway, the only change I'd like to see is to make the 
>>>> internal API more pluggable.
>>>> 
>>>> So instead of something like this:
>>>> 
>>>>     def process_file(
>>>>         self,
>>>>         file_path: str,
>>>>         callback_requests: List[CallbackRequest],
>>>>         pickle_dags: bool = False,
>>>>     ) -> Tuple[int, int]:
>>>>         if self.use_grpc:
>>>>             return self.process_file_grpc(
>>>>                 file_path=file_path, 
>>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>>             )
>>>>         return self.process_file_db(
>>>>             file_path=file_path, 
>>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>>         )
>>>> 
>>>> I'd very much like us to have it be something like this:
>>>> 
>>>>     def process_file(
>>>>         self,
>>>>         file_path: str,
>>>>         callback_requests: List[CallbackRequest],
>>>>         pickle_dags: bool = False,
>>>>     ) -> Tuple[int, int]:
>>>>         if settings.DATABASE_ACCESS_ISOLATION:
>>>>             return InternalAPIClient.dagbag_process_file(
>>>>                 file_path=file_path, 
>>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>>             )
>>>>         return self.process_file_db(
>>>>             file_path=file_path, 
>>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>>         )
>>>> 
>>>> i.e. all API access is "marshalled" (perhaps the wrong word) via a 
>>>> single pluggable (and eventually configurable/replaceable) API 
>>>> client. Additionally (though less important) `self.use_grpc` is 
>>>> changed to a property on the `airflow.settings` module (as it is a 
>>>> global setting, not an attribute/property of any single instance.)
>>>> 
>>>> (In my mind the API client would include the 
>>>> from_protobuff/to_protobuff methods that you added to TaskInstance 
>>>> on your POC PR. Or the from_protbuff could live in the 
>>>> FileProcessorServiceServicer class in your example, but that 
>>>> probably doesn't scale when multiple services would take a 
>>>> TI/SimpleTI. I guess they could still also live on TaskInstance et 
>>>> al and just not be used - but that doesn't feel as clean to me is 
>>>> all)
>>>> 
>>>> Thoughts? It's not too big change to encapsulate things like this 
>>>> I hope?
>>>> 
>>>> Sorry again that we didn't look at the recent work on this AIP 
>>>> sooner.
>>>> 
>>>> -ash
>>>> 
>>>> 
>>>> 
>>>> On Wed, Aug 10 2022 at 13:51:02 -06:00:00, Andrew Godwin 
>>>> <an...@astronomer.io.INVALID> wrote:
>>>>> I also wish we'd had this discussion before!
>>>>> 
>>>>> I've converted several million lines of code into API-driven 
>>>>> services over the past decade so I have my ways and I'm set in 
>>>>> them, I guess :)
>>>>> 
>>>>> Notice that I didn't say "use REST" - I don't think REST maps 
>>>>> well to RPC style codebases, and agree the whole method thing is 
>>>>> confusing. I just mean "ship JSON in a POST and receive JSON in 
>>>>> the response".
>>>>> 
>>>>> As you said before though, the line where you draw the 
>>>>> abstraction is what matters more than the transport layer, and 
>>>>> "fat endpoints" (doing transactions and multiple calls on the API 
>>>>> server etc.) is, I agree, the way this has to go, so it's not 
>>>>> like this is something I think is totally wrong, just that I've 
>>>>> repeatedly tried gRPC on projects like this and been disappointed 
>>>>> with what it actually takes to ship changes and prototype things 
>>>>> quickly. Nothing beats the ability to just throw curl at a 
>>>>> problem - or maybe that's just me.
>>>>> 
>>>>> Anyway, I'll leave you to it - I have my own ideas in this area 
>>>>> I'll be noodling on, but it's a bit of a different take and aimed 
>>>>> more at execution as a whole, so I'll come back and discuss them 
>>>>> if they're successful.
>>>>> 
>>>>> Andrew
>>>>> 
>>>>> On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <jarek@potiuk.com 
>>>>> <ma...@potiuk.com>> wrote:
>>>>>> 1st of all - I wish we had this discussion before :)
>>>>>> 
>>>>>> > The autogenerated code also is available from OpenAPI specs, 
>>>>>> of course, and the request/response semantics in the same thread 
>>>>>> are precisely what make load balancing these calls a little 
>>>>>> harder, and horizontal scaling to multiple threads comes with 
>>>>>> every HTTP server, but I digress - clearly you've made a choice 
>>>>>> here to write an RPC layer rather than a lightweight API layer, 
>>>>>> and go with the more RPC-style features, and I get that.
>>>>>> 
>>>>>> OpenAPI is REST not RPC and it does not map well to RPC style 
>>>>>> calls. I tried to explain in detail in the AIP (and in earlier 
>>>>>> discussions). And the choice is basically made for us because of 
>>>>>> our expectation to have minimal impact on the existing code (and 
>>>>>> ability to switch off the remote part). We really do not want to 
>>>>>> introduce new API calls.  We want to make sure that existing "DB 
>>>>>> transactions" (i.e. coarse grained calls) are remotely executed. 
>>>>>> So we are not really talking about lightweight API almost by 
>>>>>> definition. Indeed, Open API also maps a definition described in 
>>>>>> a declarative way to python code.  but it has this non-nice part 
>>>>>> that OpenAPI/REST, it is supposed to be run on some resources. 
>>>>>> We have no resources to run it on - every single method we call 
>>>>>> is doing something. Even from the REST/OpenAPI semantics I'd 
>>>>>> have a really hard time to decide whether it should be GET, POST 
>>>>>> or PUT or DELETE. In most cases this will be a combination of 
>>>>>> those 4 on several different resources. All the "nice parts" of 
>>>>>> Open API (Swagger UI etc.) become next to useless if you try to 
>>>>>> map such remote procedures we have, to REST calls.
>>>>>> 
>>>>>> 
>>>>>> > I still think it's choosing something that's more complex to 
>>>>>> maintain over a simpler, more accessible option, but since I 
>>>>>> won't be the one writing it, that's not really for me to worry 
>>>>>> about. I am very curious to see how this evolves as the work 
>>>>>> towards multitenancy really kicks in and all the API schemas 
>>>>>> need to change to add namespacing!
>>>>>> 
>>>>>> The gRPC (proto) is designed from ground-up with maintainability 
>>>>>> in mind. The ability to evolve the API, add new parameters etc. 
>>>>>> is built-in the protobuf definition. From the user perspective 
>>>>>> it's actually easier to use than OpenAPI when it comes to remote 
>>>>>> method calls, because you basically - call a method.
>>>>>> 
>>>>>> And also coming back to monitoring - literally every monitoring 
>>>>>> platform supports gRPC to monitor. Grafana, NewRelic, 
>>>>>> CloudWatch, Datadog, you name it. In our case.
>>>>>> 
>>>>>> Also load-balancing:
>>>>>> 
>>>>>> Regardless of the choice we talk about HTTP request/response 
>>>>>> happening for 1 call. This is the boundary. Each of the calls we 
>>>>>> have will be a separate transaction, separate call, not 
>>>>>> connected to any other call. The server handling it will be 
>>>>>> stateless (state will be stored in a DB when each call 
>>>>>> completes). I deliberately put the "boundary" of each of the 
>>>>>> remotely callable methods, to be a complete DB transaction to 
>>>>>> achieve it.
>>>>>> 
>>>>>> So It really does not change whether we use gRPC or REST/JSON. 
>>>>>> REST/JSON vs. gRPC is just the content of the message, but this 
>>>>>> is the very same HTTP call, with same authentication added on 
>>>>>> top, same headers - just how the message is encoded is 
>>>>>> different. The same tools for load balancing works in the same 
>>>>>> way regardless if we use gRPC or REST/JSON. This is really a 
>>>>>> higher layer than the one involved in load balancing.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin 
>>>>>> <an...@astronomer.io.invalid> wrote:
>>>>>>> Well, binary representation serialization performance is worse 
>>>>>>> for protobuf than for JSON in my experience (in Python 
>>>>>>> specifically) - unless you mean size-on-the-wire, which I can 
>>>>>>> agree with but tend to not worry about very much since it's 
>>>>>>> rarely a bottleneck.
>>>>>>> 
>>>>>>> The autogenerated code also is available from OpenAPI specs, of 
>>>>>>> course, and the request/response semantics in the same thread 
>>>>>>> are precisely what make load balancing these calls a little 
>>>>>>> harder, and horizontal scaling to multiple threads comes with 
>>>>>>> every HTTP server, but I digress - clearly you've made a choice 
>>>>>>> here to write an RPC layer rather than a lightweight API layer, 
>>>>>>> and go with the more RPC-style features, and I get that.
>>>>>>> 
>>>>>>> I still think it's choosing something that's more complex to 
>>>>>>> maintain over a simpler, more accessible option, but since I 
>>>>>>> won't be the one writing it, that's not really for me to worry 
>>>>>>> about. I am very curious to see how this evolves as the work 
>>>>>>> towards multitenancy really kicks in and all the API schemas 
>>>>>>> need to change to add namespacing!
>>>>>>> 
>>>>>>> Andrew
>>>>>>> 
>>>>>>> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <jarek@potiuk.com 
>>>>>>> <ma...@potiuk.com>> wrote:
>>>>>>>> Sure I can explain - the main reasons are:
>>>>>>>> 
>>>>>>>> - 1) binary representation performance - impact of this is 
>>>>>>>> rather limited because our API calls are doing rather heavy 
>>>>>>>> processing compared to the data being transmitted. But I 
>>>>>>>> believe it's not negligible.
>>>>>>>> - 2) automated tools to automatically generate strongly typed 
>>>>>>>> Python code (that's the ProtoBuf part). The strongly typed 
>>>>>>>> Python code is what convinced me (see my POC). The tooling we 
>>>>>>>> got for that is excellent. Far more superior than dealing with 
>>>>>>>> json-encoded data even with schema.
>>>>>>>> - 2) built-in "remote procedure" layer - where we have 
>>>>>>>> request/response semantics optimisations (for multiple calls 
>>>>>>>> over the same chanel) and exception handling done for us (This 
>>>>>>>> is basically what "Remote Procedure" interface provide us)
>>>>>>>> - 3) built-in server that efficiently distributes the method 
>>>>>>>> called from multiple client into a 
>>>>>>>> multi-threaded/multi-threaded execution (all individual calls 
>>>>>>>> are stateless so multi-processing can be added on top 
>>>>>>>> regardless from the "transport" chosen).
>>>>>>>> 
>>>>>>>> BTW. If you look at my POC code, there is nothing that 
>>>>>>>> strongly "binds" us to gRPC. The nice thing is that once it is 
>>>>>>>> implemented, it can be swapped out very easily. The only 
>>>>>>>> Proto/gRPC code that "leaks" to "generic" Airflow code is 
>>>>>>>> mapping of some (most complex) parameters to Proto . And this 
>>>>>>>> is only for most complex cases - literally only few of our 
>>>>>>>> types require custom serialisation - most of the mapping is 
>>>>>>>> handled automatically by generated protobuf code. And we can 
>>>>>>>> easily put the "complex" mapping in a separate package. Plus 
>>>>>>>> there is an "if" statement for each of the ~ 30 or so methods 
>>>>>>>> that we will have to turn into remotely-callable. We can even 
>>>>>>>> (as I proposed it as an option) add a little python magic and 
>>>>>>>> add a simple decorator to handle the "ifs". Then the decorator 
>>>>>>>> "if" can be swapped with some other "remote call" 
>>>>>>>> implementation.
>>>>>>>> 
>>>>>>>> The actual bulk of the implementation is to make sure that all 
>>>>>>>> the places are covered (that's the testing harness).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin 
>>>>>>>> <an...@astronomer.io.invalid> wrote:
>>>>>>>>> Hi Jarek,
>>>>>>>>> 
>>>>>>>>> Apologies - I was as not involved as I wanted to be in the 
>>>>>>>>> AIP-44 process, and obviously my vote is non-binding anyway - 
>>>>>>>>> but having done a lot of Python API development over the past 
>>>>>>>>> decade or so I wanted to know why the decision was made to go 
>>>>>>>>> with gRPC over just plain HTTP+JSON (with a schema, of 
>>>>>>>>> course).
>>>>>>>>> 
>>>>>>>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which 
>>>>>>>>> I agree with - but does not go into the option of using a 
>>>>>>>>> standard Python HTTP server with JSON schema enforcement, 
>>>>>>>>> such as FastAPI. In my experience, the tools for load 
>>>>>>>>> balancing, debugging, testing and monitoring JSON/HTTP are 
>>>>>>>>> superior and more numerous than those for gRPC, and in 
>>>>>>>>> addition the asynchronous support for gRPC servers is lacking 
>>>>>>>>> compared to their plain HTTP counterparts, and the fact that 
>>>>>>>>> you can interact and play with the APIs in prototyping stages 
>>>>>>>>> without having to handle obtaining correct protobuf versions 
>>>>>>>>> for the Airflow version you're using.
>>>>>>>>> 
>>>>>>>>> I wouldn't go so far as to suggest a veto, but I do want to 
>>>>>>>>> see the AIP address why gRPC would win over this option. 
>>>>>>>>> Apologies again for the late realisation that gRPC got chosen 
>>>>>>>>> and was being voted on - it's been a very busy summer.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Andrew
>>>>>>>>> 
>>>>>>>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk 
>>>>>>>>> <jarek@potiuk.com <ma...@potiuk.com>> wrote:
>>>>>>>>>> Just let me express my rather strong dissatisfaction with 
>>>>>>>>>> the way this "last minute" is raised.
>>>>>>>>>> 
>>>>>>>>>> It is very late to come up with such a statement - not that 
>>>>>>>>>> it comes at all, but when it comes when everyone had a 
>>>>>>>>>> chance to take a look and comment, including taking a look 
>>>>>>>>>> at the POC and result of checks. This has never been raised 
>>>>>>>>>> even 4 months ago where the only choices were Thrift and 
>>>>>>>>>> gRPc).
>>>>>>>>>> 
>>>>>>>>>> I REALLY hope the arguments are very strong and backed by 
>>>>>>>>>> real examples and data why it is a bad choice rather than 
>>>>>>>>>> opinions.
>>>>>>>>>> 
>>>>>>>>>> J,.
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor 
>>>>>>>>>> <ash@apache.org <ma...@apache.org>> wrote:
>>>>>>>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC 
>>>>>>>>>>> over just JSON, so -1 to that specific choice. Everything 
>>>>>>>>>>> else I'm happy with.
>>>>>>>>>>> 
>>>>>>>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>>>>>>> 
>>>>>>>>>>> -ash
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk 
>>>>>>>>>>> <jarek@potiuk.com <ma...@potiuk.com>> wrote:
>>>>>>>>>>>> Oh yeah :)
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang 
>>>>>>>>>>>> <pingzh@umich.edu <ma...@umich.edu>> wrote:
>>>>>>>>>>>>> ah, good call.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I guess the email template can be updated:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Only votes from PMC members are binding, but members of 
>>>>>>>>>>>>>> the community are encouraged to check the AIP and vote 
>>>>>>>>>>>>>> with "(non-binding)".
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ping
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk 
>>>>>>>>>>>>> <jarek@potiuk.com <ma...@potiuk.com>> wrote:
>>>>>>>>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's 
>>>>>>>>>>>>>> commiter's votes are binding. See 
>>>>>>>>>>>>>> <https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy> 
>>>>>>>>>>>>>> :D
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang 
>>>>>>>>>>>>>> <pingzh@umich.edu <ma...@umich.edu>> wrote:
>>>>>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Ping
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk 
>>>>>>>>>>>>>>> <jarek@potiuk.com <ma...@potiuk.com>> wrote:
>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow 
>>>>>>>>>>>>>>>> Internal API".
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The AIP-44 is here: 
>>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API>
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Discussion thread: 
>>>>>>>>>>>>>>>> <https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3>
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The voting will last for 5 days (until 9th of August 
>>>>>>>>>>>>>>>> 2022 11:00 CEST), and until at least 3 binding votes 
>>>>>>>>>>>>>>>> have been cast.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Please vote accordingly:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> [ ] + 1 approve
>>>>>>>>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Only votes from PMC members are binding, but members 
>>>>>>>>>>>>>>>> of the community are encouraged to check the AIP and 
>>>>>>>>>>>>>>>> vote with "(non-binding)".
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> ----
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Just a summary of where we are:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> It's been long in the making, but I think it might be 
>>>>>>>>>>>>>>>> a great step-forward to our long-term multi-tenancy 
>>>>>>>>>>>>>>>> goal. I believe the proposal I have is quite well 
>>>>>>>>>>>>>>>> thought out and discussed a lot in the past:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> * we have a working POC for implementation used for 
>>>>>>>>>>>>>>>> performance testing:  
>>>>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/25094>
>>>>>>>>>>>>>>>> * it is based on on industry-standard open-source gRPC 
>>>>>>>>>>>>>>>> (which is already our dependency) which fits better 
>>>>>>>>>>>>>>>> the RPC "model" we need than our public REST API.
>>>>>>>>>>>>>>>> * it has moderate performance impact and rather good 
>>>>>>>>>>>>>>>> maintainability features (very little impact on 
>>>>>>>>>>>>>>>> regular development effort)
>>>>>>>>>>>>>>>> * it is fully backwards compatible - the new DB 
>>>>>>>>>>>>>>>> isolation will be an optional feature
>>>>>>>>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>>>>>>>>> * has a backing and plans for more extensive complete 
>>>>>>>>>>>>>>>> testing in "real" environment with Google Composer 
>>>>>>>>>>>>>>>> team support
>>>>>>>>>>>>>>>> * allows for further extensions as part of AIP-1 (I am 
>>>>>>>>>>>>>>>> planning to re-establish sig-multitenancy effort for 
>>>>>>>>>>>>>>>> follow up AIPs once this one is well in progress).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> J.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>> 
>>> 
>>> --
>>> Eugene


Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jarek Potiuk <ja...@potiuk.com>.
I made the updates in
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
to reflect the above. As I suspected, there were really very few changes
needed, to make it "GRPC/JSON OpenAPI" agnostic. Ir does not change the
"gist" and "purpose" of the AIP and we can easily turn it into
implementation detail after extended POC is complete.

On Thu, Aug 11, 2022 at 9:08 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hey Ash, Andrew,
>
> TL;DR; I slept over it to try to understand what just happened, and I have
> a proposal.
>
> * I am happy to extend my POC with the pure JSon/Open API approach -
> providing that Andrew/Ash point me to some good examples where the RPC
> style API has been implemented. If you could point me in the right
> direction, I am a maverick kind of person and just find my ways there.
> * I propose we conclude the vote (with added reservation in it, that final
> choice of the transport will be based on the extended POC) as planned.
> * I am rather curious to try the things that both of you mentioned you are
> concerned about (threading/multiprocessing/load balancing) and see how they
> compare in both approaches - before making final decision and doing first,
> founding PR
>
> More comments (personal feeling):
>
> I really treat your comments - (now that they finally came), seriously.
> And value your expert knowledge and experience. And I personally expect the
> same in return. I always respond to any AIPs/discussions when I am given
> enough time and opportunity to comment, I do - In due time. This time it
> did not happen  - not sure why. I personally feel this is a little
> disrespectful to my work, but I do not hold any grudges. I am happy to put
> that completely aside - now that I have your attention and can use
> your experience, the goal is that we do not make any personal complaints or
> escalations here, we are all here to make Airflow a better product.
> Community is the most important, respect for each other's work is an
> important part of it. And I hope this can improve in the future AIP
> discussions.
>
> But going back to the merit of the discussion:
>
> Just to let everyone know I am not too "vested" in gRPC . I spend a
> little time on implementing the POC with it and after doing the POC and
> reading testimonies of people I feel this is the right choice. But for this
> AIP. I made deliberate decisions in the way that the transport can be
> swapped out if we find reasons we should.
>
> I am reasonably happy with the way proposed by Ash (In fact I am going to
> update the AIP with it in a moment). For me the way how we actually
> implement the "if"  is an implementation detail that will be worked out on
> the first PR. The way proposed is good for me, though I would rather
> experiment a bit with decorators and see if we can make it a bit nicer -
> but  this is not a dealbreaker at all for me. One concern I have is that we
> will have another abstraction layer (possibly needless) and that we will
> have to again repeat all the methods signatures and parameters and keep
> them in sync (they will now be repeated in 4 or 5 places). But again - this
> is something that can be likely done in the first "founding" PR we are
> going to iterate on and work out the best balance between duplication and
> flexibility/maintainability. And we can always update the AIP with this
> implementation detail later - very much like it happend with a number of
> AIPs before - including AIP-39, AIP-40, AIP-42 - all of them went through
> similar rounds of updates and clarifications as implementation was
> progressing. It's hard to work out all the details "on paper" or even in
> "POC". Also I would REALLY love to tap in the experience of people like the
> both of you, but it seems that the only way to get some good and serious
> feedback is to call a vote (or make a PR with the intention of merging it).
>
> But I even can go further than that - I think independently from voting
> whether the whole AIP-44 is a good or bad idea. I think there is no doubt
> we need it, and the "general scope" and approach seems to already reach
> general consensus, so if we can just continue and complete the vote. I am
> happy to continue running the POC on using OpenAPI spec and gRPC in
> parallel and see how they compare. I am always eager to learn and try other
> things, and if you have valid concerns I am happy to address them by trying
> out. I would personally like to compare both from the development
> "friction" point of view, performance, as well as doing some tests trying
> to address and test the operational (process/thread/load balancing)
> concerns you both have and see how they can be solved in both and compare.
> I think there is nothing like "show me the code" and performing
> actual working POC.
>
> And I am super happy to continue with the POC and extend it with a pure
> JSON/OpenAPI based proposal after voting completes and make the final
> decision during the first founding PR. And we can even arrange extra votes
> or lazy consensus before the first PR lands - after seeing all the
> "ins/outs".
> The Founding PR is still quite a bit away - I do not want to make any
> commits before we branch-off the 2.4 - and I don't even want to take too
> much of your time for that other than discussion and raising concerns and
> commenting on my findings. I am happy to do all the ground-work here.
>
> That's all I ask for. Just treating the work I do seriously.
>
> So Ash, Andrew
>
> Can you please point me to some examples where RPC-API like ours has been
> implemented with Open API/JSON? I am curious to learn from those and turn
> them into POC.
>
> And I propose - let's just continue the vote as planned. We already have 3
> binding votes, and more +1s than -1s and the time has already passed, but I
> am happy to let it run till the end of day tomorrow to see if my proposal
> above will be good to conclude the vote with more consent from all the
> people involved.
>
> J.
>
>
>
>
> On Thu, Aug 11, 2022 at 7:49 AM Eugen Kosteev <eu...@kosteev.com> wrote:
>
>> +1 (non-binding)
>>
>> On Thu, Aug 11, 2022 at 12:08 AM Ash Berlin-Taylor <as...@apache.org>
>> wrote:
>>
>>> So my concerns (in addition to the ones Andrew already pointed out) with
>>> gRPC: it uses threads! Threads + python always makes me nervous. Doubly so
>>> when you then couple that with DB access via sqlalchemy - we're heading
>>> down paths less well travelled if we do this.
>>>
>>> From https://github.com/grpc/grpc/blob/master/doc/fork_support.md
>>>
>>> > gRPC Python wraps gRPC core, which uses multithreading for
>>> performance, and hence doesn't support fork(). Historically, we didn't
>>> support forking in gRPC, but some users seemed to be doing fine until their
>>> code started to break on version 1.6. This was likely caused by the
>>> addition of background c-threads and a background Python thread
>>>
>>> And there's https://github.com/grpc/grpc/issues/16001 which may or may
>>> not be a problem for us, I'm unclear:
>>>
>>> > The main constraint that must be satisified is that your process must
>>> only invoke the fork() syscall *before* you create your gRPC server(s).
>>>
>>>
>>> But anyway, the only change I'd like to see is to make the internal API
>>> more pluggable.
>>>
>>> So instead of something like this:
>>>
>>>     def process_file(
>>>         self,
>>>         file_path: str,
>>>         callback_requests: List[CallbackRequest],
>>>         pickle_dags: bool = False,
>>>     ) -> Tuple[int, int]:
>>>         if self.use_grpc:
>>>             return self.process_file_grpc(
>>>                 file_path=file_path,
>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>             )
>>>         return self.process_file_db(
>>>             file_path=file_path, callback_requests=callback_requests,
>>> pickle_dags=pickle_dags
>>>         )
>>>
>>> I'd very much like us to have it be something like this:
>>>
>>>     def process_file(
>>>         self,
>>>         file_path: str,
>>>         callback_requests: List[CallbackRequest],
>>>         pickle_dags: bool = False,
>>>     ) -> Tuple[int, int]:
>>>         if settings.DATABASE_ACCESS_ISOLATION:
>>>             return InternalAPIClient.dagbag_process_file(
>>>                 file_path=file_path,
>>> callback_requests=callback_requests, pickle_dags=pickle_dags
>>>             )
>>>         return self.process_file_db(
>>>             file_path=file_path, callback_requests=callback_requests,
>>> pickle_dags=pickle_dags
>>>         )
>>>
>>> i.e. all API access is "marshalled" (perhaps the wrong word) via a
>>> single pluggable (and eventually configurable/replaceable) API client.
>>> Additionally (though less important) `self.use_grpc` is changed to a
>>> property on the `airflow.settings` module (as it is a global setting, not
>>> an attribute/property of any single instance.)
>>>
>>> (In my mind the API client would include the from_protobuff/to_protobuff
>>> methods that you added to TaskInstance on your POC PR. Or the from_protbuff
>>> could live in the FileProcessorServiceServicer class in your example, but
>>> that probably doesn't scale when multiple services would take a
>>> TI/SimpleTI. I guess they could still also live on TaskInstance et al and
>>> just not be used - but that doesn't feel as clean to me is all)
>>>
>>> Thoughts? It's not too big change to encapsulate things like this I hope?
>>>
>>> Sorry again that we didn't look at the recent work on this AIP sooner.
>>>
>>> -ash
>>>
>>>
>>>
>>> On Wed, Aug 10 2022 at 13:51:02 -06:00:00, Andrew Godwin
>>> <an...@astronomer.io.INVALID> wrote:
>>>
>>> I also wish we'd had this discussion before!
>>>
>>> I've converted several million lines of code into API-driven
>>> services over the past decade so I have my ways and I'm set in them, I
>>> guess :)
>>>
>>> Notice that I didn't say "use REST" - I don't think REST maps well to
>>> RPC style codebases, and agree the whole method thing is confusing. I just
>>> mean "ship JSON in a POST and receive JSON in the response".
>>>
>>> As you said before though, the line where you draw the abstraction is
>>> what matters more than the transport layer, and "fat endpoints" (doing
>>> transactions and multiple calls on the API server etc.) is, I agree, the
>>> way this has to go, so it's not like this is something I think is totally
>>> wrong, just that I've repeatedly tried gRPC on projects like this and been
>>> disappointed with what it actually takes to ship changes and
>>> prototype things quickly. Nothing beats the ability to just throw curl at a
>>> problem - or maybe that's just me.
>>>
>>> Anyway, I'll leave you to it - I have my own ideas in this area I'll be
>>> noodling on, but it's a bit of a different take and aimed more at execution
>>> as a whole, so I'll come back and discuss them if they're successful.
>>>
>>> Andrew
>>>
>>> On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> 1st of all - I wish we had this discussion before :)
>>>>
>>>> > The autogenerated code also is available from OpenAPI specs, of
>>>> course, and the request/response semantics in the same thread are precisely
>>>> what make load balancing these calls a little harder, and horizontal
>>>> scaling to multiple threads comes with every HTTP server, but I digress -
>>>> clearly you've made a choice here to write an RPC layer rather than a
>>>> lightweight API layer, and go with the more RPC-style features, and I get
>>>> that.
>>>>
>>>> OpenAPI is REST not RPC and it does not map well to RPC style calls. I
>>>> tried to explain in detail in the AIP (and in earlier discussions). And the
>>>> choice is basically made for us because of our expectation to have
>>>> minimal impact on the existing code (and ability to switch off the remote
>>>> part). We really do not want to introduce new API calls.  We want to make
>>>> sure that existing "DB transactions" (i.e. coarse grained calls) are
>>>> remotely executed. So we are not really talking about lightweight API
>>>> almost by definition. Indeed, Open API also maps a definition described in
>>>> a declarative way to python code.  but it has this non-nice part that
>>>> OpenAPI/REST, it is supposed to be run on some resources. We have no
>>>> resources to run it on - every single method we call is doing something.
>>>> Even from the REST/OpenAPI semantics I'd have a really hard time to decide
>>>> whether it should be GET, POST or PUT or DELETE. In most cases this will be
>>>> a combination of those 4 on several different resources. All the "nice
>>>> parts" of Open API (Swagger UI etc.) become next to useless if you try to
>>>> map such remote procedures we have, to REST calls.
>>>>
>>>>
>>>> > I still think it's choosing something that's more complex to maintain
>>>> over a simpler, more accessible option, but since I won't be the one
>>>> writing it, that's not really for me to worry about. I am very curious to
>>>> see how this evolves as the work towards multitenancy really kicks in and
>>>> all the API schemas need to change to add namespacing!
>>>>
>>>> The gRPC (proto) is designed from ground-up with maintainability in
>>>> mind. The ability to evolve the API, add new parameters etc. is built-in
>>>> the protobuf definition. From the user perspective it's actually easier to
>>>> use than OpenAPI when it comes to remote method calls, because you
>>>> basically - call a method.
>>>>
>>>> And also coming back to monitoring - literally every monitoring
>>>> platform supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog,
>>>> you name it. In our case.
>>>>
>>>> Also load-balancing:
>>>>
>>>> Regardless of the choice we talk about HTTP request/response happening
>>>> for 1 call. This is the boundary. Each of the calls we have will be a
>>>> separate transaction, separate call, not connected to any other call. The
>>>> server handling it will be stateless (state will be stored in a DB when
>>>> each call completes). I deliberately put the "boundary" of each of
>>>> the remotely callable methods, to be a complete DB transaction to achieve
>>>> it.
>>>>
>>>> So It really does not change whether we use gRPC or REST/JSON.
>>>> REST/JSON vs. gRPC is just the content of the message, but this is the very
>>>> same HTTP call, with same authentication added on top, same headers - just
>>>> how the message is encoded is different. The same tools for load balancing
>>>> works in the same way regardless if we use gRPC or REST/JSON. This is
>>>> really a higher layer than the one involved in load balancing.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin
>>>> <an...@astronomer.io.invalid> wrote:
>>>>
>>>>> Well, binary representation serialization performance is worse for
>>>>> protobuf than for JSON in my experience (in Python specifically) - unless
>>>>> you mean size-on-the-wire, which I can agree with but tend to not worry
>>>>> about very much since it's rarely a bottleneck.
>>>>>
>>>>> The autogenerated code also is available from OpenAPI specs, of
>>>>> course, and the request/response semantics in the same thread are precisely
>>>>> what make load balancing these calls a little harder, and horizontal
>>>>> scaling to multiple threads comes with every HTTP server, but I digress -
>>>>> clearly you've made a choice here to write an RPC layer rather than a
>>>>> lightweight API layer, and go with the more RPC-style features, and I get
>>>>> that.
>>>>>
>>>>> I still think it's choosing something that's more complex to maintain
>>>>> over a simpler, more accessible option, but since I won't be the one
>>>>> writing it, that's not really for me to worry about. I am very curious to
>>>>> see how this evolves as the work towards multitenancy really kicks in and
>>>>> all the API schemas need to change to add namespacing!
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>>
>>>>>> Sure I can explain - the main reasons are:
>>>>>>
>>>>>> - 1) binary representation performance - impact of this is
>>>>>> rather limited because our API calls are doing rather heavy processing
>>>>>> compared to the data being transmitted. But I believe it's not negligible.
>>>>>> - 2) automated tools to automatically generate strongly typed Python
>>>>>> code (that's the ProtoBuf part). The strongly typed Python code is what
>>>>>> convinced me (see my POC). The tooling we got for that is excellent. Far
>>>>>> more superior than dealing with json-encoded data even with schema.
>>>>>> - 2) built-in "remote procedure" layer - where we have
>>>>>> request/response semantics optimisations (for multiple calls over the same
>>>>>> chanel) and exception handling done for us (This is basically what "Remote
>>>>>> Procedure" interface provide us)
>>>>>> - 3) built-in server that efficiently distributes the method called
>>>>>> from multiple client into a multi-threaded/multi-threaded execution (all
>>>>>> individual calls are stateless so multi-processing can be added on top
>>>>>> regardless from the "transport" chosen).
>>>>>>
>>>>>> BTW. If you look at my POC code, there is nothing that strongly
>>>>>> "binds" us to gRPC. The nice thing is that once it is implemented, it can
>>>>>> be swapped out very easily. The only Proto/gRPC code that "leaks" to
>>>>>> "generic" Airflow code is mapping of some (most complex) parameters to
>>>>>> Proto . And this is only for most complex cases - literally only few of our
>>>>>> types require custom serialisation - most of the mapping is handled
>>>>>> automatically by generated protobuf code. And we can easily put
>>>>>> the "complex" mapping in a separate package. Plus there is an "if"
>>>>>> statement for each of the ~ 30 or so methods that we will have to turn into
>>>>>> remotely-callable. We can even (as I proposed it as an option) add a little
>>>>>> python magic and add a simple decorator to handle the "ifs". Then the
>>>>>> decorator "if" can be swapped with some other "remote call" implementation.
>>>>>>
>>>>>> The actual bulk of the implementation is to make sure that all the
>>>>>> places are covered (that's the testing harness).
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
>>>>>> <an...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> Hi Jarek,
>>>>>>>
>>>>>>> Apologies - I was as not involved as I wanted to be in the AIP-44
>>>>>>> process, and obviously my vote is non-binding anyway - but having done a
>>>>>>> lot of Python API development over the past decade or so I wanted to know
>>>>>>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>>>>>>> schema, of course).
>>>>>>>
>>>>>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree
>>>>>>> with - but does not go into the option of using a standard Python HTTP
>>>>>>> server with JSON schema enforcement, such as FastAPI. In my experience, the
>>>>>>> tools for load balancing, debugging, testing and monitoring JSON/HTTP are
>>>>>>> superior and more numerous than those for gRPC, and in addition the
>>>>>>> asynchronous support for gRPC servers is lacking compared to their plain
>>>>>>> HTTP counterparts, and the fact that you can interact and play with the
>>>>>>> APIs in prototyping stages without having to handle obtaining correct
>>>>>>> protobuf versions for the Airflow version you're using.
>>>>>>>
>>>>>>> I wouldn't go so far as to suggest a veto, but I do want to see the
>>>>>>> AIP address why gRPC would win over this option. Apologies again for the
>>>>>>> late realisation that gRPC got chosen and was being voted on - it's been a
>>>>>>> very busy summer.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just let me express my rather strong dissatisfaction with the way
>>>>>>>> this "last minute" is raised.
>>>>>>>>
>>>>>>>> It is very late to come up with such a statement - not that it
>>>>>>>> comes at all, but when it comes when everyone had a chance to take a look
>>>>>>>> and comment, including taking a look at the POC and result of checks. This
>>>>>>>> has never been raised even 4 months ago where the only choices were Thrift
>>>>>>>> and gRPc).
>>>>>>>>
>>>>>>>> I REALLY hope the arguments are very strong and backed by real
>>>>>>>> examples and data why it is a bad choice rather than opinions.
>>>>>>>>
>>>>>>>> J,.
>>>>>>>>
>>>>>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over
>>>>>>>>> just JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>>>>>>
>>>>>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>>>>>
>>>>>>>>> -ash
>>>>>>>>>
>>>>>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>>>>>>> jarek@potiuk.com> wrote:
>>>>>>>>>
>>>>>>>>> Oh yeah :)
>>>>>>>>>
>>>>>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> ah, good call.
>>>>>>>>>>
>>>>>>>>>> I guess the email template can be updated:
>>>>>>>>>>
>>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>>> community are encouraged to check the AIP and vote with "(non-binding)".
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1 (binding)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Ping
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's
>>>>>>>>>>> votes are binding. See
>>>>>>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>>>>>>> :D
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Ping
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal
>>>>>>>>>>>>> API".
>>>>>>>>>>>>>
>>>>>>>>>>>>> The AIP-44 is here:
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>>>>>>
>>>>>>>>>>>>> Discussion thread:
>>>>>>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>>>>>>
>>>>>>>>>>>>> The voting will last for 5 days (until 9th of August 2022
>>>>>>>>>>>>> 11:00 CEST), and until at least 3 binding votes have been cast
>>>>>>>>>>>>> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please vote accordingly:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [ ] + 1 approve
>>>>>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>>>>>
>>>>>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>>>>>>> "(non-binding)".
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just a summary of where we are:
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * we have a working POC for implementation used for
>>>>>>>>>>>>> performance testing:
>>>>>>>>>>>>> https://github.com/apache/airflow/pull/25094
>>>>>>>>>>>>> * it is based on on industry-standard open-source gRPC (which
>>>>>>>>>>>>> is already our dependency) which fits better the RPC "model" we need than
>>>>>>>>>>>>> our public REST API.
>>>>>>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>>>>>>> * it is fully backwards compatible - the new DB isolation will
>>>>>>>>>>>>> be an optional feature
>>>>>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>>>>>> * has a backing and plans for more extensive complete testing
>>>>>>>>>>>>> in "real" environment with Google Composer team support
>>>>>>>>>>>>> * allows for further extensions as part of AIP-1 (I am
>>>>>>>>>>>>> planning to re-establish sig-multitenancy effort for follow up AIPs once
>>>>>>>>>>>>> this one is well in progress).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> J.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>
>> --
>> Eugene
>>
>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hey Ash, Andrew,

TL;DR; I slept over it to try to understand what just happened, and I have
a proposal.

* I am happy to extend my POC with the pure JSon/Open API approach -
providing that Andrew/Ash point me to some good examples where the RPC
style API has been implemented. If you could point me in the right
direction, I am a maverick kind of person and just find my ways there.
* I propose we conclude the vote (with added reservation in it, that final
choice of the transport will be based on the extended POC) as planned.
* I am rather curious to try the things that both of you mentioned you are
concerned about (threading/multiprocessing/load balancing) and see how they
compare in both approaches - before making final decision and doing first,
founding PR

More comments (personal feeling):

I really treat your comments - (now that they finally came), seriously. And
value your expert knowledge and experience. And I personally expect the
same in return. I always respond to any AIPs/discussions when I am given
enough time and opportunity to comment, I do - In due time. This time it
did not happen  - not sure why. I personally feel this is a little
disrespectful to my work, but I do not hold any grudges. I am happy to put
that completely aside - now that I have your attention and can use
your experience, the goal is that we do not make any personal complaints or
escalations here, we are all here to make Airflow a better product.
Community is the most important, respect for each other's work is an
important part of it. And I hope this can improve in the future AIP
discussions.

But going back to the merit of the discussion:

Just to let everyone know I am not too "vested" in gRPC . I spend a
little time on implementing the POC with it and after doing the POC and
reading testimonies of people I feel this is the right choice. But for this
AIP. I made deliberate decisions in the way that the transport can be
swapped out if we find reasons we should.

I am reasonably happy with the way proposed by Ash (In fact I am going to
update the AIP with it in a moment). For me the way how we actually
implement the "if"  is an implementation detail that will be worked out on
the first PR. The way proposed is good for me, though I would rather
experiment a bit with decorators and see if we can make it a bit nicer -
but  this is not a dealbreaker at all for me. One concern I have is that we
will have another abstraction layer (possibly needless) and that we will
have to again repeat all the methods signatures and parameters and keep
them in sync (they will now be repeated in 4 or 5 places). But again - this
is something that can be likely done in the first "founding" PR we are
going to iterate on and work out the best balance between duplication and
flexibility/maintainability. And we can always update the AIP with this
implementation detail later - very much like it happend with a number of
AIPs before - including AIP-39, AIP-40, AIP-42 - all of them went through
similar rounds of updates and clarifications as implementation was
progressing. It's hard to work out all the details "on paper" or even in
"POC". Also I would REALLY love to tap in the experience of people like the
both of you, but it seems that the only way to get some good and serious
feedback is to call a vote (or make a PR with the intention of merging it).

But I even can go further than that - I think independently from voting
whether the whole AIP-44 is a good or bad idea. I think there is no doubt
we need it, and the "general scope" and approach seems to already reach
general consensus, so if we can just continue and complete the vote. I am
happy to continue running the POC on using OpenAPI spec and gRPC in
parallel and see how they compare. I am always eager to learn and try other
things, and if you have valid concerns I am happy to address them by trying
out. I would personally like to compare both from the development
"friction" point of view, performance, as well as doing some tests trying
to address and test the operational (process/thread/load balancing)
concerns you both have and see how they can be solved in both and compare.
I think there is nothing like "show me the code" and performing
actual working POC.

And I am super happy to continue with the POC and extend it with a pure
JSON/OpenAPI based proposal after voting completes and make the final
decision during the first founding PR. And we can even arrange extra votes
or lazy consensus before the first PR lands - after seeing all the
"ins/outs".
The Founding PR is still quite a bit away - I do not want to make any
commits before we branch-off the 2.4 - and I don't even want to take too
much of your time for that other than discussion and raising concerns and
commenting on my findings. I am happy to do all the ground-work here.

That's all I ask for. Just treating the work I do seriously.

So Ash, Andrew

Can you please point me to some examples where RPC-API like ours has been
implemented with Open API/JSON? I am curious to learn from those and turn
them into POC.

And I propose - let's just continue the vote as planned. We already have 3
binding votes, and more +1s than -1s and the time has already passed, but I
am happy to let it run till the end of day tomorrow to see if my proposal
above will be good to conclude the vote with more consent from all the
people involved.

J.




On Thu, Aug 11, 2022 at 7:49 AM Eugen Kosteev <eu...@kosteev.com> wrote:

> +1 (non-binding)
>
> On Thu, Aug 11, 2022 at 12:08 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> So my concerns (in addition to the ones Andrew already pointed out) with
>> gRPC: it uses threads! Threads + python always makes me nervous. Doubly so
>> when you then couple that with DB access via sqlalchemy - we're heading
>> down paths less well travelled if we do this.
>>
>> From https://github.com/grpc/grpc/blob/master/doc/fork_support.md
>>
>> > gRPC Python wraps gRPC core, which uses multithreading for performance,
>> and hence doesn't support fork(). Historically, we didn't support
>> forking in gRPC, but some users seemed to be doing fine until their code
>> started to break on version 1.6. This was likely caused by the addition of
>> background c-threads and a background Python thread
>>
>> And there's https://github.com/grpc/grpc/issues/16001 which may or may
>> not be a problem for us, I'm unclear:
>>
>> > The main constraint that must be satisified is that your process must
>> only invoke the fork() syscall *before* you create your gRPC server(s).
>>
>>
>> But anyway, the only change I'd like to see is to make the internal API
>> more pluggable.
>>
>> So instead of something like this:
>>
>>     def process_file(
>>         self,
>>         file_path: str,
>>         callback_requests: List[CallbackRequest],
>>         pickle_dags: bool = False,
>>     ) -> Tuple[int, int]:
>>         if self.use_grpc:
>>             return self.process_file_grpc(
>>                 file_path=file_path, callback_requests=callback_requests,
>> pickle_dags=pickle_dags
>>             )
>>         return self.process_file_db(
>>             file_path=file_path, callback_requests=callback_requests,
>> pickle_dags=pickle_dags
>>         )
>>
>> I'd very much like us to have it be something like this:
>>
>>     def process_file(
>>         self,
>>         file_path: str,
>>         callback_requests: List[CallbackRequest],
>>         pickle_dags: bool = False,
>>     ) -> Tuple[int, int]:
>>         if settings.DATABASE_ACCESS_ISOLATION:
>>             return InternalAPIClient.dagbag_process_file(
>>                 file_path=file_path, callback_requests=callback_requests,
>> pickle_dags=pickle_dags
>>             )
>>         return self.process_file_db(
>>             file_path=file_path, callback_requests=callback_requests,
>> pickle_dags=pickle_dags
>>         )
>>
>> i.e. all API access is "marshalled" (perhaps the wrong word) via a single
>> pluggable (and eventually configurable/replaceable) API client.
>> Additionally (though less important) `self.use_grpc` is changed to a
>> property on the `airflow.settings` module (as it is a global setting, not
>> an attribute/property of any single instance.)
>>
>> (In my mind the API client would include the from_protobuff/to_protobuff
>> methods that you added to TaskInstance on your POC PR. Or the from_protbuff
>> could live in the FileProcessorServiceServicer class in your example, but
>> that probably doesn't scale when multiple services would take a
>> TI/SimpleTI. I guess they could still also live on TaskInstance et al and
>> just not be used - but that doesn't feel as clean to me is all)
>>
>> Thoughts? It's not too big change to encapsulate things like this I hope?
>>
>> Sorry again that we didn't look at the recent work on this AIP sooner.
>>
>> -ash
>>
>>
>>
>> On Wed, Aug 10 2022 at 13:51:02 -06:00:00, Andrew Godwin
>> <an...@astronomer.io.INVALID> wrote:
>>
>> I also wish we'd had this discussion before!
>>
>> I've converted several million lines of code into API-driven
>> services over the past decade so I have my ways and I'm set in them, I
>> guess :)
>>
>> Notice that I didn't say "use REST" - I don't think REST maps well to RPC
>> style codebases, and agree the whole method thing is confusing. I just mean
>> "ship JSON in a POST and receive JSON in the response".
>>
>> As you said before though, the line where you draw the abstraction is
>> what matters more than the transport layer, and "fat endpoints" (doing
>> transactions and multiple calls on the API server etc.) is, I agree, the
>> way this has to go, so it's not like this is something I think is totally
>> wrong, just that I've repeatedly tried gRPC on projects like this and been
>> disappointed with what it actually takes to ship changes and
>> prototype things quickly. Nothing beats the ability to just throw curl at a
>> problem - or maybe that's just me.
>>
>> Anyway, I'll leave you to it - I have my own ideas in this area I'll be
>> noodling on, but it's a bit of a different take and aimed more at execution
>> as a whole, so I'll come back and discuss them if they're successful.
>>
>> Andrew
>>
>> On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> 1st of all - I wish we had this discussion before :)
>>>
>>> > The autogenerated code also is available from OpenAPI specs, of
>>> course, and the request/response semantics in the same thread are precisely
>>> what make load balancing these calls a little harder, and horizontal
>>> scaling to multiple threads comes with every HTTP server, but I digress -
>>> clearly you've made a choice here to write an RPC layer rather than a
>>> lightweight API layer, and go with the more RPC-style features, and I get
>>> that.
>>>
>>> OpenAPI is REST not RPC and it does not map well to RPC style calls. I
>>> tried to explain in detail in the AIP (and in earlier discussions). And the
>>> choice is basically made for us because of our expectation to have
>>> minimal impact on the existing code (and ability to switch off the remote
>>> part). We really do not want to introduce new API calls.  We want to make
>>> sure that existing "DB transactions" (i.e. coarse grained calls) are
>>> remotely executed. So we are not really talking about lightweight API
>>> almost by definition. Indeed, Open API also maps a definition described in
>>> a declarative way to python code.  but it has this non-nice part that
>>> OpenAPI/REST, it is supposed to be run on some resources. We have no
>>> resources to run it on - every single method we call is doing something.
>>> Even from the REST/OpenAPI semantics I'd have a really hard time to decide
>>> whether it should be GET, POST or PUT or DELETE. In most cases this will be
>>> a combination of those 4 on several different resources. All the "nice
>>> parts" of Open API (Swagger UI etc.) become next to useless if you try to
>>> map such remote procedures we have, to REST calls.
>>>
>>>
>>> > I still think it's choosing something that's more complex to maintain
>>> over a simpler, more accessible option, but since I won't be the one
>>> writing it, that's not really for me to worry about. I am very curious to
>>> see how this evolves as the work towards multitenancy really kicks in and
>>> all the API schemas need to change to add namespacing!
>>>
>>> The gRPC (proto) is designed from ground-up with maintainability in
>>> mind. The ability to evolve the API, add new parameters etc. is built-in
>>> the protobuf definition. From the user perspective it's actually easier to
>>> use than OpenAPI when it comes to remote method calls, because you
>>> basically - call a method.
>>>
>>> And also coming back to monitoring - literally every monitoring platform
>>> supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog, you name
>>> it. In our case.
>>>
>>> Also load-balancing:
>>>
>>> Regardless of the choice we talk about HTTP request/response happening
>>> for 1 call. This is the boundary. Each of the calls we have will be a
>>> separate transaction, separate call, not connected to any other call. The
>>> server handling it will be stateless (state will be stored in a DB when
>>> each call completes). I deliberately put the "boundary" of each of
>>> the remotely callable methods, to be a complete DB transaction to achieve
>>> it.
>>>
>>> So It really does not change whether we use gRPC or REST/JSON. REST/JSON
>>> vs. gRPC is just the content of the message, but this is the very same HTTP
>>> call, with same authentication added on top, same headers - just how the
>>> message is encoded is different. The same tools for load balancing works in
>>> the same way regardless if we use gRPC or REST/JSON. This is really a
>>> higher layer than the one involved in load balancing.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin
>>> <an...@astronomer.io.invalid> wrote:
>>>
>>>> Well, binary representation serialization performance is worse for
>>>> protobuf than for JSON in my experience (in Python specifically) - unless
>>>> you mean size-on-the-wire, which I can agree with but tend to not worry
>>>> about very much since it's rarely a bottleneck.
>>>>
>>>> The autogenerated code also is available from OpenAPI specs, of course,
>>>> and the request/response semantics in the same thread are precisely what
>>>> make load balancing these calls a little harder, and horizontal scaling to
>>>> multiple threads comes with every HTTP server, but I digress - clearly
>>>> you've made a choice here to write an RPC layer rather than a lightweight
>>>> API layer, and go with the more RPC-style features, and I get that.
>>>>
>>>> I still think it's choosing something that's more complex to maintain
>>>> over a simpler, more accessible option, but since I won't be the one
>>>> writing it, that's not really for me to worry about. I am very curious to
>>>> see how this evolves as the work towards multitenancy really kicks in and
>>>> all the API schemas need to change to add namespacing!
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> Sure I can explain - the main reasons are:
>>>>>
>>>>> - 1) binary representation performance - impact of this is
>>>>> rather limited because our API calls are doing rather heavy processing
>>>>> compared to the data being transmitted. But I believe it's not negligible.
>>>>> - 2) automated tools to automatically generate strongly typed Python
>>>>> code (that's the ProtoBuf part). The strongly typed Python code is what
>>>>> convinced me (see my POC). The tooling we got for that is excellent. Far
>>>>> more superior than dealing with json-encoded data even with schema.
>>>>> - 2) built-in "remote procedure" layer - where we have
>>>>> request/response semantics optimisations (for multiple calls over the same
>>>>> chanel) and exception handling done for us (This is basically what "Remote
>>>>> Procedure" interface provide us)
>>>>> - 3) built-in server that efficiently distributes the method called
>>>>> from multiple client into a multi-threaded/multi-threaded execution (all
>>>>> individual calls are stateless so multi-processing can be added on top
>>>>> regardless from the "transport" chosen).
>>>>>
>>>>> BTW. If you look at my POC code, there is nothing that strongly
>>>>> "binds" us to gRPC. The nice thing is that once it is implemented, it can
>>>>> be swapped out very easily. The only Proto/gRPC code that "leaks" to
>>>>> "generic" Airflow code is mapping of some (most complex) parameters to
>>>>> Proto . And this is only for most complex cases - literally only few of our
>>>>> types require custom serialisation - most of the mapping is handled
>>>>> automatically by generated protobuf code. And we can easily put
>>>>> the "complex" mapping in a separate package. Plus there is an "if"
>>>>> statement for each of the ~ 30 or so methods that we will have to turn into
>>>>> remotely-callable. We can even (as I proposed it as an option) add a little
>>>>> python magic and add a simple decorator to handle the "ifs". Then the
>>>>> decorator "if" can be swapped with some other "remote call" implementation.
>>>>>
>>>>> The actual bulk of the implementation is to make sure that all the
>>>>> places are covered (that's the testing harness).
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
>>>>> <an...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> Hi Jarek,
>>>>>>
>>>>>> Apologies - I was as not involved as I wanted to be in the AIP-44
>>>>>> process, and obviously my vote is non-binding anyway - but having done a
>>>>>> lot of Python API development over the past decade or so I wanted to know
>>>>>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>>>>>> schema, of course).
>>>>>>
>>>>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree
>>>>>> with - but does not go into the option of using a standard Python HTTP
>>>>>> server with JSON schema enforcement, such as FastAPI. In my experience, the
>>>>>> tools for load balancing, debugging, testing and monitoring JSON/HTTP are
>>>>>> superior and more numerous than those for gRPC, and in addition the
>>>>>> asynchronous support for gRPC servers is lacking compared to their plain
>>>>>> HTTP counterparts, and the fact that you can interact and play with the
>>>>>> APIs in prototyping stages without having to handle obtaining correct
>>>>>> protobuf versions for the Airflow version you're using.
>>>>>>
>>>>>> I wouldn't go so far as to suggest a veto, but I do want to see the
>>>>>> AIP address why gRPC would win over this option. Apologies again for the
>>>>>> late realisation that gRPC got chosen and was being voted on - it's been a
>>>>>> very busy summer.
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Just let me express my rather strong dissatisfaction with the way
>>>>>>> this "last minute" is raised.
>>>>>>>
>>>>>>> It is very late to come up with such a statement - not that it comes
>>>>>>> at all, but when it comes when everyone had a chance to take a look and
>>>>>>> comment, including taking a look at the POC and result of checks. This has
>>>>>>> never been raised even 4 months ago where the only choices were Thrift and
>>>>>>> gRPc).
>>>>>>>
>>>>>>> I REALLY hope the arguments are very strong and backed by real
>>>>>>> examples and data why it is a bad choice rather than opinions.
>>>>>>>
>>>>>>> J,.
>>>>>>>
>>>>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over
>>>>>>>> just JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>>>>>
>>>>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>>>>
>>>>>>>> -ash
>>>>>>>>
>>>>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>>>>>> jarek@potiuk.com> wrote:
>>>>>>>>
>>>>>>>> Oh yeah :)
>>>>>>>>
>>>>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> ah, good call.
>>>>>>>>>
>>>>>>>>> I guess the email template can be updated:
>>>>>>>>>
>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>> community are encouraged to check the AIP and vote with "(non-binding)".
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +1 (binding)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Ping
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's
>>>>>>>>>> votes are binding. See
>>>>>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>>>>>> :D
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Ping
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>>>>>
>>>>>>>>>>>> The AIP-44 is here:
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>>>>>
>>>>>>>>>>>> Discussion thread:
>>>>>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>>>>>
>>>>>>>>>>>> The voting will last for 5 days (until 9th of August 2022
>>>>>>>>>>>> 11:00 CEST), and until at least 3 binding votes have been cast.
>>>>>>>>>>>>
>>>>>>>>>>>> Please vote accordingly:
>>>>>>>>>>>>
>>>>>>>>>>>> [ ] + 1 approve
>>>>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>>>>
>>>>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>>>>>> "(non-binding)".
>>>>>>>>>>>>
>>>>>>>>>>>> ----
>>>>>>>>>>>>
>>>>>>>>>>>> Just a summary of where we are:
>>>>>>>>>>>>
>>>>>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>>>>>
>>>>>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>>>>>> * it is based on on industry-standard open-source gRPC (which
>>>>>>>>>>>> is already our dependency) which fits better the RPC "model" we need than
>>>>>>>>>>>> our public REST API.
>>>>>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>>>>>> * it is fully backwards compatible - the new DB isolation will
>>>>>>>>>>>> be an optional feature
>>>>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>>>>> * has a backing and plans for more extensive complete testing
>>>>>>>>>>>> in "real" environment with Google Composer team support
>>>>>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning
>>>>>>>>>>>> to re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>>>>>>> well in progress).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> J.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>
> --
> Eugene
>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Eugen Kosteev <eu...@kosteev.com>.
+1 (non-binding)

On Thu, Aug 11, 2022 at 12:08 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> So my concerns (in addition to the ones Andrew already pointed out) with
> gRPC: it uses threads! Threads + python always makes me nervous. Doubly so
> when you then couple that with DB access via sqlalchemy - we're heading
> down paths less well travelled if we do this.
>
> From https://github.com/grpc/grpc/blob/master/doc/fork_support.md
>
> > gRPC Python wraps gRPC core, which uses multithreading for performance,
> and hence doesn't support fork(). Historically, we didn't support forking
> in gRPC, but some users seemed to be doing fine until their code started to
> break on version 1.6. This was likely caused by the addition of background
> c-threads and a background Python thread
>
> And there's https://github.com/grpc/grpc/issues/16001 which may or may
> not be a problem for us, I'm unclear:
>
> > The main constraint that must be satisified is that your process must
> only invoke the fork() syscall *before* you create your gRPC server(s).
>
>
> But anyway, the only change I'd like to see is to make the internal API
> more pluggable.
>
> So instead of something like this:
>
>     def process_file(
>         self,
>         file_path: str,
>         callback_requests: List[CallbackRequest],
>         pickle_dags: bool = False,
>     ) -> Tuple[int, int]:
>         if self.use_grpc:
>             return self.process_file_grpc(
>                 file_path=file_path, callback_requests=callback_requests,
> pickle_dags=pickle_dags
>             )
>         return self.process_file_db(
>             file_path=file_path, callback_requests=callback_requests,
> pickle_dags=pickle_dags
>         )
>
> I'd very much like us to have it be something like this:
>
>     def process_file(
>         self,
>         file_path: str,
>         callback_requests: List[CallbackRequest],
>         pickle_dags: bool = False,
>     ) -> Tuple[int, int]:
>         if settings.DATABASE_ACCESS_ISOLATION:
>             return InternalAPIClient.dagbag_process_file(
>                 file_path=file_path, callback_requests=callback_requests,
> pickle_dags=pickle_dags
>             )
>         return self.process_file_db(
>             file_path=file_path, callback_requests=callback_requests,
> pickle_dags=pickle_dags
>         )
>
> i.e. all API access is "marshalled" (perhaps the wrong word) via a single
> pluggable (and eventually configurable/replaceable) API client.
> Additionally (though less important) `self.use_grpc` is changed to a
> property on the `airflow.settings` module (as it is a global setting, not
> an attribute/property of any single instance.)
>
> (In my mind the API client would include the from_protobuff/to_protobuff
> methods that you added to TaskInstance on your POC PR. Or the from_protbuff
> could live in the FileProcessorServiceServicer class in your example, but
> that probably doesn't scale when multiple services would take a
> TI/SimpleTI. I guess they could still also live on TaskInstance et al and
> just not be used - but that doesn't feel as clean to me is all)
>
> Thoughts? It's not too big change to encapsulate things like this I hope?
>
> Sorry again that we didn't look at the recent work on this AIP sooner.
>
> -ash
>
>
>
> On Wed, Aug 10 2022 at 13:51:02 -06:00:00, Andrew Godwin
> <an...@astronomer.io.INVALID> wrote:
>
> I also wish we'd had this discussion before!
>
> I've converted several million lines of code into API-driven services over
> the past decade so I have my ways and I'm set in them, I guess :)
>
> Notice that I didn't say "use REST" - I don't think REST maps well to RPC
> style codebases, and agree the whole method thing is confusing. I just mean
> "ship JSON in a POST and receive JSON in the response".
>
> As you said before though, the line where you draw the abstraction is what
> matters more than the transport layer, and "fat endpoints" (doing
> transactions and multiple calls on the API server etc.) is, I agree, the
> way this has to go, so it's not like this is something I think is totally
> wrong, just that I've repeatedly tried gRPC on projects like this and been
> disappointed with what it actually takes to ship changes and
> prototype things quickly. Nothing beats the ability to just throw curl at a
> problem - or maybe that's just me.
>
> Anyway, I'll leave you to it - I have my own ideas in this area I'll be
> noodling on, but it's a bit of a different take and aimed more at execution
> as a whole, so I'll come back and discuss them if they're successful.
>
> Andrew
>
> On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> 1st of all - I wish we had this discussion before :)
>>
>> > The autogenerated code also is available from OpenAPI specs, of course,
>> and the request/response semantics in the same thread are precisely what
>> make load balancing these calls a little harder, and horizontal scaling to
>> multiple threads comes with every HTTP server, but I digress - clearly
>> you've made a choice here to write an RPC layer rather than a lightweight
>> API layer, and go with the more RPC-style features, and I get that.
>>
>> OpenAPI is REST not RPC and it does not map well to RPC style calls. I
>> tried to explain in detail in the AIP (and in earlier discussions). And the
>> choice is basically made for us because of our expectation to have
>> minimal impact on the existing code (and ability to switch off the remote
>> part). We really do not want to introduce new API calls.  We want to make
>> sure that existing "DB transactions" (i.e. coarse grained calls) are
>> remotely executed. So we are not really talking about lightweight API
>> almost by definition. Indeed, Open API also maps a definition described in
>> a declarative way to python code.  but it has this non-nice part that
>> OpenAPI/REST, it is supposed to be run on some resources. We have no
>> resources to run it on - every single method we call is doing something.
>> Even from the REST/OpenAPI semantics I'd have a really hard time to decide
>> whether it should be GET, POST or PUT or DELETE. In most cases this will be
>> a combination of those 4 on several different resources. All the "nice
>> parts" of Open API (Swagger UI etc.) become next to useless if you try to
>> map such remote procedures we have, to REST calls.
>>
>>
>> > I still think it's choosing something that's more complex to maintain
>> over a simpler, more accessible option, but since I won't be the one
>> writing it, that's not really for me to worry about. I am very curious to
>> see how this evolves as the work towards multitenancy really kicks in and
>> all the API schemas need to change to add namespacing!
>>
>> The gRPC (proto) is designed from ground-up with maintainability in mind.
>> The ability to evolve the API, add new parameters etc. is built-in the
>> protobuf definition. From the user perspective it's actually easier to use
>> than OpenAPI when it comes to remote method calls, because you basically -
>> call a method.
>>
>> And also coming back to monitoring - literally every monitoring platform
>> supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog, you name
>> it. In our case.
>>
>> Also load-balancing:
>>
>> Regardless of the choice we talk about HTTP request/response happening
>> for 1 call. This is the boundary. Each of the calls we have will be a
>> separate transaction, separate call, not connected to any other call. The
>> server handling it will be stateless (state will be stored in a DB when
>> each call completes). I deliberately put the "boundary" of each of
>> the remotely callable methods, to be a complete DB transaction to achieve
>> it.
>>
>> So It really does not change whether we use gRPC or REST/JSON. REST/JSON
>> vs. gRPC is just the content of the message, but this is the very same HTTP
>> call, with same authentication added on top, same headers - just how the
>> message is encoded is different. The same tools for load balancing works in
>> the same way regardless if we use gRPC or REST/JSON. This is really a
>> higher layer than the one involved in load balancing.
>>
>>
>>
>>
>>
>> On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin
>> <an...@astronomer.io.invalid> wrote:
>>
>>> Well, binary representation serialization performance is worse for
>>> protobuf than for JSON in my experience (in Python specifically) - unless
>>> you mean size-on-the-wire, which I can agree with but tend to not worry
>>> about very much since it's rarely a bottleneck.
>>>
>>> The autogenerated code also is available from OpenAPI specs, of course,
>>> and the request/response semantics in the same thread are precisely what
>>> make load balancing these calls a little harder, and horizontal scaling to
>>> multiple threads comes with every HTTP server, but I digress - clearly
>>> you've made a choice here to write an RPC layer rather than a lightweight
>>> API layer, and go with the more RPC-style features, and I get that.
>>>
>>> I still think it's choosing something that's more complex to maintain
>>> over a simpler, more accessible option, but since I won't be the one
>>> writing it, that's not really for me to worry about. I am very curious to
>>> see how this evolves as the work towards multitenancy really kicks in and
>>> all the API schemas need to change to add namespacing!
>>>
>>> Andrew
>>>
>>> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Sure I can explain - the main reasons are:
>>>>
>>>> - 1) binary representation performance - impact of this is
>>>> rather limited because our API calls are doing rather heavy processing
>>>> compared to the data being transmitted. But I believe it's not negligible.
>>>> - 2) automated tools to automatically generate strongly typed Python
>>>> code (that's the ProtoBuf part). The strongly typed Python code is what
>>>> convinced me (see my POC). The tooling we got for that is excellent. Far
>>>> more superior than dealing with json-encoded data even with schema.
>>>> - 2) built-in "remote procedure" layer - where we have request/response
>>>> semantics optimisations (for multiple calls over the same chanel) and
>>>> exception handling done for us (This is basically what "Remote Procedure"
>>>> interface provide us)
>>>> - 3) built-in server that efficiently distributes the method called
>>>> from multiple client into a multi-threaded/multi-threaded execution (all
>>>> individual calls are stateless so multi-processing can be added on top
>>>> regardless from the "transport" chosen).
>>>>
>>>> BTW. If you look at my POC code, there is nothing that strongly "binds"
>>>> us to gRPC. The nice thing is that once it is implemented, it can be
>>>> swapped out very easily. The only Proto/gRPC code that "leaks" to "generic"
>>>> Airflow code is mapping of some (most complex) parameters to Proto . And
>>>> this is only for most complex cases - literally only few of our types
>>>> require custom serialisation - most of the mapping is handled automatically
>>>> by generated protobuf code. And we can easily put the "complex" mapping in
>>>> a separate package. Plus there is an "if" statement for each of the ~ 30 or
>>>> so methods that we will have to turn into remotely-callable. We can even
>>>> (as I proposed it as an option) add a little python magic and add a simple
>>>> decorator to handle the "ifs". Then the decorator "if" can be swapped with
>>>> some other "remote call" implementation.
>>>>
>>>> The actual bulk of the implementation is to make sure that all the
>>>> places are covered (that's the testing harness).
>>>>
>>>>
>>>>
>>>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
>>>> <an...@astronomer.io.invalid> wrote:
>>>>
>>>>> Hi Jarek,
>>>>>
>>>>> Apologies - I was as not involved as I wanted to be in the AIP-44
>>>>> process, and obviously my vote is non-binding anyway - but having done a
>>>>> lot of Python API development over the past decade or so I wanted to know
>>>>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>>>>> schema, of course).
>>>>>
>>>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree
>>>>> with - but does not go into the option of using a standard Python HTTP
>>>>> server with JSON schema enforcement, such as FastAPI. In my experience, the
>>>>> tools for load balancing, debugging, testing and monitoring JSON/HTTP are
>>>>> superior and more numerous than those for gRPC, and in addition the
>>>>> asynchronous support for gRPC servers is lacking compared to their plain
>>>>> HTTP counterparts, and the fact that you can interact and play with the
>>>>> APIs in prototyping stages without having to handle obtaining correct
>>>>> protobuf versions for the Airflow version you're using.
>>>>>
>>>>> I wouldn't go so far as to suggest a veto, but I do want to see the
>>>>> AIP address why gRPC would win over this option. Apologies again for the
>>>>> late realisation that gRPC got chosen and was being voted on - it's been a
>>>>> very busy summer.
>>>>>
>>>>> Thanks,
>>>>> Andrew
>>>>>
>>>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>>
>>>>>> Just let me express my rather strong dissatisfaction with the way
>>>>>> this "last minute" is raised.
>>>>>>
>>>>>> It is very late to come up with such a statement - not that it comes
>>>>>> at all, but when it comes when everyone had a chance to take a look and
>>>>>> comment, including taking a look at the POC and result of checks. This has
>>>>>> never been raised even 4 months ago where the only choices were Thrift and
>>>>>> gRPc).
>>>>>>
>>>>>> I REALLY hope the arguments are very strong and backed by real
>>>>>> examples and data why it is a bad choice rather than opinions.
>>>>>>
>>>>>> J,.
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>>>>
>>>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>>>
>>>>>>> -ash
>>>>>>>
>>>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>>>>> jarek@potiuk.com> wrote:
>>>>>>>
>>>>>>> Oh yeah :)
>>>>>>>
>>>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>>>
>>>>>>>> ah, good call.
>>>>>>>>
>>>>>>>> I guess the email template can be updated:
>>>>>>>>
>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>> community are encouraged to check the AIP and vote with "(non-binding)".
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> +1 (binding)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Ping
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's
>>>>>>>>> votes are binding. See
>>>>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>>>>> :D
>>>>>>>>>
>>>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Ping
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>
>>>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>>>>
>>>>>>>>>>> The AIP-44 is here:
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>>>>
>>>>>>>>>>> Discussion thread:
>>>>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>>>>
>>>>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>>>>
>>>>>>>>>>> Please vote accordingly:
>>>>>>>>>>>
>>>>>>>>>>> [ ] + 1 approve
>>>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>>>
>>>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>>>>> "(non-binding)".
>>>>>>>>>>>
>>>>>>>>>>> ----
>>>>>>>>>>>
>>>>>>>>>>> Just a summary of where we are:
>>>>>>>>>>>
>>>>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>>>>
>>>>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>>>>>>> public REST API.
>>>>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>>>>> * it is fully backwards compatible - the new DB isolation will
>>>>>>>>>>> be an optional feature
>>>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>>>>> "real" environment with Google Composer team support
>>>>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning
>>>>>>>>>>> to re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>>>>>> well in progress).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> J.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>

-- 
Eugene

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Ash Berlin-Taylor <as...@apache.org>.
So my concerns (in addition to the ones Andrew already pointed out) 
with gRPC: it uses threads! Threads + python always makes me nervous. 
Doubly so when you then couple that with DB access via sqlalchemy - 
we're heading down paths less well travelled if we do this.

 From <https://github.com/grpc/grpc/blob/master/doc/fork_support.md>

 > gRPC Python wraps gRPC core, which uses multithreading for 
performance, and hence doesn't support fork(). Historically, we didn't 
support forking in gRPC, but some users seemed to be doing fine until 
their code started to break on version 1.6. This was likely caused by 
the addition of background c-threads and a background Python thread

And there's <https://github.com/grpc/grpc/issues/16001> which may or 
may not be a problem for us, I'm unclear:

 > The main constraint that must be satisified is that your process 
must only invoke the fork() syscall /before/ you create your gRPC 
server(s).


But anyway, the only change I'd like to see is to make the internal API 
more pluggable.

So instead of something like this:

    def process_file(
        self,
        file_path: str,
        callback_requests: List[CallbackRequest],
        pickle_dags: bool = False,
    ) -> Tuple[int, int]:
        if self.use_grpc:
            return self.process_file_grpc(
                file_path=file_path, 
callback_requests=callback_requests, pickle_dags=pickle_dags
            )
        return self.process_file_db(
            file_path=file_path, callback_requests=callback_requests, 
pickle_dags=pickle_dags
        )

I'd very much like us to have it be something like this:

    def process_file(
        self,
        file_path: str,
        callback_requests: List[CallbackRequest],
        pickle_dags: bool = False,
    ) -> Tuple[int, int]:
        if settings.DATABASE_ACCESS_ISOLATION:
            return InternalAPIClient.dagbag_process_file(
                file_path=file_path, 
callback_requests=callback_requests, pickle_dags=pickle_dags
            )
        return self.process_file_db(
            file_path=file_path, callback_requests=callback_requests, 
pickle_dags=pickle_dags
        )

i.e. all API access is "marshalled" (perhaps the wrong word) via a 
single pluggable (and eventually configurable/replaceable) API client. 
Additionally (though less important) `self.use_grpc` is changed to a 
property on the `airflow.settings` module (as it is a global setting, 
not an attribute/property of any single instance.)

(In my mind the API client would include the 
from_protobuff/to_protobuff methods that you added to TaskInstance on 
your POC PR. Or the from_protbuff could live in the 
FileProcessorServiceServicer class in your example, but that probably 
doesn't scale when multiple services would take a TI/SimpleTI. I guess 
they could still also live on TaskInstance et al and just not be used - 
but that doesn't feel as clean to me is all)

Thoughts? It's not too big change to encapsulate things like this I 
hope?

Sorry again that we didn't look at the recent work on this AIP sooner.

-ash



On Wed, Aug 10 2022 at 13:51:02 -06:00:00, Andrew Godwin 
<an...@astronomer.io.INVALID> wrote:
> I also wish we'd had this discussion before!
> 
> I've converted several million lines of code into API-driven services 
> over the past decade so I have my ways and I'm set in them, I guess :)
> 
> Notice that I didn't say "use REST" - I don't think REST maps well to 
> RPC style codebases, and agree the whole method thing is confusing. I 
> just mean "ship JSON in a POST and receive JSON in the response".
> 
> As you said before though, the line where you draw the abstraction is 
> what matters more than the transport layer, and "fat endpoints" 
> (doing transactions and multiple calls on the API server etc.) is, I 
> agree, the way this has to go, so it's not like this is something I 
> think is totally wrong, just that I've repeatedly tried gRPC on 
> projects like this and been disappointed with what it actually takes 
> to ship changes and prototype things quickly. Nothing beats the 
> ability to just throw curl at a problem - or maybe that's just me.
> 
> Anyway, I'll leave you to it - I have my own ideas in this area I'll 
> be noodling on, but it's a bit of a different take and aimed more at 
> execution as a whole, so I'll come back and discuss them if they're 
> successful.
> 
> Andrew
> 
> On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <jarek@potiuk.com 
> <ma...@potiuk.com>> wrote:
>> 1st of all - I wish we had this discussion before :)
>> 
>> > The autogenerated code also is available from OpenAPI specs, of 
>> course, and the request/response semantics in the same thread are 
>> precisely what make load balancing these calls a little harder, and 
>> horizontal scaling to multiple threads comes with every HTTP server, 
>> but I digress - clearly you've made a choice here to write an RPC 
>> layer rather than a lightweight API layer, and go with the more 
>> RPC-style features, and I get that.
>> 
>> OpenAPI is REST not RPC and it does not map well to RPC style calls. 
>> I tried to explain in detail in the AIP (and in earlier 
>> discussions). And the choice is basically made for us because of our 
>> expectation to have minimal impact on the existing code (and ability 
>> to switch off the remote part). We really do not want to introduce 
>> new API calls.  We want to make sure that existing "DB transactions" 
>> (i.e. coarse grained calls) are remotely executed. So we are not 
>> really talking about lightweight API almost by definition. Indeed, 
>> Open API also maps a definition described in a declarative way to 
>> python code.  but it has this non-nice part that OpenAPI/REST, it is 
>> supposed to be run on some resources. We have no resources to run it 
>> on - every single method we call is doing something. Even from the 
>> REST/OpenAPI semantics I'd have a really hard time to decide whether 
>> it should be GET, POST or PUT or DELETE. In most cases this will be 
>> a combination of those 4 on several different resources. All the 
>> "nice parts" of Open API (Swagger UI etc.) become next to useless if 
>> you try to map such remote procedures we have, to REST calls.
>> 
>> 
>> > I still think it's choosing something that's more complex to 
>> maintain over a simpler, more accessible option, but since I won't 
>> be the one writing it, that's not really for me to worry about. I am 
>> very curious to see how this evolves as the work towards 
>> multitenancy really kicks in and all the API schemas need to change 
>> to add namespacing!
>> 
>> The gRPC (proto) is designed from ground-up with maintainability in 
>> mind. The ability to evolve the API, add new parameters etc. is 
>> built-in the protobuf definition. From the user perspective it's 
>> actually easier to use than OpenAPI when it comes to remote method 
>> calls, because you basically - call a method.
>> 
>> And also coming back to monitoring - literally every monitoring 
>> platform supports gRPC to monitor. Grafana, NewRelic, CloudWatch, 
>> Datadog, you name it. In our case.
>> 
>> Also load-balancing:
>> 
>> Regardless of the choice we talk about HTTP request/response 
>> happening for 1 call. This is the boundary. Each of the calls we 
>> have will be a separate transaction, separate call, not connected to 
>> any other call. The server handling it will be stateless (state will 
>> be stored in a DB when each call completes). I deliberately put the 
>> "boundary" of each of the remotely callable methods, to be a 
>> complete DB transaction to achieve it.
>> 
>> So It really does not change whether we use gRPC or REST/JSON. 
>> REST/JSON vs. gRPC is just the content of the message, but this is 
>> the very same HTTP call, with same authentication added on top, same 
>> headers - just how the message is encoded is different. The same 
>> tools for load balancing works in the same way regardless if we use 
>> gRPC or REST/JSON. This is really a higher layer than the one 
>> involved in load balancing.
>> 
>> 
>> 
>> 
>> 
>> On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin 
>> <an...@astronomer.io.invalid> wrote:
>>> Well, binary representation serialization performance is worse for 
>>> protobuf than for JSON in my experience (in Python specifically) - 
>>> unless you mean size-on-the-wire, which I can agree with but tend 
>>> to not worry about very much since it's rarely a bottleneck.
>>> 
>>> The autogenerated code also is available from OpenAPI specs, of 
>>> course, and the request/response semantics in the same thread are 
>>> precisely what make load balancing these calls a little harder, and 
>>> horizontal scaling to multiple threads comes with every HTTP 
>>> server, but I digress - clearly you've made a choice here to write 
>>> an RPC layer rather than a lightweight API layer, and go with the 
>>> more RPC-style features, and I get that.
>>> 
>>> I still think it's choosing something that's more complex to 
>>> maintain over a simpler, more accessible option, but since I won't 
>>> be the one writing it, that's not really for me to worry about. I 
>>> am very curious to see how this evolves as the work towards 
>>> multitenancy really kicks in and all the API schemas need to change 
>>> to add namespacing!
>>> 
>>> Andrew
>>> 
>>> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <jarek@potiuk.com 
>>> <ma...@potiuk.com>> wrote:
>>>> Sure I can explain - the main reasons are:
>>>> 
>>>> - 1) binary representation performance - impact of this is rather 
>>>> limited because our API calls are doing rather heavy processing 
>>>> compared to the data being transmitted. But I believe it's not 
>>>> negligible.
>>>> - 2) automated tools to automatically generate strongly typed 
>>>> Python code (that's the ProtoBuf part). The strongly typed Python 
>>>> code is what convinced me (see my POC). The tooling we got for 
>>>> that is excellent. Far more superior than dealing with 
>>>> json-encoded data even with schema.
>>>> - 2) built-in "remote procedure" layer - where we have 
>>>> request/response semantics optimisations (for multiple calls over 
>>>> the same chanel) and exception handling done for us (This is 
>>>> basically what "Remote Procedure" interface provide us)
>>>> - 3) built-in server that efficiently distributes the method 
>>>> called from multiple client into a multi-threaded/multi-threaded 
>>>> execution (all individual calls are stateless so multi-processing 
>>>> can be added on top regardless from the "transport" chosen).
>>>> 
>>>> BTW. If you look at my POC code, there is nothing that strongly 
>>>> "binds" us to gRPC. The nice thing is that once it is implemented, 
>>>> it can be swapped out very easily. The only Proto/gRPC code that 
>>>> "leaks" to "generic" Airflow code is mapping of some (most 
>>>> complex) parameters to Proto . And this is only for most complex 
>>>> cases - literally only few of our types require custom 
>>>> serialisation - most of the mapping is handled automatically by 
>>>> generated protobuf code. And we can easily put the "complex" 
>>>> mapping in a separate package. Plus there is an "if" statement for 
>>>> each of the ~ 30 or so methods that we will have to turn into 
>>>> remotely-callable. We can even (as I proposed it as an option) add 
>>>> a little python magic and add a simple decorator to handle the 
>>>> "ifs". Then the decorator "if" can be swapped with some other 
>>>> "remote call" implementation.
>>>> 
>>>> The actual bulk of the implementation is to make sure that all the 
>>>> places are covered (that's the testing harness).
>>>> 
>>>> 
>>>> 
>>>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin 
>>>> <an...@astronomer.io.invalid> wrote:
>>>>> Hi Jarek,
>>>>> 
>>>>> Apologies - I was as not involved as I wanted to be in the AIP-44 
>>>>> process, and obviously my vote is non-binding anyway - but having 
>>>>> done a lot of Python API development over the past decade or so I 
>>>>> wanted to know why the decision was made to go with gRPC over 
>>>>> just plain HTTP+JSON (with a schema, of course).
>>>>> 
>>>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I 
>>>>> agree with - but does not go into the option of using a standard 
>>>>> Python HTTP server with JSON schema enforcement, such as FastAPI. 
>>>>> In my experience, the tools for load balancing, debugging, 
>>>>> testing and monitoring JSON/HTTP are superior and more numerous 
>>>>> than those for gRPC, and in addition the asynchronous support for 
>>>>> gRPC servers is lacking compared to their plain HTTP 
>>>>> counterparts, and the fact that you can interact and play with 
>>>>> the APIs in prototyping stages without having to handle obtaining 
>>>>> correct protobuf versions for the Airflow version you're using.
>>>>> 
>>>>> I wouldn't go so far as to suggest a veto, but I do want to see 
>>>>> the AIP address why gRPC would win over this option. Apologies 
>>>>> again for the late realisation that gRPC got chosen and was being 
>>>>> voted on - it's been a very busy summer.
>>>>> 
>>>>> Thanks,
>>>>> Andrew
>>>>> 
>>>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <jarek@potiuk.com 
>>>>> <ma...@potiuk.com>> wrote:
>>>>>> Just let me express my rather strong dissatisfaction with the 
>>>>>> way this "last minute" is raised.
>>>>>> 
>>>>>> It is very late to come up with such a statement - not that it 
>>>>>> comes at all, but when it comes when everyone had a chance to 
>>>>>> take a look and comment, including taking a look at the POC and 
>>>>>> result of checks. This has never been raised even 4 months ago 
>>>>>> where the only choices were Thrift and gRPc).
>>>>>> 
>>>>>> I REALLY hope the arguments are very strong and backed by real 
>>>>>> examples and data why it is a bad choice rather than opinions.
>>>>>> 
>>>>>> J,.
>>>>>> 
>>>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor 
>>>>>> <ash@apache.org <ma...@apache.org>> wrote:
>>>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over 
>>>>>>> just JSON, so -1 to that specific choice. Everything else I'm 
>>>>>>> happy with.
>>>>>>> 
>>>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk 
>>>>>>> <jarek@potiuk.com <ma...@potiuk.com>> wrote:
>>>>>>>> Oh yeah :)
>>>>>>>> 
>>>>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pingzh@umich.edu 
>>>>>>>> <ma...@umich.edu>> wrote:
>>>>>>>>> ah, good call.
>>>>>>>>> 
>>>>>>>>> I guess the email template can be updated:
>>>>>>>>> 
>>>>>>>>>> Only votes from PMC members are binding, but members of the 
>>>>>>>>>> community are encouraged to check the AIP and vote with 
>>>>>>>>>> "(non-binding)".
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> +1 (binding)
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Ping
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk 
>>>>>>>>> <jarek@potiuk.com <ma...@potiuk.com>> wrote:
>>>>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's 
>>>>>>>>>> commiter's votes are binding. See 
>>>>>>>>>> <https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy> 
>>>>>>>>>> :D
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pingzh@umich.edu 
>>>>>>>>>> <ma...@umich.edu>> wrote:
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Ping
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk 
>>>>>>>>>>> <jarek@potiuk.com <ma...@potiuk.com>> wrote:
>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal 
>>>>>>>>>>>> API".
>>>>>>>>>>>> 
>>>>>>>>>>>> The AIP-44 is here: 
>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API>
>>>>>>>>>>>> 
>>>>>>>>>>>> Discussion thread: 
>>>>>>>>>>>> <https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3>
>>>>>>>>>>>> 
>>>>>>>>>>>> The voting will last for 5 days (until 9th of August 2022 
>>>>>>>>>>>> 11:00 CEST), and until at least 3 binding votes have been 
>>>>>>>>>>>> cast.
>>>>>>>>>>>> 
>>>>>>>>>>>> Please vote accordingly:
>>>>>>>>>>>> 
>>>>>>>>>>>> [ ] + 1 approve
>>>>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>>>> 
>>>>>>>>>>>> Only votes from PMC members are binding, but members of 
>>>>>>>>>>>> the community are encouraged to check the AIP and vote 
>>>>>>>>>>>> with "(non-binding)".
>>>>>>>>>>>> 
>>>>>>>>>>>> ----
>>>>>>>>>>>> 
>>>>>>>>>>>> Just a summary of where we are:
>>>>>>>>>>>> 
>>>>>>>>>>>> It's been long in the making, but I think it might be a 
>>>>>>>>>>>> great step-forward to our long-term multi-tenancy goal. I 
>>>>>>>>>>>> believe the proposal I have is quite well thought out and 
>>>>>>>>>>>> discussed a lot in the past:
>>>>>>>>>>>> 
>>>>>>>>>>>> * we have a working POC for implementation used for 
>>>>>>>>>>>> performance testing:  
>>>>>>>>>>>> <https://github.com/apache/airflow/pull/25094>
>>>>>>>>>>>> * it is based on on industry-standard open-source gRPC 
>>>>>>>>>>>> (which is already our dependency) which fits better the 
>>>>>>>>>>>> RPC "model" we need than our public REST API.
>>>>>>>>>>>> * it has moderate performance impact and rather good 
>>>>>>>>>>>> maintainability features (very little impact on regular 
>>>>>>>>>>>> development effort)
>>>>>>>>>>>> * it is fully backwards compatible - the new DB isolation 
>>>>>>>>>>>> will be an optional feature
>>>>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>>>>> * has a backing and plans for more extensive complete 
>>>>>>>>>>>> testing in "real" environment with Google Composer team 
>>>>>>>>>>>> support
>>>>>>>>>>>> * allows for further extensions as part of AIP-1 (I am 
>>>>>>>>>>>> planning to re-establish sig-multitenancy effort for 
>>>>>>>>>>>> follow up AIPs once this one is well in progress).
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> J.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 


Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Andrew Godwin <an...@astronomer.io.INVALID>.
I also wish we'd had this discussion before!

I've converted several million lines of code into API-driven services over
the past decade so I have my ways and I'm set in them, I guess :)

Notice that I didn't say "use REST" - I don't think REST maps well to RPC
style codebases, and agree the whole method thing is confusing. I just mean
"ship JSON in a POST and receive JSON in the response".

As you said before though, the line where you draw the abstraction is what
matters more than the transport layer, and "fat endpoints" (doing
transactions and multiple calls on the API server etc.) is, I agree, the
way this has to go, so it's not like this is something I think is totally
wrong, just that I've repeatedly tried gRPC on projects like this and been
disappointed with what it actually takes to ship changes and
prototype things quickly. Nothing beats the ability to just throw curl at a
problem - or maybe that's just me.

Anyway, I'll leave you to it - I have my own ideas in this area I'll be
noodling on, but it's a bit of a different take and aimed more at execution
as a whole, so I'll come back and discuss them if they're successful.

Andrew

On Wed, Aug 10, 2022 at 1:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> 1st of all - I wish we had this discussion before :)
>
> > The autogenerated code also is available from OpenAPI specs, of course,
> and the request/response semantics in the same thread are precisely what
> make load balancing these calls a little harder, and horizontal scaling to
> multiple threads comes with every HTTP server, but I digress - clearly
> you've made a choice here to write an RPC layer rather than a lightweight
> API layer, and go with the more RPC-style features, and I get that.
>
> OpenAPI is REST not RPC and it does not map well to RPC style calls. I
> tried to explain in detail in the AIP (and in earlier discussions). And the
> choice is basically made for us because of our expectation to have
> minimal impact on the existing code (and ability to switch off the remote
> part). We really do not want to introduce new API calls.  We want to make
> sure that existing "DB transactions" (i.e. coarse grained calls) are
> remotely executed. So we are not really talking about lightweight API
> almost by definition. Indeed, Open API also maps a definition described in
> a declarative way to python code.  but it has this non-nice part that
> OpenAPI/REST, it is supposed to be run on some resources. We have no
> resources to run it on - every single method we call is doing something.
> Even from the REST/OpenAPI semantics I'd have a really hard time to decide
> whether it should be GET, POST or PUT or DELETE. In most cases this will be
> a combination of those 4 on several different resources. All the "nice
> parts" of Open API (Swagger UI etc.) become next to useless if you try to
> map such remote procedures we have, to REST calls.
>
>
> > I still think it's choosing something that's more complex to maintain
> over a simpler, more accessible option, but since I won't be the one
> writing it, that's not really for me to worry about. I am very curious to
> see how this evolves as the work towards multitenancy really kicks in and
> all the API schemas need to change to add namespacing!
>
> The gRPC (proto) is designed from ground-up with maintainability in mind.
> The ability to evolve the API, add new parameters etc. is built-in the
> protobuf definition. From the user perspective it's actually easier to use
> than OpenAPI when it comes to remote method calls, because you basically -
> call a method.
>
> And also coming back to monitoring - literally every monitoring platform
> supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog, you name
> it. In our case.
>
> Also load-balancing:
>
> Regardless of the choice we talk about HTTP request/response happening for
> 1 call. This is the boundary. Each of the calls we have will be a separate
> transaction, separate call, not connected to any other call. The server
> handling it will be stateless (state will be stored in a DB when each call
> completes). I deliberately put the "boundary" of each of the remotely
> callable methods, to be a complete DB transaction to achieve it.
>
> So It really does not change whether we use gRPC or REST/JSON. REST/JSON
> vs. gRPC is just the content of the message, but this is the very same HTTP
> call, with same authentication added on top, same headers - just how the
> message is encoded is different. The same tools for load balancing works in
> the same way regardless if we use gRPC or REST/JSON. This is really a
> higher layer than the one involved in load balancing.
>
>
>
>
>
> On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin
> <an...@astronomer.io.invalid> wrote:
>
>> Well, binary representation serialization performance is worse for
>> protobuf than for JSON in my experience (in Python specifically) - unless
>> you mean size-on-the-wire, which I can agree with but tend to not worry
>> about very much since it's rarely a bottleneck.
>>
>> The autogenerated code also is available from OpenAPI specs, of course,
>> and the request/response semantics in the same thread are precisely what
>> make load balancing these calls a little harder, and horizontal scaling to
>> multiple threads comes with every HTTP server, but I digress - clearly
>> you've made a choice here to write an RPC layer rather than a lightweight
>> API layer, and go with the more RPC-style features, and I get that.
>>
>> I still think it's choosing something that's more complex to maintain
>> over a simpler, more accessible option, but since I won't be the one
>> writing it, that's not really for me to worry about. I am very curious to
>> see how this evolves as the work towards multitenancy really kicks in and
>> all the API schemas need to change to add namespacing!
>>
>> Andrew
>>
>> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Sure I can explain - the main reasons are:
>>>
>>> - 1) binary representation performance - impact of this is
>>> rather limited because our API calls are doing rather heavy processing
>>> compared to the data being transmitted. But I believe it's not negligible.
>>> - 2) automated tools to automatically generate strongly typed Python
>>> code (that's the ProtoBuf part). The strongly typed Python code is what
>>> convinced me (see my POC). The tooling we got for that is excellent. Far
>>> more superior than dealing with json-encoded data even with schema.
>>> - 2) built-in "remote procedure" layer - where we have request/response
>>> semantics optimisations (for multiple calls over the same chanel) and
>>> exception handling done for us (This is basically what "Remote Procedure"
>>> interface provide us)
>>> - 3) built-in server that efficiently distributes the method called from
>>> multiple client into a multi-threaded/multi-threaded execution (all
>>> individual calls are stateless so multi-processing can be added on top
>>> regardless from the "transport" chosen).
>>>
>>> BTW. If you look at my POC code, there is nothing that strongly "binds"
>>> us to gRPC. The nice thing is that once it is implemented, it can be
>>> swapped out very easily. The only Proto/gRPC code that "leaks" to "generic"
>>> Airflow code is mapping of some (most complex) parameters to Proto . And
>>> this is only for most complex cases - literally only few of our types
>>> require custom serialisation - most of the mapping is handled automatically
>>> by generated protobuf code. And we can easily put the "complex" mapping in
>>> a separate package. Plus there is an "if" statement for each of the ~ 30 or
>>> so methods that we will have to turn into remotely-callable. We can even
>>> (as I proposed it as an option) add a little python magic and add a simple
>>> decorator to handle the "ifs". Then the decorator "if" can be swapped with
>>> some other "remote call" implementation.
>>>
>>> The actual bulk of the implementation is to make sure that all the
>>> places are covered (that's the testing harness).
>>>
>>>
>>>
>>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
>>> <an...@astronomer.io.invalid> wrote:
>>>
>>>> Hi Jarek,
>>>>
>>>> Apologies - I was as not involved as I wanted to be in the AIP-44
>>>> process, and obviously my vote is non-binding anyway - but having done a
>>>> lot of Python API development over the past decade or so I wanted to know
>>>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>>>> schema, of course).
>>>>
>>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree
>>>> with - but does not go into the option of using a standard Python HTTP
>>>> server with JSON schema enforcement, such as FastAPI. In my experience, the
>>>> tools for load balancing, debugging, testing and monitoring JSON/HTTP are
>>>> superior and more numerous than those for gRPC, and in addition the
>>>> asynchronous support for gRPC servers is lacking compared to their plain
>>>> HTTP counterparts, and the fact that you can interact and play with the
>>>> APIs in prototyping stages without having to handle obtaining correct
>>>> protobuf versions for the Airflow version you're using.
>>>>
>>>> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
>>>> address why gRPC would win over this option. Apologies again for the late
>>>> realisation that gRPC got chosen and was being voted on - it's been a very
>>>> busy summer.
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> Just let me express my rather strong dissatisfaction with the way
>>>>> this "last minute" is raised.
>>>>>
>>>>> It is very late to come up with such a statement - not that it comes
>>>>> at all, but when it comes when everyone had a chance to take a look and
>>>>> comment, including taking a look at the POC and result of checks. This has
>>>>> never been raised even 4 months ago where the only choices were Thrift and
>>>>> gRPc).
>>>>>
>>>>> I REALLY hope the arguments are very strong and backed by real
>>>>> examples and data why it is a bad choice rather than opinions.
>>>>>
>>>>> J,.
>>>>>
>>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>>>
>>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>>
>>>>>> -ash
>>>>>>
>>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>>>> jarek@potiuk.com> wrote:
>>>>>>
>>>>>> Oh yeah :)
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>>
>>>>>>> ah, good call.
>>>>>>>
>>>>>>> I guess the email template can be updated:
>>>>>>>
>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>> community are encouraged to check the AIP and vote with "(non-binding)".
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> +1 (binding)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ping
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's
>>>>>>>> votes are binding. See
>>>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>>>> :D
>>>>>>>>
>>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Ping
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey everyone,
>>>>>>>>>>
>>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>>>
>>>>>>>>>> The AIP-44 is here:
>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>>>
>>>>>>>>>> Discussion thread:
>>>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>>>
>>>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>>>
>>>>>>>>>> Please vote accordingly:
>>>>>>>>>>
>>>>>>>>>> [ ] + 1 approve
>>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>>
>>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>>>> "(non-binding)".
>>>>>>>>>>
>>>>>>>>>> ----
>>>>>>>>>>
>>>>>>>>>> Just a summary of where we are:
>>>>>>>>>>
>>>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>>>
>>>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>>>>>> public REST API.
>>>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>>>> * it is fully backwards compatible - the new DB isolation will be
>>>>>>>>>> an optional feature
>>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>>>> "real" environment with Google Composer team support
>>>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning
>>>>>>>>>> to re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>>>>> well in progress).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> J.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jarek Potiuk <ja...@potiuk.com>.
1st of all - I wish we had this discussion before :)

> The autogenerated code also is available from OpenAPI specs, of course,
and the request/response semantics in the same thread are precisely what
make load balancing these calls a little harder, and horizontal scaling to
multiple threads comes with every HTTP server, but I digress - clearly
you've made a choice here to write an RPC layer rather than a lightweight
API layer, and go with the more RPC-style features, and I get that.

OpenAPI is REST not RPC and it does not map well to RPC style calls. I
tried to explain in detail in the AIP (and in earlier discussions). And the
choice is basically made for us because of our expectation to have
minimal impact on the existing code (and ability to switch off the remote
part). We really do not want to introduce new API calls.  We want to make
sure that existing "DB transactions" (i.e. coarse grained calls) are
remotely executed. So we are not really talking about lightweight API
almost by definition. Indeed, Open API also maps a definition described in
a declarative way to python code.  but it has this non-nice part that
OpenAPI/REST, it is supposed to be run on some resources. We have no
resources to run it on - every single method we call is doing something.
Even from the REST/OpenAPI semantics I'd have a really hard time to decide
whether it should be GET, POST or PUT or DELETE. In most cases this will be
a combination of those 4 on several different resources. All the "nice
parts" of Open API (Swagger UI etc.) become next to useless if you try to
map such remote procedures we have, to REST calls.


> I still think it's choosing something that's more complex to maintain
over a simpler, more accessible option, but since I won't be the one
writing it, that's not really for me to worry about. I am very curious to
see how this evolves as the work towards multitenancy really kicks in and
all the API schemas need to change to add namespacing!

The gRPC (proto) is designed from ground-up with maintainability in mind.
The ability to evolve the API, add new parameters etc. is built-in the
protobuf definition. From the user perspective it's actually easier to use
than OpenAPI when it comes to remote method calls, because you basically -
call a method.

And also coming back to monitoring - literally every monitoring platform
supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog, you name
it. In our case.

Also load-balancing:

Regardless of the choice we talk about HTTP request/response happening for
1 call. This is the boundary. Each of the calls we have will be a separate
transaction, separate call, not connected to any other call. The server
handling it will be stateless (state will be stored in a DB when each call
completes). I deliberately put the "boundary" of each of the remotely
callable methods, to be a complete DB transaction to achieve it.

So It really does not change whether we use gRPC or REST/JSON. REST/JSON
vs. gRPC is just the content of the message, but this is the very same HTTP
call, with same authentication added on top, same headers - just how the
message is encoded is different. The same tools for load balancing works in
the same way regardless if we use gRPC or REST/JSON. This is really a
higher layer than the one involved in load balancing.





On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin
<an...@astronomer.io.invalid> wrote:

> Well, binary representation serialization performance is worse for
> protobuf than for JSON in my experience (in Python specifically) - unless
> you mean size-on-the-wire, which I can agree with but tend to not worry
> about very much since it's rarely a bottleneck.
>
> The autogenerated code also is available from OpenAPI specs, of course,
> and the request/response semantics in the same thread are precisely what
> make load balancing these calls a little harder, and horizontal scaling to
> multiple threads comes with every HTTP server, but I digress - clearly
> you've made a choice here to write an RPC layer rather than a lightweight
> API layer, and go with the more RPC-style features, and I get that.
>
> I still think it's choosing something that's more complex to maintain over
> a simpler, more accessible option, but since I won't be the one writing it,
> that's not really for me to worry about. I am very curious to see how this
> evolves as the work towards multitenancy really kicks in and all the API
> schemas need to change to add namespacing!
>
> Andrew
>
> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Sure I can explain - the main reasons are:
>>
>> - 1) binary representation performance - impact of this is rather limited
>> because our API calls are doing rather heavy processing compared to the
>> data being transmitted. But I believe it's not negligible.
>> - 2) automated tools to automatically generate strongly typed Python code
>> (that's the ProtoBuf part). The strongly typed Python code is what
>> convinced me (see my POC). The tooling we got for that is excellent. Far
>> more superior than dealing with json-encoded data even with schema.
>> - 2) built-in "remote procedure" layer - where we have request/response
>> semantics optimisations (for multiple calls over the same chanel) and
>> exception handling done for us (This is basically what "Remote Procedure"
>> interface provide us)
>> - 3) built-in server that efficiently distributes the method called from
>> multiple client into a multi-threaded/multi-threaded execution (all
>> individual calls are stateless so multi-processing can be added on top
>> regardless from the "transport" chosen).
>>
>> BTW. If you look at my POC code, there is nothing that strongly "binds"
>> us to gRPC. The nice thing is that once it is implemented, it can be
>> swapped out very easily. The only Proto/gRPC code that "leaks" to "generic"
>> Airflow code is mapping of some (most complex) parameters to Proto . And
>> this is only for most complex cases - literally only few of our types
>> require custom serialisation - most of the mapping is handled automatically
>> by generated protobuf code. And we can easily put the "complex" mapping in
>> a separate package. Plus there is an "if" statement for each of the ~ 30 or
>> so methods that we will have to turn into remotely-callable. We can even
>> (as I proposed it as an option) add a little python magic and add a simple
>> decorator to handle the "ifs". Then the decorator "if" can be swapped with
>> some other "remote call" implementation.
>>
>> The actual bulk of the implementation is to make sure that all the places
>> are covered (that's the testing harness).
>>
>>
>>
>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
>> <an...@astronomer.io.invalid> wrote:
>>
>>> Hi Jarek,
>>>
>>> Apologies - I was as not involved as I wanted to be in the AIP-44
>>> process, and obviously my vote is non-binding anyway - but having done a
>>> lot of Python API development over the past decade or so I wanted to know
>>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>>> schema, of course).
>>>
>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree
>>> with - but does not go into the option of using a standard Python HTTP
>>> server with JSON schema enforcement, such as FastAPI. In my experience, the
>>> tools for load balancing, debugging, testing and monitoring JSON/HTTP are
>>> superior and more numerous than those for gRPC, and in addition the
>>> asynchronous support for gRPC servers is lacking compared to their plain
>>> HTTP counterparts, and the fact that you can interact and play with the
>>> APIs in prototyping stages without having to handle obtaining correct
>>> protobuf versions for the Airflow version you're using.
>>>
>>> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
>>> address why gRPC would win over this option. Apologies again for the late
>>> realisation that gRPC got chosen and was being voted on - it's been a very
>>> busy summer.
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Just let me express my rather strong dissatisfaction with the way
>>>> this "last minute" is raised.
>>>>
>>>> It is very late to come up with such a statement - not that it comes at
>>>> all, but when it comes when everyone had a chance to take a look and
>>>> comment, including taking a look at the POC and result of checks. This has
>>>> never been raised even 4 months ago where the only choices were Thrift and
>>>> gRPc).
>>>>
>>>> I REALLY hope the arguments are very strong and backed by real examples
>>>> and data why it is a bad choice rather than opinions.
>>>>
>>>> J,.
>>>>
>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>>> wrote:
>>>>
>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>>
>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>
>>>>> -ash
>>>>>
>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>>> jarek@potiuk.com> wrote:
>>>>>
>>>>> Oh yeah :)
>>>>>
>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>
>>>>>> ah, good call.
>>>>>>
>>>>>> I guess the email template can be updated:
>>>>>>
>>>>>> Only votes from PMC members are binding, but members of the community
>>>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>>
>>>>>>
>>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Ping
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's
>>>>>>> votes are binding. See
>>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>>> :D
>>>>>>>
>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Ping
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hey everyone,
>>>>>>>>>
>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>>
>>>>>>>>> The AIP-44 is here:
>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>>
>>>>>>>>> Discussion thread:
>>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>>
>>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>>
>>>>>>>>> Please vote accordingly:
>>>>>>>>>
>>>>>>>>> [ ] + 1 approve
>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>
>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>>> "(non-binding)".
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>>
>>>>>>>>> Just a summary of where we are:
>>>>>>>>>
>>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>>
>>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>>>>> public REST API.
>>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>>> * it is fully backwards compatible - the new DB isolation will be
>>>>>>>>> an optional feature
>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>>> "real" environment with Google Composer team support
>>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>>>> well in progress).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Andrew Godwin <an...@astronomer.io.INVALID>.
Well, binary representation serialization performance is worse for protobuf
than for JSON in my experience (in Python specifically) - unless you mean
size-on-the-wire, which I can agree with but tend to not worry about very
much since it's rarely a bottleneck.

The autogenerated code also is available from OpenAPI specs, of course, and
the request/response semantics in the same thread are precisely what make
load balancing these calls a little harder, and horizontal scaling to
multiple threads comes with every HTTP server, but I digress - clearly
you've made a choice here to write an RPC layer rather than a lightweight
API layer, and go with the more RPC-style features, and I get that.

I still think it's choosing something that's more complex to maintain over
a simpler, more accessible option, but since I won't be the one writing it,
that's not really for me to worry about. I am very curious to see how this
evolves as the work towards multitenancy really kicks in and all the API
schemas need to change to add namespacing!

Andrew

On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Sure I can explain - the main reasons are:
>
> - 1) binary representation performance - impact of this is rather limited
> because our API calls are doing rather heavy processing compared to the
> data being transmitted. But I believe it's not negligible.
> - 2) automated tools to automatically generate strongly typed Python code
> (that's the ProtoBuf part). The strongly typed Python code is what
> convinced me (see my POC). The tooling we got for that is excellent. Far
> more superior than dealing with json-encoded data even with schema.
> - 2) built-in "remote procedure" layer - where we have request/response
> semantics optimisations (for multiple calls over the same chanel) and
> exception handling done for us (This is basically what "Remote Procedure"
> interface provide us)
> - 3) built-in server that efficiently distributes the method called from
> multiple client into a multi-threaded/multi-threaded execution (all
> individual calls are stateless so multi-processing can be added on top
> regardless from the "transport" chosen).
>
> BTW. If you look at my POC code, there is nothing that strongly "binds" us
> to gRPC. The nice thing is that once it is implemented, it can be swapped
> out very easily. The only Proto/gRPC code that "leaks" to "generic" Airflow
> code is mapping of some (most complex) parameters to Proto . And this is
> only for most complex cases - literally only few of our types require
> custom serialisation - most of the mapping is handled automatically by
> generated protobuf code. And we can easily put the "complex" mapping in a
> separate package. Plus there is an "if" statement for each of the ~ 30 or
> so methods that we will have to turn into remotely-callable. We can even
> (as I proposed it as an option) add a little python magic and add a simple
> decorator to handle the "ifs". Then the decorator "if" can be swapped with
> some other "remote call" implementation.
>
> The actual bulk of the implementation is to make sure that all the places
> are covered (that's the testing harness).
>
>
>
> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
> <an...@astronomer.io.invalid> wrote:
>
>> Hi Jarek,
>>
>> Apologies - I was as not involved as I wanted to be in the AIP-44
>> process, and obviously my vote is non-binding anyway - but having done a
>> lot of Python API development over the past decade or so I wanted to know
>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>> schema, of course).
>>
>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with
>> - but does not go into the option of using a standard Python HTTP server
>> with JSON schema enforcement, such as FastAPI. In my experience, the tools
>> for load balancing, debugging, testing and monitoring JSON/HTTP are
>> superior and more numerous than those for gRPC, and in addition the
>> asynchronous support for gRPC servers is lacking compared to their plain
>> HTTP counterparts, and the fact that you can interact and play with the
>> APIs in prototyping stages without having to handle obtaining correct
>> protobuf versions for the Airflow version you're using.
>>
>> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
>> address why gRPC would win over this option. Apologies again for the late
>> realisation that gRPC got chosen and was being voted on - it's been a very
>> busy summer.
>>
>> Thanks,
>> Andrew
>>
>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Just let me express my rather strong dissatisfaction with the way
>>> this "last minute" is raised.
>>>
>>> It is very late to come up with such a statement - not that it comes at
>>> all, but when it comes when everyone had a chance to take a look and
>>> comment, including taking a look at the POC and result of checks. This has
>>> never been raised even 4 months ago where the only choices were Thrift and
>>> gRPc).
>>>
>>> I REALLY hope the arguments are very strong and backed by real examples
>>> and data why it is a bad choice rather than opinions.
>>>
>>> J,.
>>>
>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>
>>>> I (or Andrew G) will follow up with more details shortly.
>>>>
>>>> -ash
>>>>
>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>> jarek@potiuk.com> wrote:
>>>>
>>>> Oh yeah :)
>>>>
>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>
>>>>> ah, good call.
>>>>>
>>>>> I guess the email template can be updated:
>>>>>
>>>>> Only votes from PMC members are binding, but members of the community
>>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>
>>>>>
>>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ping
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>>>>> are binding. See
>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>> :D
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ping
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>
>>>>>>>> The AIP-44 is here:
>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>
>>>>>>>> Discussion thread:
>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>
>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>
>>>>>>>> Please vote accordingly:
>>>>>>>>
>>>>>>>> [ ] + 1 approve
>>>>>>>> [ ] + 0 no opinion
>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>
>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>> "(non-binding)".
>>>>>>>>
>>>>>>>> ----
>>>>>>>>
>>>>>>>> Just a summary of where we are:
>>>>>>>>
>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>
>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>>>> public REST API.
>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>> * it is fully backwards compatible - the new DB isolation will be
>>>>>>>> an optional feature
>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>> "real" environment with Google Composer team support
>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>>> well in progress).
>>>>>>>>
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Xiaodong Deng <xd...@apache.org>.
Thanks, Jarek, for the clarification, and of course, for the very
nicely-done PoC!

+1 (binding) from me.


Regards,
XD

On Wed, Aug 10, 2022 at 8:52 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Sure I can explain - the main reasons are:
>
> - 1) binary representation performance - impact of this is rather limited
> because our API calls are doing rather heavy processing compared to the
> data being transmitted. But I believe it's not negligible.
> - 2) automated tools to automatically generate strongly typed Python code
> (that's the ProtoBuf part). The strongly typed Python code is what
> convinced me (see my POC). The tooling we got for that is excellent. Far
> more superior than dealing with json-encoded data even with schema.
> - 2) built-in "remote procedure" layer - where we have request/response
> semantics optimisations (for multiple calls over the same chanel) and
> exception handling done for us (This is basically what "Remote Procedure"
> interface provide us)
> - 3) built-in server that efficiently distributes the method called from
> multiple client into a multi-threaded/multi-threaded execution (all
> individual calls are stateless so multi-processing can be added on top
> regardless from the "transport" chosen).
>
> BTW. If you look at my POC code, there is nothing that strongly "binds" us
> to gRPC. The nice thing is that once it is implemented, it can be swapped
> out very easily. The only Proto/gRPC code that "leaks" to "generic" Airflow
> code is mapping of some (most complex) parameters to Proto . And this is
> only for most complex cases - literally only few of our types require
> custom serialisation - most of the mapping is handled automatically by
> generated protobuf code. And we can easily put the "complex" mapping in a
> separate package. Plus there is an "if" statement for each of the ~ 30 or
> so methods that we will have to turn into remotely-callable. We can even
> (as I proposed it as an option) add a little python magic and add a simple
> decorator to handle the "ifs". Then the decorator "if" can be swapped with
> some other "remote call" implementation.
>
> The actual bulk of the implementation is to make sure that all the places
> are covered (that's the testing harness).
>
>
>
> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
> <an...@astronomer.io.invalid> wrote:
>
>> Hi Jarek,
>>
>> Apologies - I was as not involved as I wanted to be in the AIP-44
>> process, and obviously my vote is non-binding anyway - but having done a
>> lot of Python API development over the past decade or so I wanted to know
>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>> schema, of course).
>>
>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with
>> - but does not go into the option of using a standard Python HTTP server
>> with JSON schema enforcement, such as FastAPI. In my experience, the tools
>> for load balancing, debugging, testing and monitoring JSON/HTTP are
>> superior and more numerous than those for gRPC, and in addition the
>> asynchronous support for gRPC servers is lacking compared to their plain
>> HTTP counterparts, and the fact that you can interact and play with the
>> APIs in prototyping stages without having to handle obtaining correct
>> protobuf versions for the Airflow version you're using.
>>
>> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
>> address why gRPC would win over this option. Apologies again for the late
>> realisation that gRPC got chosen and was being voted on - it's been a very
>> busy summer.
>>
>> Thanks,
>> Andrew
>>
>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Just let me express my rather strong dissatisfaction with the way
>>> this "last minute" is raised.
>>>
>>> It is very late to come up with such a statement - not that it comes at
>>> all, but when it comes when everyone had a chance to take a look and
>>> comment, including taking a look at the POC and result of checks. This has
>>> never been raised even 4 months ago where the only choices were Thrift and
>>> gRPc).
>>>
>>> I REALLY hope the arguments are very strong and backed by real examples
>>> and data why it is a bad choice rather than opinions.
>>>
>>> J,.
>>>
>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>
>>>> I (or Andrew G) will follow up with more details shortly.
>>>>
>>>> -ash
>>>>
>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>> jarek@potiuk.com> wrote:
>>>>
>>>> Oh yeah :)
>>>>
>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>
>>>>> ah, good call.
>>>>>
>>>>> I guess the email template can be updated:
>>>>>
>>>>> Only votes from PMC members are binding, but members of the community
>>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>
>>>>>
>>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ping
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>>>>> are binding. See
>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>> :D
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ping
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>
>>>>>>>> The AIP-44 is here:
>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>
>>>>>>>> Discussion thread:
>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>
>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>
>>>>>>>> Please vote accordingly:
>>>>>>>>
>>>>>>>> [ ] + 1 approve
>>>>>>>> [ ] + 0 no opinion
>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>
>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>> "(non-binding)".
>>>>>>>>
>>>>>>>> ----
>>>>>>>>
>>>>>>>> Just a summary of where we are:
>>>>>>>>
>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>
>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>>>> public REST API.
>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>> * it is fully backwards compatible - the new DB isolation will be
>>>>>>>> an optional feature
>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>> "real" environment with Google Composer team support
>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>>> well in progress).
>>>>>>>>
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Hongyi Wang <wh...@gmail.com>.
+1 (non-binding) Really look forward to it! Howie

On Wed, Aug 10, 2022 at 11:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Sure I can explain - the main reasons are:
>
> - 1) binary representation performance - impact of this is rather limited
> because our API calls are doing rather heavy processing compared to the
> data being transmitted. But I believe it's not negligible.
> - 2) automated tools to automatically generate strongly typed Python code
> (that's the ProtoBuf part). The strongly typed Python code is what
> convinced me (see my POC). The tooling we got for that is excellent. Far
> more superior than dealing with json-encoded data even with schema.
> - 2) built-in "remote procedure" layer - where we have request/response
> semantics optimisations (for multiple calls over the same chanel) and
> exception handling done for us (This is basically what "Remote Procedure"
> interface provide us)
> - 3) built-in server that efficiently distributes the method called from
> multiple client into a multi-threaded/multi-threaded execution (all
> individual calls are stateless so multi-processing can be added on top
> regardless from the "transport" chosen).
>
> BTW. If you look at my POC code, there is nothing that strongly "binds" us
> to gRPC. The nice thing is that once it is implemented, it can be swapped
> out very easily. The only Proto/gRPC code that "leaks" to "generic" Airflow
> code is mapping of some (most complex) parameters to Proto . And this is
> only for most complex cases - literally only few of our types require
> custom serialisation - most of the mapping is handled automatically by
> generated protobuf code. And we can easily put the "complex" mapping in a
> separate package. Plus there is an "if" statement for each of the ~ 30 or
> so methods that we will have to turn into remotely-callable. We can even
> (as I proposed it as an option) add a little python magic and add a simple
> decorator to handle the "ifs". Then the decorator "if" can be swapped with
> some other "remote call" implementation.
>
> The actual bulk of the implementation is to make sure that all the places
> are covered (that's the testing harness).
>
>
>
> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
> <an...@astronomer.io.invalid> wrote:
>
>> Hi Jarek,
>>
>> Apologies - I was as not involved as I wanted to be in the AIP-44
>> process, and obviously my vote is non-binding anyway - but having done a
>> lot of Python API development over the past decade or so I wanted to know
>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>> schema, of course).
>>
>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with
>> - but does not go into the option of using a standard Python HTTP server
>> with JSON schema enforcement, such as FastAPI. In my experience, the tools
>> for load balancing, debugging, testing and monitoring JSON/HTTP are
>> superior and more numerous than those for gRPC, and in addition the
>> asynchronous support for gRPC servers is lacking compared to their plain
>> HTTP counterparts, and the fact that you can interact and play with the
>> APIs in prototyping stages without having to handle obtaining correct
>> protobuf versions for the Airflow version you're using.
>>
>> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
>> address why gRPC would win over this option. Apologies again for the late
>> realisation that gRPC got chosen and was being voted on - it's been a very
>> busy summer.
>>
>> Thanks,
>> Andrew
>>
>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Just let me express my rather strong dissatisfaction with the way
>>> this "last minute" is raised.
>>>
>>> It is very late to come up with such a statement - not that it comes at
>>> all, but when it comes when everyone had a chance to take a look and
>>> comment, including taking a look at the POC and result of checks. This has
>>> never been raised even 4 months ago where the only choices were Thrift and
>>> gRPc).
>>>
>>> I REALLY hope the arguments are very strong and backed by real examples
>>> and data why it is a bad choice rather than opinions.
>>>
>>> J,.
>>>
>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>
>>>> I (or Andrew G) will follow up with more details shortly.
>>>>
>>>> -ash
>>>>
>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>> jarek@potiuk.com> wrote:
>>>>
>>>> Oh yeah :)
>>>>
>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>
>>>>> ah, good call.
>>>>>
>>>>> I guess the email template can be updated:
>>>>>
>>>>> Only votes from PMC members are binding, but members of the community
>>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>
>>>>>
>>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ping
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>>>>> are binding. See
>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>> :D
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ping
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>
>>>>>>>> The AIP-44 is here:
>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>
>>>>>>>> Discussion thread:
>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>
>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>
>>>>>>>> Please vote accordingly:
>>>>>>>>
>>>>>>>> [ ] + 1 approve
>>>>>>>> [ ] + 0 no opinion
>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>
>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>> "(non-binding)".
>>>>>>>>
>>>>>>>> ----
>>>>>>>>
>>>>>>>> Just a summary of where we are:
>>>>>>>>
>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>
>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>>>> public REST API.
>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>>> * it is fully backwards compatible - the new DB isolation will be
>>>>>>>> an optional feature
>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>> "real" environment with Google Composer team support
>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>>> well in progress).
>>>>>>>>
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jarek Potiuk <ja...@potiuk.com>.
Sure I can explain - the main reasons are:

- 1) binary representation performance - impact of this is rather limited
because our API calls are doing rather heavy processing compared to the
data being transmitted. But I believe it's not negligible.
- 2) automated tools to automatically generate strongly typed Python code
(that's the ProtoBuf part). The strongly typed Python code is what
convinced me (see my POC). The tooling we got for that is excellent. Far
more superior than dealing with json-encoded data even with schema.
- 2) built-in "remote procedure" layer - where we have request/response
semantics optimisations (for multiple calls over the same chanel) and
exception handling done for us (This is basically what "Remote Procedure"
interface provide us)
- 3) built-in server that efficiently distributes the method called from
multiple client into a multi-threaded/multi-threaded execution (all
individual calls are stateless so multi-processing can be added on top
regardless from the "transport" chosen).

BTW. If you look at my POC code, there is nothing that strongly "binds" us
to gRPC. The nice thing is that once it is implemented, it can be swapped
out very easily. The only Proto/gRPC code that "leaks" to "generic" Airflow
code is mapping of some (most complex) parameters to Proto . And this is
only for most complex cases - literally only few of our types require
custom serialisation - most of the mapping is handled automatically by
generated protobuf code. And we can easily put the "complex" mapping in a
separate package. Plus there is an "if" statement for each of the ~ 30 or
so methods that we will have to turn into remotely-callable. We can even
(as I proposed it as an option) add a little python magic and add a simple
decorator to handle the "ifs". Then the decorator "if" can be swapped with
some other "remote call" implementation.

The actual bulk of the implementation is to make sure that all the places
are covered (that's the testing harness).



On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
<an...@astronomer.io.invalid> wrote:

> Hi Jarek,
>
> Apologies - I was as not involved as I wanted to be in the AIP-44 process,
> and obviously my vote is non-binding anyway - but having done a lot of
> Python API development over the past decade or so I wanted to know why the
> decision was made to go with gRPC over just plain HTTP+JSON (with a schema,
> of course).
>
> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with
> - but does not go into the option of using a standard Python HTTP server
> with JSON schema enforcement, such as FastAPI. In my experience, the tools
> for load balancing, debugging, testing and monitoring JSON/HTTP are
> superior and more numerous than those for gRPC, and in addition the
> asynchronous support for gRPC servers is lacking compared to their plain
> HTTP counterparts, and the fact that you can interact and play with the
> APIs in prototyping stages without having to handle obtaining correct
> protobuf versions for the Airflow version you're using.
>
> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
> address why gRPC would win over this option. Apologies again for the late
> realisation that gRPC got chosen and was being voted on - it's been a very
> busy summer.
>
> Thanks,
> Andrew
>
> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Just let me express my rather strong dissatisfaction with the way
>> this "last minute" is raised.
>>
>> It is very late to come up with such a statement - not that it comes at
>> all, but when it comes when everyone had a chance to take a look and
>> comment, including taking a look at the POC and result of checks. This has
>> never been raised even 4 months ago where the only choices were Thrift and
>> gRPc).
>>
>> I REALLY hope the arguments are very strong and backed by real examples
>> and data why it is a bad choice rather than opinions.
>>
>> J,.
>>
>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>
>>> I (or Andrew G) will follow up with more details shortly.
>>>
>>> -ash
>>>
>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>> jarek@potiuk.com> wrote:
>>>
>>> Oh yeah :)
>>>
>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>>
>>>> ah, good call.
>>>>
>>>> I guess the email template can be updated:
>>>>
>>>> Only votes from PMC members are binding, but members of the community
>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>
>>>>
>>>>
>>>> +1 (binding)
>>>>
>>>> Thanks,
>>>>
>>>> Ping
>>>>
>>>>
>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>>>> are binding. See
>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>> :D
>>>>>
>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Ping
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>
>>>>>>> The AIP-44 is here:
>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>
>>>>>>> Discussion thread:
>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>
>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>
>>>>>>> Please vote accordingly:
>>>>>>>
>>>>>>> [ ] + 1 approve
>>>>>>> [ ] + 0 no opinion
>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>
>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>> "(non-binding)".
>>>>>>>
>>>>>>> ----
>>>>>>>
>>>>>>> Just a summary of where we are:
>>>>>>>
>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>
>>>>>>> * we have a working POC for implementation used for performance
>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>>> public REST API.
>>>>>>> * it has moderate performance impact and rather good
>>>>>>> maintainability features (very little impact on regular development effort)
>>>>>>> * it is fully backwards compatible - the new DB isolation will be an
>>>>>>> optional feature
>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>> "real" environment with Google Composer team support
>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>>> well in progress).
>>>>>>>
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Andrew Godwin <an...@astronomer.io.INVALID>.
Hi Jarek,

Apologies - I was as not involved as I wanted to be in the AIP-44 process,
and obviously my vote is non-binding anyway - but having done a lot of
Python API development over the past decade or so I wanted to know why the
decision was made to go with gRPC over just plain HTTP+JSON (with a schema,
of course).

The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with -
but does not go into the option of using a standard Python HTTP server with
JSON schema enforcement, such as FastAPI. In my experience, the tools for
load balancing, debugging, testing and monitoring JSON/HTTP are superior
and more numerous than those for gRPC, and in addition the asynchronous
support for gRPC servers is lacking compared to their plain HTTP
counterparts, and the fact that you can interact and play with the APIs in
prototyping stages without having to handle obtaining correct protobuf
versions for the Airflow version you're using.

I wouldn't go so far as to suggest a veto, but I do want to see the AIP
address why gRPC would win over this option. Apologies again for the late
realisation that gRPC got chosen and was being voted on - it's been a very
busy summer.

Thanks,
Andrew

On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Just let me express my rather strong dissatisfaction with the way
> this "last minute" is raised.
>
> It is very late to come up with such a statement - not that it comes at
> all, but when it comes when everyone had a chance to take a look and
> comment, including taking a look at the POC and result of checks. This has
> never been raised even 4 months ago where the only choices were Thrift and
> gRPc).
>
> I REALLY hope the arguments are very strong and backed by real examples
> and data why it is a bad choice rather than opinions.
>
> J,.
>
> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>
>> I (or Andrew G) will follow up with more details shortly.
>>
>> -ash
>>
>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <ja...@potiuk.com>
>> wrote:
>>
>> Oh yeah :)
>>
>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>>
>>> ah, good call.
>>>
>>> I guess the email template can be updated:
>>>
>>> Only votes from PMC members are binding, but members of the community
>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>
>>>
>>>
>>> +1 (binding)
>>>
>>> Thanks,
>>>
>>> Ping
>>>
>>>
>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>>> are binding. See
>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>> :D
>>>>
>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ping
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>
>>>>>> The AIP-44 is here:
>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>
>>>>>> Discussion thread:
>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>
>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>
>>>>>> Please vote accordingly:
>>>>>>
>>>>>> [ ] + 1 approve
>>>>>> [ ] + 0 no opinion
>>>>>> [ ] - 1 disapprove with the reason
>>>>>>
>>>>>> Only votes from PMC members are binding, but members of the
>>>>>> community are encouraged to check the AIP and vote with
>>>>>> "(non-binding)".
>>>>>>
>>>>>> ----
>>>>>>
>>>>>> Just a summary of where we are:
>>>>>>
>>>>>> It's been long in the making, but I think it might be a great
>>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>
>>>>>> * we have a working POC for implementation used for performance
>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>>> public REST API.
>>>>>> * it has moderate performance impact and rather good
>>>>>> maintainability features (very little impact on regular development effort)
>>>>>> * it is fully backwards compatible - the new DB isolation will be an
>>>>>> optional feature
>>>>>> * has a solid plan for full test coverage in our CI
>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>> "real" environment with Google Composer team support
>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>>> well in progress).
>>>>>>
>>>>>>
>>>>>> J.
>>>>>>
>>>>>>
>>>>>>
>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jarek Potiuk <ja...@potiuk.com>.
Just let me express my rather strong dissatisfaction with the way
this "last minute" is raised.

It is very late to come up with such a statement - not that it comes at
all, but when it comes when everyone had a chance to take a look and
comment, including taking a look at the POC and result of checks. This has
never been raised even 4 months ago where the only choices were Thrift and
gRPc).

I REALLY hope the arguments are very strong and backed by real examples and
data why it is a bad choice rather than opinions.

J,.

On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Sorry to weigh in at the last minute, but I'm wary of gRPC over just JSON,
> so -1 to that specific choice. Everything else I'm happy with.
>
> I (or Andrew G) will follow up with more details shortly.
>
> -ash
>
> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <ja...@potiuk.com>
> wrote:
>
> Oh yeah :)
>
> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:
>
>> ah, good call.
>>
>> I guess the email template can be updated:
>>
>> Only votes from PMC members are binding, but members of the community are
>>> encouraged to check the AIP and vote with "(non-binding)".
>>
>>
>>
>> +1 (binding)
>>
>> Thanks,
>>
>> Ping
>>
>>
>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>> are binding. See
>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>> :D
>>>
>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Thanks,
>>>>
>>>> Ping
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>
>>>>> The AIP-44 is here:
>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>
>>>>> Discussion thread:
>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>
>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>
>>>>> Please vote accordingly:
>>>>>
>>>>> [ ] + 1 approve
>>>>> [ ] + 0 no opinion
>>>>> [ ] - 1 disapprove with the reason
>>>>>
>>>>> Only votes from PMC members are binding, but members of the community
>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>
>>>>> ----
>>>>>
>>>>> Just a summary of where we are:
>>>>>
>>>>> It's been long in the making, but I think it might be a great
>>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>
>>>>> * we have a working POC for implementation used for performance
>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>> already our dependency) which fits better the RPC "model" we need than our
>>>>> public REST API.
>>>>> * it has moderate performance impact and rather good
>>>>> maintainability features (very little impact on regular development effort)
>>>>> * it is fully backwards compatible - the new DB isolation will be an
>>>>> optional feature
>>>>> * has a solid plan for full test coverage in our CI
>>>>> * has a backing and plans for more extensive complete testing in
>>>>> "real" environment with Google Composer team support
>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>>> well in progress).
>>>>>
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>>
>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Ash Berlin-Taylor <as...@apache.org>.
Sorry to weigh in at the last minute, but I'm wary of gRPC over just 
JSON, so -1 to that specific choice. Everything else I'm happy with.

I (or Andrew G) will follow up with more details shortly.

-ash

On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk 
<ja...@potiuk.com> wrote:
> Oh yeah :)
> 
> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pingzh@umich.edu 
> <ma...@umich.edu>> wrote:
>> ah, good call.
>> 
>> I guess the email template can be updated:
>> 
>>> Only votes from PMC members are binding, but members of the 
>>> community are encouraged to check the AIP and vote with 
>>> "(non-binding)".
>> 
>> 
>> +1 (binding)
>> 
>> Thanks,
>> 
>> Ping
>> 
>> 
>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <jarek@potiuk.com 
>> <ma...@potiuk.com>> wrote:
>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's 
>>> votes are binding. See 
>>> <https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy> 
>>> :D
>>> 
>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pingzh@umich.edu 
>>> <ma...@umich.edu>> wrote:
>>>> +1 (non-binding)
>>>> 
>>>> Thanks,
>>>> 
>>>> Ping
>>>> 
>>>> 
>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <jarek@potiuk.com 
>>>> <ma...@potiuk.com>> wrote:
>>>>> Hey everyone,
>>>>> 
>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>> 
>>>>> The AIP-44 is here: 
>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API>
>>>>> 
>>>>> Discussion thread: 
>>>>> <https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3>
>>>>> 
>>>>> The voting will last for 5 days (until 9th of August 2022 11:00 
>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>> 
>>>>> Please vote accordingly:
>>>>> 
>>>>> [ ] + 1 approve
>>>>> [ ] + 0 no opinion
>>>>> [ ] - 1 disapprove with the reason
>>>>> 
>>>>> Only votes from PMC members are binding, but members of the 
>>>>> community are encouraged to check the AIP and vote with 
>>>>> "(non-binding)".
>>>>> 
>>>>> ----
>>>>> 
>>>>> Just a summary of where we are:
>>>>> 
>>>>> It's been long in the making, but I think it might be a great 
>>>>> step-forward to our long-term multi-tenancy goal. I believe the 
>>>>> proposal I have is quite well thought out and discussed a lot in 
>>>>> the past:
>>>>> 
>>>>> * we have a working POC for implementation used for performance 
>>>>> testing:  <https://github.com/apache/airflow/pull/25094>
>>>>> * it is based on on industry-standard open-source gRPC (which is 
>>>>> already our dependency) which fits better the RPC "model" we need 
>>>>> than our public REST API.
>>>>> * it has moderate performance impact and rather good 
>>>>> maintainability features (very little impact on regular 
>>>>> development effort)
>>>>> * it is fully backwards compatible - the new DB isolation will be 
>>>>> an optional feature
>>>>> * has a solid plan for full test coverage in our CI
>>>>> * has a backing and plans for more extensive complete testing in 
>>>>> "real" environment with Google Composer team support
>>>>> * allows for further extensions as part of AIP-1 (I am planning 
>>>>> to re-establish sig-multitenancy effort for follow up AIPs once 
>>>>> this one is well in progress).
>>>>> 
>>>>> 
>>>>> J.
>>>>> 
>>>>> 
>>>>> 


Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jarek Potiuk <ja...@potiuk.com>.
Oh yeah :)

On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pi...@umich.edu> wrote:

> ah, good call.
>
> I guess the email template can be updated:
>
> Only votes from PMC members are binding, but members of the community are
>> encouraged to check the AIP and vote with "(non-binding)".
>
>
>
> +1 (binding)
>
> Thanks,
>
> Ping
>
>
> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes are
>> binding. See
>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>> :D
>>
>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Thanks,
>>>
>>> Ping
>>>
>>>
>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>
>>>> The AIP-44 is here:
>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>
>>>> Discussion thread:
>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>
>>>> The voting will last for 5 days (until 9th of August 2022 11:00 CEST),
>>>> and until at least 3 binding votes have been cast.
>>>>
>>>> Please vote accordingly:
>>>>
>>>> [ ] + 1 approve
>>>> [ ] + 0 no opinion
>>>> [ ] - 1 disapprove with the reason
>>>>
>>>> Only votes from PMC members are binding, but members of the community
>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>
>>>> ----
>>>>
>>>> Just a summary of where we are:
>>>>
>>>> It's been long in the making, but I think it might be a great
>>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>>> have is quite well thought out and discussed a lot in the past:
>>>>
>>>> * we have a working POC for implementation used for performance
>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>> * it is based on on industry-standard open-source gRPC (which is
>>>> already our dependency) which fits better the RPC "model" we need than our
>>>> public REST API.
>>>> * it has moderate performance impact and rather good
>>>> maintainability features (very little impact on regular development effort)
>>>> * it is fully backwards compatible - the new DB isolation will be an
>>>> optional feature
>>>> * has a solid plan for full test coverage in our CI
>>>> * has a backing and plans for more extensive complete testing in "real"
>>>> environment with Google Composer team support
>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>>> well in progress).
>>>>
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Ping Zhang <pi...@umich.edu>.
ah, good call.

I guess the email template can be updated:

Only votes from PMC members are binding, but members of the community are
> encouraged to check the AIP and vote with "(non-binding)".



+1 (binding)

Thanks,

Ping


On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes are
> binding. See
> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
> :D
>
> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>
>> +1 (non-binding)
>>
>> Thanks,
>>
>> Ping
>>
>>
>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Hey everyone,
>>>
>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>
>>> The AIP-44 is here:
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>
>>> Discussion thread:
>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>
>>> The voting will last for 5 days (until 9th of August 2022 11:00 CEST),
>>> and until at least 3 binding votes have been cast.
>>>
>>> Please vote accordingly:
>>>
>>> [ ] + 1 approve
>>> [ ] + 0 no opinion
>>> [ ] - 1 disapprove with the reason
>>>
>>> Only votes from PMC members are binding, but members of the community
>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>
>>> ----
>>>
>>> Just a summary of where we are:
>>>
>>> It's been long in the making, but I think it might be a great
>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>> have is quite well thought out and discussed a lot in the past:
>>>
>>> * we have a working POC for implementation used for performance
>>> testing:  https://github.com/apache/airflow/pull/25094
>>> * it is based on on industry-standard open-source gRPC (which is already
>>> our dependency) which fits better the RPC "model" we need than our public
>>> REST API.
>>> * it has moderate performance impact and rather good
>>> maintainability features (very little impact on regular development effort)
>>> * it is fully backwards compatible - the new DB isolation will be an
>>> optional feature
>>> * has a solid plan for full test coverage in our CI
>>> * has a backing and plans for more extensive complete testing in "real"
>>> environment with Google Composer team support
>>> * allows for further extensions as part of AIP-1 (I am planning to
>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>> well in progress).
>>>
>>>
>>> J.
>>>
>>>
>>>
>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Jarek Potiuk <ja...@potiuk.com>.
Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes are
binding. See
https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
:D

On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:

> +1 (non-binding)
>
> Thanks,
>
> Ping
>
>
> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Hey everyone,
>>
>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>
>> The AIP-44 is here:
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>
>> Discussion thread:
>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>
>> The voting will last for 5 days (until 9th of August 2022 11:00 CEST),
>> and until at least 3 binding votes have been cast.
>>
>> Please vote accordingly:
>>
>> [ ] + 1 approve
>> [ ] + 0 no opinion
>> [ ] - 1 disapprove with the reason
>>
>> Only votes from PMC members are binding, but members of the community
>> are encouraged to check the AIP and vote with "(non-binding)".
>>
>> ----
>>
>> Just a summary of where we are:
>>
>> It's been long in the making, but I think it might be a great
>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>> have is quite well thought out and discussed a lot in the past:
>>
>> * we have a working POC for implementation used for performance testing:
>> https://github.com/apache/airflow/pull/25094
>> * it is based on on industry-standard open-source gRPC (which is already
>> our dependency) which fits better the RPC "model" we need than our public
>> REST API.
>> * it has moderate performance impact and rather good
>> maintainability features (very little impact on regular development effort)
>> * it is fully backwards compatible - the new DB isolation will be an
>> optional feature
>> * has a solid plan for full test coverage in our CI
>> * has a backing and plans for more extensive complete testing in "real"
>> environment with Google Composer team support
>> * allows for further extensions as part of AIP-1 (I am planning to
>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>> well in progress).
>>
>>
>> J.
>>
>>
>>
>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Phani Kumar <ph...@astronomer.io.INVALID>.
+1 (non-binding)

On Wed, Aug 10, 2022 at 10:50 PM Igor Kholopov <ik...@google.com.invalid>
wrote:

> +1 (non-binding)
>
> Best,
> Igor
>
> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:
>
>> +1 (non-binding)
>>
>> Thanks,
>>
>> Ping
>>
>>
>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Hey everyone,
>>>
>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>
>>> The AIP-44 is here:
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>
>>> Discussion thread:
>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>
>>> The voting will last for 5 days (until 9th of August 2022 11:00 CEST),
>>> and until at least 3 binding votes have been cast.
>>>
>>> Please vote accordingly:
>>>
>>> [ ] + 1 approve
>>> [ ] + 0 no opinion
>>> [ ] - 1 disapprove with the reason
>>>
>>> Only votes from PMC members are binding, but members of the community
>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>
>>> ----
>>>
>>> Just a summary of where we are:
>>>
>>> It's been long in the making, but I think it might be a great
>>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>>> have is quite well thought out and discussed a lot in the past:
>>>
>>> * we have a working POC for implementation used for performance
>>> testing:  https://github.com/apache/airflow/pull/25094
>>> * it is based on on industry-standard open-source gRPC (which is already
>>> our dependency) which fits better the RPC "model" we need than our public
>>> REST API.
>>> * it has moderate performance impact and rather good
>>> maintainability features (very little impact on regular development effort)
>>> * it is fully backwards compatible - the new DB isolation will be an
>>> optional feature
>>> * has a solid plan for full test coverage in our CI
>>> * has a backing and plans for more extensive complete testing in "real"
>>> environment with Google Composer team support
>>> * allows for further extensions as part of AIP-1 (I am planning to
>>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>>> well in progress).
>>>
>>>
>>> J.
>>>
>>>
>>>
>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Igor Kholopov <ik...@google.com.INVALID>.
+1 (non-binding)

Best,
Igor

On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pi...@umich.edu> wrote:

> +1 (non-binding)
>
> Thanks,
>
> Ping
>
>
> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Hey everyone,
>>
>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>
>> The AIP-44 is here:
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>
>> Discussion thread:
>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>
>> The voting will last for 5 days (until 9th of August 2022 11:00 CEST),
>> and until at least 3 binding votes have been cast.
>>
>> Please vote accordingly:
>>
>> [ ] + 1 approve
>> [ ] + 0 no opinion
>> [ ] - 1 disapprove with the reason
>>
>> Only votes from PMC members are binding, but members of the community
>> are encouraged to check the AIP and vote with "(non-binding)".
>>
>> ----
>>
>> Just a summary of where we are:
>>
>> It's been long in the making, but I think it might be a great
>> step-forward to our long-term multi-tenancy goal. I believe the proposal I
>> have is quite well thought out and discussed a lot in the past:
>>
>> * we have a working POC for implementation used for performance testing:
>> https://github.com/apache/airflow/pull/25094
>> * it is based on on industry-standard open-source gRPC (which is already
>> our dependency) which fits better the RPC "model" we need than our public
>> REST API.
>> * it has moderate performance impact and rather good
>> maintainability features (very little impact on regular development effort)
>> * it is fully backwards compatible - the new DB isolation will be an
>> optional feature
>> * has a solid plan for full test coverage in our CI
>> * has a backing and plans for more extensive complete testing in "real"
>> environment with Google Composer team support
>> * allows for further extensions as part of AIP-1 (I am planning to
>> re-establish sig-multitenancy effort for follow up AIPs once this one is
>> well in progress).
>>
>>
>> J.
>>
>>
>>
>>

Re: [VOTE] AIP-44 - Airflow Internal API

Posted by Ping Zhang <pi...@umich.edu>.
+1 (non-binding)

Thanks,

Ping


On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Hey everyone,
>
> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>
> The AIP-44 is here:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>
> Discussion thread:
> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>
> The voting will last for 5 days (until 9th of August 2022 11:00 CEST),
> and until at least 3 binding votes have been cast.
>
> Please vote accordingly:
>
> [ ] + 1 approve
> [ ] + 0 no opinion
> [ ] - 1 disapprove with the reason
>
> Only votes from PMC members are binding, but members of the community are
> encouraged to check the AIP and vote with "(non-binding)".
>
> ----
>
> Just a summary of where we are:
>
> It's been long in the making, but I think it might be a great step-forward
> to our long-term multi-tenancy goal. I believe the proposal I have is quite
> well thought out and discussed a lot in the past:
>
> * we have a working POC for implementation used for performance testing:
> https://github.com/apache/airflow/pull/25094
> * it is based on on industry-standard open-source gRPC (which is already
> our dependency) which fits better the RPC "model" we need than our public
> REST API.
> * it has moderate performance impact and rather good
> maintainability features (very little impact on regular development effort)
> * it is fully backwards compatible - the new DB isolation will be an
> optional feature
> * has a solid plan for full test coverage in our CI
> * has a backing and plans for more extensive complete testing in "real"
> environment with Google Composer team support
> * allows for further extensions as part of AIP-1 (I am planning to
> re-establish sig-multitenancy effort for follow up AIPs once this one is
> well in progress).
>
>
> J.
>
>
>
>