Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2022/02/16 21:06:58 UTC

Multi-tenancy Meeting #4 summary

Thanks, everyone, for participating today. Once again quite a number of
people were interested.

Meeting notes here:
https://docs.google.com/document/d/14mzVkvm5GheCCAcMUzOBN9hw-2aWZKQgQYDzTJIiGFg/edit#


Video Recording here:
https://drive.google.com/file/d/13cRteIuB-6NJm8lKWACiO_VMTWOkEHCR/view?usp=drive_web

And Chat transcript here:
https://drive.google.com/file/d/1i4vnwB_2tcQHkaywcse6FsWpufp3kj9f/view?usp=drive_web


I came up with a nice way to quickly list all attendees :). See for
yourself in the notes :)

Summary of today's Multi-Tenancy discussion #4 (it covered a bit more than
Multi-Tenancy, but everything discussed was related to our work in this
area, with some cross-dependencies on existing/proposed AIPs):

Notes:

   - AIP-43 (DAG Processor separation) -> Mateusz: In progress. Things are
     progressing without surprises.
   - AIP-44 (Internal API) -> Jarek: came up with a variation on the
     modified approach for implementing the logic of replaced functions.
     Speculatively:
      - Using RPC in-memory or over a local TCP/Unix domain socket might
        improve maintainability and make DB isolation easier to implement
        (benchmarking still has to confirm that the overhead is not a
        problem).
      - Apache Thrift and gRPC have been proposed as viable
        implementations.
      - The discussion around scalability, hops, SSL and deployment
        scenarios led to the conclusion that we need to describe this in
        the documentation (and link to the deployment documentation of the
        chosen RPC technology).
      - Before voting, more benchmarking and testing is needed (Jarek, with
        engagement/help from Evgeni and Giorgio). Evgeni has experience with
        similar approaches from Databand, and this can be reused.
   - Ping: presented AIP-45 (Remove Double DAG Parsing):
      - Ash: potential problem with deps when they are set dynamically
        (they are not serializable). Possibly this can be replaced by the
        Scheduler doing extra work.
      - The savings matter mostly for big DAGs, and only in the
        run_as_user scenario.
      - General consensus: the idea is good and the deps case needs
        clarification, but everyone seems to like the solution, especially
        since it shifts some code in a good direction ("airflow local"
        without any DAG parsing at all, "airflow run" doing the heavy
        lifting, the scheduler doing a bit more with "deps").
   - Ping: presented AIP-46 (Docker Runtime Isolation):
      - Good idea.
      - We all agree that this should be an optional add-on rather than an
        Airflow feature. Instead of implementing it in the core of Airflow,
        Airflow should be extended with the necessary hooks that make it
        possible to provide "matching" runtimes for parsing and scheduling.
      - This avoids Docker runtime code that the Airflow community would
        have to maintain; AirBnB or others can instead provide their own
        parse/execute "runtime" implementation.
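To make the AIP-44 "RPC over a local socket" idea a bit more concrete, here is a minimal sketch of a replaced function being called over a local socket pair (AF_UNIX on Unix-like systems). It assumes length-prefixed JSON messages as a stand-in for whatever RPC framework (Thrift, gRPC) is eventually chosen. `count_dag_runs`, `FAKE_DB`, `serve` and `RpcClient` are hypothetical names for illustration only, not Airflow APIs:

```python
import json
import socket
import struct
import threading
from typing import Any, Callable, Dict, List

# 4-byte big-endian length prefix in front of every JSON message.
_LEN = struct.Struct("!I")

# Hypothetical "database" that only the server side is allowed to touch.
FAKE_DB: Dict[str, List[str]] = {"example_dag": ["run_1", "run_2"]}


def count_dag_runs(dag_id: str) -> int:
    """Hypothetical DB-accessing function that would move behind the API."""
    return len(FAKE_DB.get(dag_id, []))


def _send_msg(sock: socket.socket, obj: Any) -> None:
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(_LEN.pack(len(data)) + data)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection
            raise ConnectionError("connection closed")
        buf += chunk
    return buf


def _recv_msg(sock: socket.socket) -> Any:
    (length,) = _LEN.unpack(_recv_exact(sock, _LEN.size))
    return json.loads(_recv_exact(sock, length))


def serve(sock: socket.socket, handlers: Dict[str, Callable[..., Any]]) -> None:
    """Server loop: runs requested functions until the client disconnects."""
    try:
        while True:
            request = _recv_msg(sock)
            result = handlers[request["method"]](*request["args"])
            _send_msg(sock, {"result": result})
    except ConnectionError:
        pass
    finally:
        sock.close()


class RpcClient:
    """Client-side proxy: callers only name the function and pass arguments."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock

    def call(self, method: str, *args: Any) -> Any:
        _send_msg(self.sock, {"method": method, "args": list(args)})
        return _recv_msg(self.sock)["result"]


if __name__ == "__main__":
    # socketpair() returns two connected sockets (AF_UNIX on Unix-like systems).
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(
        target=serve, args=(server_sock, {"count_dag_runs": count_dag_runs})
    )
    server.start()
    client = RpcClient(client_sock)
    print(client.call("count_dag_runs", "example_dag"))  # 2
    client_sock.close()  # server sees EOF and exits its loop
    server.join()
```

Timing `client.call(...)` against a direct `count_dag_runs(...)` call with `timeit` would give a first, rough idea of the per-call overhead that still has to be benchmarked.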


Action items

   - Mateusz: continues AIP-43.
   - Jarek (+Evgeni/Giorgio): benchmarking/analysis of implementation
     details for AIP-44.
   - Ping: AIP-45 - deeper dive into what to do with deps.
   - Ping: AIP-46 - look at updating AIP-46 with details and a description
     of how to modify Airflow to allow pluggable runtimes for parsing and
     execution (which might get an AirBnB example implementation using a
     Docker runtime).
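One possible shape for the "pluggable runtimes" hook, sketched under loose assumptions: a runtime only decides how a parsing or execution command is launched, and a Docker runtime wraps the same command in `docker run`. `Runtime`, `register_runtime`, `parse_command` and `execute_command` are hypothetical names, not existing Airflow APIs, and the `python <dag_file>` parsing command is a stand-in. Only `airflow tasks run <dag_id> <task_id> <run_id>` and `docker run --rm <image>` are real CLI forms:

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class Runtime(ABC):
    """Hypothetical hook: decides how a parsing or execution command is launched."""

    @abstractmethod
    def wrap(self, command: Sequence[str]) -> List[str]:
        """Return the command line that should actually be run."""


class LocalRuntime(Runtime):
    """Default behaviour: run the command unchanged."""

    def wrap(self, command: Sequence[str]) -> List[str]:
        return list(command)


class DockerRuntime(Runtime):
    """Example add-on runtime: run the same command inside an image.

    Mounting DAG files/config into the container is left out of this sketch.
    """

    def __init__(self, image: str) -> None:
        self.image = image

    def wrap(self, command: Sequence[str]) -> List[str]:
        return ["docker", "run", "--rm", self.image, *command]


_RUNTIMES: Dict[str, Runtime] = {"default": LocalRuntime()}


def register_runtime(name: str, runtime: Runtime) -> None:
    """What a plugin could call to make its runtime available (hypothetical)."""
    _RUNTIMES[name] = runtime


def parse_command(dag_file: str, runtime: str = "default") -> List[str]:
    # Hypothetical stand-in for "parse this DAG file" in a matching runtime.
    return _RUNTIMES[runtime].wrap(["python", dag_file])


def execute_command(
    dag_id: str, task_id: str, run_id: str, runtime: str = "default"
) -> List[str]:
    # Same runtime wraps execution, so parsing and execution environments match.
    return _RUNTIMES[runtime].wrap(["airflow", "tasks", "run", dag_id, task_id, run_id])
```

The point of the design is that core Airflow would only own the small `Runtime` interface, while the Docker-specific part lives in an add-on that AirBnB or others maintain.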


J.

Re: Multi-tenancy Meeting #4 summary

Posted by Giorgio Zoppi <gi...@gmail.com>.
Hello,
It was quite interesting. On my side, we have done some work in the past
with gRPC, using Envoy proxy to translate REST calls to gRPC.
The use case was storing browser settings remotely.
- https://blog.envoyproxy.io/envoy-and-grpc-web-a-fresh-new-alternative-to-rest-6504ce7eb880
- https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/grpc_json_transcoder_filter

I don't see the need for an 'in-memory' gRPC, since an interprocess
shared-memory message queue abstraction would suffice:

1. Serialize the message with protobuf into the queue.
2. Wake up the receiver.
3. The receiver gets the message from a queue slot and deserializes the
protobuf.

This can happen without going through the slower Unix domain sockets or
network sockets.
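The three steps above can be sketched with only the Python standard library, assuming JSON in place of protobuf (to stay dependency-free), `multiprocessing.shared_memory` for the queue slots and `multiprocessing` semaphores for the wake-up. `ShmMessageQueue` and `SLOT_SIZE` are hypothetical names; this is a single-producer/single-consumer sketch, not a production queue:

```python
import json
import struct
from multiprocessing import Semaphore
from multiprocessing.shared_memory import SharedMemory

SLOT_SIZE = 256  # hypothetical fixed slot size: 4-byte length header + payload
HEADER = struct.Struct("<I")  # little-endian unsigned 32-bit payload length


class ShmMessageQueue:
    """Ring of fixed-size slots in shared memory (one sender, one receiver).

    JSON stands in for protobuf so the sketch needs only the standard library.
    """

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.shm = SharedMemory(create=True, size=capacity * SLOT_SIZE)
        self.free_slots = Semaphore(capacity)  # counts empty slots
        self.full_slots = Semaphore(0)  # counts filled slots; release wakes receiver
        self._write_idx = 0  # only used by the sender
        self._read_idx = 0  # only used by the receiver

    def send(self, message: dict) -> None:
        payload = json.dumps(message).encode("utf-8")  # 1. serialize
        if len(payload) > SLOT_SIZE - HEADER.size:
            raise ValueError("message too large for one slot")
        self.free_slots.acquire()  # wait until a slot is empty
        offset = self._write_idx * SLOT_SIZE
        HEADER.pack_into(self.shm.buf, offset, len(payload))
        start = offset + HEADER.size
        self.shm.buf[start:start + len(payload)] = payload
        self._write_idx = (self._write_idx + 1) % self.capacity
        self.full_slots.release()  # 2. wake up the receiver

    def receive(self) -> dict:
        self.full_slots.acquire()  # block until a message is available
        offset = self._read_idx * SLOT_SIZE
        (length,) = HEADER.unpack_from(self.shm.buf, offset)
        start = offset + HEADER.size
        payload = bytes(self.shm.buf[start:start + length])
        self._read_idx = (self._read_idx + 1) % self.capacity
        self.free_slots.release()  # hand the slot back to the sender
        return json.loads(payload)  # 3. deserialize

    def close(self) -> None:
        self.shm.close()
        self.shm.unlink()


if __name__ == "__main__":
    q = ShmMessageQueue(capacity=2)
    try:
        q.send({"call": "get_dag", "args": ["example_dag"]})  # hypothetical message
        print(q.receive())
    finally:
        q.close()
```

In a real setup the sender and receiver would live in different processes; the single-process round trip here only exercises the slot and wake-up logic.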

It might be an interesting project to provide an abstraction along these
lines:
- local://endpoint:port - does local RPC via an interprocess message queue
- remote://endpoint:port - does remote RPC via gRPC
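The local:// / remote:// idea boils down to dispatching on the URL scheme. A minimal sketch, assuming `urllib.parse.urlsplit` for parsing and a hypothetical `connect()` factory: the local transport calls an in-process handler table as a stand-in for the shared-memory queue, and the remote transport is only a placeholder where a gRPC stub would go (no gRPC library is imported). Host names and ports in the usage are made up:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlsplit


class Transport:
    """Common interface: callers never care which transport they got."""

    def call(self, method: str, **kwargs: Any) -> Any:
        raise NotImplementedError


@dataclass
class LocalTransport(Transport):
    """Stand-in for 'local RPC via interprocess message queue'."""

    endpoint: str
    handlers: Dict[str, Callable[..., Any]]

    def call(self, method: str, **kwargs: Any) -> Any:
        # A real version would serialize the request onto a shared-memory
        # queue; this sketch dispatches directly to an in-process handler.
        return self.handlers[method](**kwargs)


@dataclass
class RemoteTransport(Transport):
    """Placeholder for 'remote RPC via gRPC'; no gRPC calls are made here."""

    target: str  # "host:port"

    def call(self, method: str, **kwargs: Any) -> Any:
        raise NotImplementedError(f"a gRPC stub for {self.target} would go here")


def connect(
    url: str, handlers: Optional[Dict[str, Callable[..., Any]]] = None
) -> Transport:
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("expected scheme://endpoint:port")
    target = f"{parts.hostname}:{parts.port}"
    if parts.scheme == "local":
        return LocalTransport(target, handlers or {})
    if parts.scheme == "remote":
        return RemoteTransport(target)
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")
```

For example, `connect("local://scheduler:50051", {"add": lambda a, b: a + b}).call("add", a=2, b=3)` returns 5, while the same code pointed at a `remote://` URL would go over the network once a real gRPC stub is wired in.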

As a side note, I would like to share this:
https://www.alluxio.io/blog/moving-from-apache-thrift-to-grpc-a-perspective-from-alluxio/
It more or less matches our experience, and it's the reason why I asked
whether you needed streaming in gRPC.

These are the two main strengths of gRPC over Thrift:

- Streaming
- Use of HTTP/2 (https://grpc.io/blog/grpc-on-http2/)

Just 1c,
Giorgio