You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/16 20:00:54 UTC

[GitHub] [arrow-rs] nevi-me commented on issue #1323: Consider support / implementation of Arrow Flight SQL

nevi-me commented on issue #1323:
URL: https://github.com/apache/arrow-rs/issues/1323#issuecomment-1042111353


   @e-dard there was a discussion a while ago about RDBMS query passthrough in Datafusion. I tried it out using @alamb's excellent guide with the TopK node. My approach was to create a custom optimizer that understands the source of `LogicalPlan::TableScan`'s source (e.g. if we've registered a Postgres schema in the catalog).
   That way, if a plan includes a join of 2 tables from the same DB, they could be rewritten into a SQL join query. Or when we have something like:
   
   ```sql
   select sum(a), avg(b), max(c), d from postgres.my_table
   where <filter expressions>
   group by d
   ```
   
   For the above query, I've previously been able to convert the logical plan:
   
   ```
   |_ Aggregate (sum(a), avg(b), min(c)) by d
      |_ Filter (<filter expr>)
         |_ Project (a, b, c, d)
            |_ TableScan (select * from postgres.my_table)
   ```
   
   Into the below by implementing `TableScan` and converting filter-pushdown predicates to SQL expressions
   
   ```
   |_ Aggregate (sum(a), avg(b), min(c)) by d
      |_ TableScan (select a, b, c, d from postgres.my_table where <filter expr>)
   ```
   
   With aggregate, join, union etc pushdown missing.
   
   Through trying to implement custom nodes & planners for RDBMS, I've also found that it's quite possible to translate much of the `LogicalPlan` types into a SQL query, though the code I was generating by hand would get very nested, and could benefit from some further optimization ala Apache Calcite.
   
   I appreciate that I'm very inactive of late, but I'm just mentioning it out of interest, because I also thought of Flight SQL being the standard interface to connect to various DBs when I saw the PR a few weeks ago. The alternative of supporting N interfaces can get daunting. `connector-x` would help with that to some extent, but I would prefer if DB implementations end up supporting Flight SQL natively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org