Posted to github@arrow.apache.org by "lidavidm (via GitHub)" <gi...@apache.org> on 2023/03/09 14:30:55 UTC

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #504: docs: update README and add FAQ

lidavidm commented on code in PR #504:
URL: https://github.com/apache/arrow-adbc/pull/504#discussion_r1131108485


##########
docs/source/faq.rst:
##########
@@ -0,0 +1,112 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================================
+Frequently Asked Questions (FAQ)
+================================
+
+What exactly is ADBC?
+=====================
+
+ADBC is:
+
+- A set of abstract APIs in different languages (C/C++, Go, and Java)
+  for working with databases and Arrow data.
+
+  For example, result sets of queries in ADBC are all returned as
+  streams of Arrow data, not row-by-row.
+- A set of implementations of those APIs in different languages (C/C++,
+  Go, Java, Python, and Ruby) that target different databases
+  (e.g. PostgreSQL, SQLite, any database supporting Flight SQL).
+
+Why not just use JDBC/ODBC?
+===========================
+
+JDBC uses row-based interfaces like `ResultSet`_.  When working with
+columnar data, like Arrow data, this means that we have to convert the
+data at least once and possibly twice:
+
+- Once (possibly) in the driver or database, to take columnar data and
+  convert it into a row-based format so it can be returned through the
+  JDBC APIs.
+- Once (always) when a client application pulls data from the JDBC
+  API, to convert the rows into columns.
+
+In keeping with Arrow's "zero-copy" or "minimal-copy" ethos, we would
+like to avoid these unnecessary conversions.
+
+ODBC is in a similar situation.  Although ODBC does support
+`"column-wise binding"`_, not all ODBC drivers support it, and it is
+more complex to use.  Additionally, ODBC uses caller-allocated buffers
+(which often means forcing a data copy), and ODBC specifies data
+layouts that are not quite Arrow-compatible (requiring a data
+conversion anyway).
+
+.. _ResultSet: https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSet.html
+.. _"column-wise binding": https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/column-wise-binding?view=sql-server-ver16
+
+How do ADBC and Arrow Flight SQL differ?
+========================================
+
+ADBC is an *API abstraction*.  It doesn't specify how to talk to the
+database, just the API calls that you make as an application
+developer.  Under the hood, a driver must take those API calls and
+talk to the actual database.  Another perspective is that ADBC is all
+about the client side, and specifies nothing about the network
+protocol or server-side implementation.
+
+Flight SQL is a *wire protocol*.  It specifies the exact commands to
+send to a database to perform various actions like authenticating with
+the database, creating prepared statements, or executing queries.
+Flight SQL specifies the network protocol that the client and the
+server must implement.
+
+One more way of looking at it: an ADBC driver can be written for a
+database purely as a client library.  (For instance, the PostgreSQL
+driver in this repository is implemented as a wrapper around
+libpq.)  But adding Flight SQL support to a database means
+either modifying the database to run a Flight SQL service, or putting
+the database behind a proxy that translates between Flight SQL and the
+database.

Review Comment:
   Thanks for the suggestion. I added two new questions that discuss that relationship in more detail and also cover JDBC (citing Turbodbc as a conceptually similar project).


