Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/28 11:52:40 UTC

[GitHub] [arrow] alamb commented on a change in pull request #8731: [Rust] [RFC] Native Rust Arrow SQL IO

alamb commented on a change in pull request #8731:
URL: https://github.com/apache/arrow/pull/8731#discussion_r532030344



##########
File path: rust/datafusion/src/physical_plan/sql.rs
##########
@@ -0,0 +1,246 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Execution plan for reading Parquet files
+
+use std::any::Any;
+use std::sync::Arc;
+use std::task::{Context, Poll};
+use std::{fmt, thread};
+
+use super::{RecordBatchStream, SendableRecordBatchStream};
+use crate::error::{DataFusionError, Result};
+use crate::physical_plan::ExecutionPlan;
+use crate::physical_plan::Partitioning;
+use arrow::datatypes::SchemaRef;
+use arrow::error::{ArrowError, Result as ArrowResult};
+use arrow::record_batch::RecordBatch;
+use arrow_sql::postgres::PostgresReadIterator;
+use crossbeam::channel::{bounded, Receiver, RecvError, Sender};
+use fmt::Debug;
+
+use async_trait::async_trait;
+use futures::stream::Stream;
+
+/// Execution plan for scanning a SQL data source
+#[derive(Debug, Clone)]
+pub struct SqlExec {
+    /// SQL connection
+    connection: String,
+    /// SQL query
+    query: String,
+    /// Schema after projection is applied
+    schema: SchemaRef,
+    /// Projection for which columns to load
+    projection: Vec<usize>,
+    /// Batch size
+    batch_size: usize,
+}
+
+impl SqlExec {
+    /// Create a new SQL reader execution plan
+    pub fn try_new(
+        connection: &str,
+        query: &str,
+        projection: Option<Vec<usize>>,
+        batch_size: usize,
+    ) -> Result<Self> {
+        // TODO: we could/should determine the type of database at this point

Review comment:
       I think @andygrove's suggestion of using a generic database connectivity layer here (e.g. RDBC, https://github.com/tokio-rs/rdbc) rather than building our own registry and set of database drivers is a good idea: it would avoid duplicating the substantial effort needed to support each database.
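
       For context, the TODO this comment is attached to is about working out which database a connection string refers to. Below is a minimal sketch of that step, assuming URL-style connection strings. `DatabaseKind` and `detect_database` are hypothetical names, not part of this PR or of RDBC, and the list of schemes is illustrative only.

```rust
/// Hypothetical set of backends, for illustration only (not part of the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DatabaseKind {
    Postgres,
    MySql,
    Sqlite,
}

/// Infer the backend from the URL scheme of a connection string.
///
/// Assumes the string looks like `scheme://...`. The error messages avoid
/// echoing the full connection string, since it may contain credentials.
pub fn detect_database(connection: &str) -> Result<DatabaseKind, String> {
    let (scheme, _rest) = connection
        .split_once("://")
        .ok_or_else(|| "connection string has no URL scheme".to_string())?;

    // Schemes are compared case-insensitively.
    match scheme.to_ascii_lowercase().as_str() {
        "postgres" | "postgresql" => Ok(DatabaseKind::Postgres),
        "mysql" => Ok(DatabaseKind::MySql),
        "sqlite" => Ok(DatabaseKind::Sqlite),
        other => Err(format!("unsupported database scheme: {}", other)),
    }
}
```

       With a generic connectivity layer, each match arm would hand off to a driver supplied by that layer rather than to a hand-written one, so `SqlExec::try_new` itself would not need to know about any specific database.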

##########
File path: rust/sql/src/lib.rs
##########
@@ -0,0 +1,79 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#![allow(dead_code)]
+
+use arrow::datatypes::{Schema, SchemaRef};
+use arrow::error::Result;
+use arrow::record_batch::RecordBatch;
+
+// supported database modules
+pub mod postgres;
+
+///
+/// a SQL data source, used to read data from a SQL database into Arrow batches
+pub trait SqlDataSource {

Review comment:
       This trait looks similar to what https://github.com/tokio-rs/rdbc already provides (e.g. `Connection`, `Driver` and `ResultSet`).
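
       To make the comparison concrete, here is a rough sketch of the general shape such a layered abstraction takes. The trait and method signatures are hypothetical and simplified; they only borrow the names `Driver`, `Connection` and `ResultSet` and are not RDBC's actual API. `Value`, `MemoryDriver` and `collect_rows` are likewise invented for illustration.

```rust
/// Hypothetical cell value; a real layer would cover many more types.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Text(String),
    Null,
}

/// A cursor over the rows produced by a query.
pub trait ResultSet {
    fn next_row(&mut self) -> Option<Vec<Value>>;
}

/// An open session against one database.
pub trait Connection {
    fn query(&mut self, sql: &str) -> Result<Box<dyn ResultSet>, String>;
}

/// A factory for connections to one kind of database.
pub trait Driver {
    fn connect(&self, url: &str) -> Result<Box<dyn Connection>, String>;
}

/// In-memory stand-in for a real driver: every query returns the same rows.
pub struct MemoryDriver {
    pub rows: Vec<Vec<Value>>,
}

struct MemoryConnection {
    rows: Vec<Vec<Value>>,
}

struct MemoryResultSet {
    rows: std::vec::IntoIter<Vec<Value>>,
}

impl ResultSet for MemoryResultSet {
    fn next_row(&mut self) -> Option<Vec<Value>> {
        self.rows.next()
    }
}

impl Connection for MemoryConnection {
    fn query(&mut self, _sql: &str) -> Result<Box<dyn ResultSet>, String> {
        // The SQL text is ignored; a real driver would send it to the server.
        Ok(Box::new(MemoryResultSet {
            rows: self.rows.clone().into_iter(),
        }))
    }
}

impl Driver for MemoryDriver {
    fn connect(&self, _url: &str) -> Result<Box<dyn Connection>, String> {
        Ok(Box::new(MemoryConnection {
            rows: self.rows.clone(),
        }))
    }
}

/// Driver-agnostic consumer: it depends only on the traits above, which is
/// where a row-to-Arrow conversion could sit.
pub fn collect_rows(
    driver: &dyn Driver,
    url: &str,
    sql: &str,
) -> Result<Vec<Vec<Value>>, String> {
    let mut conn = driver.connect(url)?;
    let mut result_set = conn.query(sql)?;
    let mut out = Vec::new();
    while let Some(row) = result_set.next_row() {
        out.push(row);
    }
    Ok(out)
}
```

       The point of the layering is that code like `collect_rows` is written once against the traits, and supporting another database means adding a driver rather than touching the consumer.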

##########
File path: rust/sql/src/postgres/reader.rs
##########
@@ -0,0 +1,938 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       This is pretty cool (a partial implementation of the Postgres wire protocol and conversions to/from Arrow). I wonder if we could leverage something like https://github.com/sfackler/rust-postgres rather than taking on a new independent implementation.