Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/08 00:28:16 UTC

[GitHub] [arrow-ballista] avantgardnerio opened a new pull request, #501: Automatically register tables if env var specified

avantgardnerio opened a new pull request, #501:
URL: https://github.com/apache/arrow-ballista/pull/501

   # Which issue does this PR close?
   
   Closes #500.
   
   # Rationale for this change
   
   Described in issue.
   
   # What changes are included in this PR?
   
   1. upgrade of datafusion & arrow
   2. config plumbing
   3. conversion of `default_session_builder` to a superset of functionality (now with state)
   4. default table factories
   5. a bunch of flight sql work for database introspection (more required in future PRs)
   
   # Are there any user-facing changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1310443974

   > Thank you so much for your review! I realize you cloned and did manual testing and that probably took a fair bit of time, which I really appreciate :)
   
   Yes, this is one of the challenges with this project. Many PRs need to be tested end-to-end and this does take time, which is why I mostly only review larger PRs at weekends.




[GitHub] [arrow-ballista] avantgardnerio commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1309365283

   @stuartcarnie you might be interested in this as well. I can start copying you on FlightSQL related PRs if you like.




[GitHub] [arrow-ballista] avantgardnerio commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1307784600

   ![image](https://user-images.githubusercontent.com/3855243/200667706-1213b43c-72cf-4b3c-80b0-858fe53dd56f.png)
   
   We can now see columns




[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1018233454


##########
ballista/client/src/context.rs:
##########
@@ -137,13 +139,17 @@ impl BallistaContext {
     pub async fn standalone(
         config: &BallistaConfig,
         concurrent_tasks: usize,
+        table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,

Review Comment:
   I made it `Option`al with a default. Hopefully passing `None` will be less burdensome.
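
The `Option` with a default described above can be sketched roughly as follows. This is a minimal illustration only: `TableFactory`, `CsvFactory`, `FactoryMap`, `default_table_factories` and `resolve_factories` are hypothetical stand-ins so the example compiles without DataFusion, and the actual defaults chosen in the PR are not shown in this thread.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for DataFusion's `TableProviderFactory` trait,
/// so this sketch compiles without external crates.
pub trait TableFactory: Send + Sync {
    fn name(&self) -> &str;
}

/// Hypothetical factory used only to populate the default map.
pub struct CsvFactory;

impl TableFactory for CsvFactory {
    fn name(&self) -> &str {
        "csv"
    }
}

pub type FactoryMap = HashMap<String, Arc<dyn TableFactory>>;

/// Assumed default set, used when the caller passes `None`.
pub fn default_table_factories() -> FactoryMap {
    let csv: Arc<dyn TableFactory> = Arc::new(CsvFactory);
    let mut factories: FactoryMap = HashMap::new();
    factories.insert(csv.name().to_string(), csv);
    factories
}

/// Callers that do not care about factories pass `None`;
/// callers with custom factories pass `Some(map)`.
pub fn resolve_factories(table_factories: Option<FactoryMap>) -> FactoryMap {
    table_factories.unwrap_or_else(default_table_factories)
}
```

With this shape, existing call sites only need an extra `None` argument, while `Some(map)` replaces the defaults entirely.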





[GitHub] [arrow-ballista] avantgardnerio commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1306419404

   @andygrove this allows users to see tables in Ballista from DataGrip (and probably Tableau):
   
   ![image](https://user-images.githubusercontent.com/3855243/200444475-c278b2fc-3131-40ca-be0a-ecdf771bd34b.png)
   




[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1018496984


##########
ballista-cli/Cargo.toml:
##########
@@ -29,12 +29,12 @@ rust-version = "1.59"
 readme = "README.md"
 
 [dependencies]
-ballista = { path = "../ballista/client", version = "0.9.0", features = [
-    "standalone",
-] }
+ballista = { path = "../ballista/client", version = "0.9.0", features = ["standalone"] }
+ballista-core = { path = "../ballista/core", version = "0.9.0" }
 clap = { version = "3", features = ["derive", "cargo"] }
-datafusion = "14.0.0"
-datafusion-cli = "14.0.0"
+dashmap = "5.4.0"
+datafusion = { git = "https://github.com/apache/arrow-datafusion.git", rev = "7b5842b91ebd00a2c7f894fcad797bea68a56d0f" }

Review Comment:
   If we need features added since 14.0.0, then we either need to delay merging this PR until after we release Ballista 0.10.0, or we can merge now and then wait for DataFusion 15.0.0 before releasing Ballista 0.10.0. I don't have a strong preference either way.





[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1018630373


##########
ballista/core/src/config.rs:
##########
@@ -87,6 +87,17 @@ impl BallistaConfigBuilder {
         Self { settings }
     }
 
+    pub fn load_env(&self) -> Self {
+        let mut settings = self.settings.clone();
+        if let Ok(it) = env::var("DATAFUSION_CATALOG_LOCATION") {
+            settings.insert("datafusion.catalog.location".to_string(), it);

Review Comment:
   > Flight SQL page in the user guide should be updated
   
   I can do that, but I think it should work across the board - even if you connect from a regular ballista client. I'll test that.
   
   > configs.md page should be updated with these DataFusion settings
   
   Thanks, I was unaware. I'll go do that.
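
For readers following the `load_env` hunk, the pattern of copying an environment variable into a settings map can be sketched as below. `ConfigBuilder`, `ENV_MAPPINGS`, `load_from` and `get` are hypothetical names for illustration; only the `DATAFUSION_CATALOG_LOCATION` to `datafusion.catalog.location` pair comes from the diff, and the injectable `lookup` closure is an assumption added here so the behaviour can be exercised without touching the process environment.

```rust
use std::collections::HashMap;
use std::env;

/// The one variable-to-key pair visible in the diff above.
const ENV_MAPPINGS: &[(&str, &str)] = &[(
    "DATAFUSION_CATALOG_LOCATION",
    "datafusion.catalog.location",
)];

/// Hypothetical, simplified stand-in for `BallistaConfigBuilder`.
#[derive(Debug, Clone, Default)]
pub struct ConfigBuilder {
    settings: HashMap<String, String>,
}

impl ConfigBuilder {
    /// Copy every mapped variable that `lookup` can resolve into a new builder.
    pub fn load_from<F>(&self, lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let mut settings = self.settings.clone();
        for &(var, key) in ENV_MAPPINGS.iter() {
            if let Some(value) = lookup(var) {
                settings.insert(key.to_string(), value);
            }
        }
        Self { settings }
    }

    /// Same as `load_from`, reading from the real process environment.
    pub fn load_env(&self) -> Self {
        self.load_from(|var| env::var(var).ok())
    }

    /// Read a setting back by its dotted key.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.settings.get(key).map(|v| v.as_str())
    }
}
```

Unset variables are skipped rather than treated as errors, which matches the `if let Ok(..)` in the hunk.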





[GitHub] [arrow-ballista] r4ntix commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
r4ntix commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1310024782

   Great feature! 👍
   Is it possible to support a custom `CatalogProvider` as a next step?




[GitHub] [arrow-ballista] yahoNanJing commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1017804258


##########
ballista/client/src/context.rs:
##########
@@ -137,13 +139,17 @@ impl BallistaContext {
     pub async fn standalone(
         config: &BallistaConfig,
         concurrent_tasks: usize,
+        table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,

Review Comment:
   Should we use `TableProviderSessionBuilder` here? And for the standalone case, can the table_factories be determined by the ballista config? Maybe we don't need this additional parameter for this interface.





[GitHub] [arrow-ballista] avantgardnerio commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1309736376

   > I tried this out
   
   Thank you so much for your review! I realize you cloned and did manual testing and that probably took a fair bit of time, which I really appreciate :)
   
   > did run into an issue
   
   It definitely works with CSV files that have headers, and there's a test for that. I'll look into Parquet directory support tomorrow. If it's not a big deal, let's get it into this PR. If it isn't trivial, though, I'd prefer to document it in a separate issue and get this PR merged, on the logic that getting the auto-registration framework in place is the bigger deal, and once it's in we can extend it to all the permutations of parameters.




[GitHub] [arrow-ballista] yahoNanJing commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1017800494


##########
ballista/core/src/utils.rs:
##########
@@ -53,25 +56,86 @@ use log::error;
 #[cfg(feature = "s3")]
 use object_store::aws::AmazonS3Builder;
 use object_store::ObjectStore;
+use std::collections::HashMap;
 use std::io::{BufWriter, Write};
 use std::marker::PhantomData;
+use std::ops::Deref;
 use std::sync::atomic::{AtomicUsize, Ordering};
-use std::sync::Arc;
+use std::sync::{Arc, Mutex};
 use std::time::Duration;
 use std::{fs::File, pin::Pin};
 use tonic::codegen::StdError;
 use tonic::transport::{Channel, Error, Server};
 use url::Url;
 
+pub trait SessionBuilder: Send + Sync {
+    fn build(&self, config: SessionConfig) -> crate::error::Result<SessionState>;
+}
+
 /// Default session builder using the provided configuration
-pub fn default_session_builder(config: SessionConfig) -> SessionState {
-    SessionState::with_config_rt(
-        config,
-        Arc::new(
-            RuntimeEnv::new(with_object_store_provider(RuntimeConfig::default()))
-                .unwrap(),
-        ),
-    )
+#[derive(Clone)]
+pub struct DefaultSessionBuilder {}
+
+impl SessionBuilder for DefaultSessionBuilder {
+    fn build(&self, config: SessionConfig) -> crate::error::Result<SessionState> {
+        let rt_cfg = with_object_store_provider(RuntimeConfig::default());
+        let state =
+            SessionState::with_config_rt(config, Arc::new(RuntimeEnv::new(rt_cfg)?));
+        Ok(state)
+    }
+}
+
+/// Session builder with custom table providers
+#[derive(Clone)]
+pub struct TableProviderSessionBuilder {
+    table_factories: Arc<Mutex<HashMap<String, Arc<dyn TableProviderFactory>>>>,

Review Comment:
   Should we replace the `Mutex<HashMap>` with a `DashMap`? We have made similar changes elsewhere, and it should improve performance in high-throughput scenarios.
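
To make the trade-off concrete, here is a std-only sketch of the `Arc<Mutex<HashMap<..>>>` shape from the hunk. `Factory`, `CsvFactory` and `FactoryRegistry` are hypothetical stand-ins, not the PR's types. Every `register` and `get` takes the same lock, which is the contention point the comment refers to; a sharded concurrent map such as `DashMap` would avoid the single global lock, but it is an external crate and is not shown here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for DataFusion's `TableProviderFactory`.
pub trait Factory: Send + Sync {
    fn format(&self) -> &'static str;
}

/// Hypothetical factory used in the usage example.
pub struct CsvFactory;

impl Factory for CsvFactory {
    fn format(&self) -> &'static str {
        "csv"
    }
}

/// Cheaply cloneable registry; all clones share one map behind one lock.
#[derive(Clone, Default)]
pub struct FactoryRegistry {
    inner: Arc<Mutex<HashMap<String, Arc<dyn Factory>>>>,
}

impl FactoryRegistry {
    /// Insert or replace the factory registered under `name`.
    pub fn register(&self, name: &str, factory: Arc<dyn Factory>) {
        // Writers and readers serialize on this single mutex.
        self.inner.lock().unwrap().insert(name.to_string(), factory);
    }

    /// Clone the `Arc` out so the lock is released before the caller uses it.
    pub fn get(&self, name: &str) -> Option<Arc<dyn Factory>> {
        self.inner.lock().unwrap().get(name).cloned()
    }
}
```

Returning a cloned `Arc` from `get` keeps the critical section short, which limits (but does not remove) contention under load.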





[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1018628617


##########
ballista-cli/Cargo.toml:
##########
@@ -29,12 +29,12 @@ rust-version = "1.59"
 readme = "README.md"
 
 [dependencies]
-ballista = { path = "../ballista/client", version = "0.9.0", features = [
-    "standalone",
-] }
+ballista = { path = "../ballista/client", version = "0.9.0", features = ["standalone"] }
+ballista-core = { path = "../ballista/core", version = "0.9.0" }
 clap = { version = "3", features = ["derive", "cargo"] }
-datafusion = "14.0.0"
-datafusion-cli = "14.0.0"
+dashmap = "5.4.0"
+datafusion = { git = "https://github.com/apache/arrow-datafusion.git", rev = "7b5842b91ebd00a2c7f894fcad797bea68a56d0f" }

Review Comment:
   I see we have done the git-ref thing in the past, so there's precedent.
   
   For Flight SQL in particular, I've had to wait a long time for Arrow to release, then DataFusion, then Ballista. I've been rebasing changes for months, and our fork still carries changes that are waiting on upstream (in this case on delta-rs, which is now well behind DataFusion).
   
   For these reasons, I'd strongly prefer to do a Ballista release shortly after the DataFusion release, but then go back to git-ref tracking, so we can keep PRs flowing.
   
   Just my $0.02





[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1019237119


##########
ballista-cli/Cargo.toml:
##########
@@ -29,12 +29,12 @@ rust-version = "1.59"
 readme = "README.md"
 
 [dependencies]
-ballista = { path = "../ballista/client", version = "0.9.0", features = [
-    "standalone",
-] }
+ballista = { path = "../ballista/client", version = "0.9.0", features = ["standalone"] }
+ballista-core = { path = "../ballista/core", version = "0.9.0" }
 clap = { version = "3", features = ["derive", "cargo"] }
-datafusion = "14.0.0"
-datafusion-cli = "14.0.0"
+dashmap = "5.4.0"
+datafusion = { git = "https://github.com/apache/arrow-datafusion.git", rev = "7b5842b91ebd00a2c7f894fcad797bea68a56d0f" }

Review Comment:
   Ok, I will plan on releasing Ballista 0.10.0-rc1 tomorrow.





[GitHub] [arrow-ballista] avantgardnerio commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1312230094

   > is it possible to support CatalogProvider in SessionBuilder
   
   I don't see why not. You could follow the patterns in this PR but with a CatalogProvider instead.




[GitHub] [arrow-ballista] andygrove commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1309607938

   I tried this out and did run into an issue when connecting from DataGrip:
   
   ```
   INTERNAL: Failed to create SessionContext: Context("Could not create table for file:///mnt/bigdata/tpch/sf10-parquet/partsupp.parquet at /home/andy/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/7b5842b/datafusion/core/src/catalog/listing_schema.rs:116", ObjectStore(Generic { store: "LocalFileSystem", source: UnableToReadBytes { source: Os { code: 21, kind: IsADirectory, message: "Is a directory" }, path: "/mnt/bigdata/tpch/sf10-parquet/partsupp.parquet" } })).
   ```
   
   The code assumes that each entry in `DATAFUSION_CATALOG_LOCATION` is a single file, not a directory containing files.
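
One way to handle both layouts is to expand a directory entry into the files it contains before reading. The sketch below uses only the standard library and a hypothetical helper named `data_files`; it illustrates the idea and is not how DataFusion's `listing_schema.rs` resolves entries.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the files that make up one catalog entry.
/// A plain file yields itself; a directory yields its immediate child files.
pub fn data_files(entry: &Path) -> io::Result<Vec<PathBuf>> {
    if entry.is_dir() {
        let mut files = Vec::new();
        for child in fs::read_dir(entry)? {
            let path = child?.path();
            // Nested directories are skipped in this simplified sketch.
            if path.is_file() {
                files.push(path);
            }
        }
        // Sort for a deterministic order; `read_dir` gives no ordering guarantee.
        files.sort();
        Ok(files)
    } else {
        // Surface errors such as a missing path instead of returning it blindly.
        fs::metadata(entry)?;
        Ok(vec![entry.to_path_buf()])
    }
}
```

Applied to the failing case, an entry like `partsupp.parquet` that is a directory would produce its part files rather than an `IsADirectory` read error.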




[GitHub] [arrow-ballista] stuartcarnie commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1309432945

   @avantgardnerio great, thank you!




[GitHub] [arrow-ballista] avantgardnerio commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1307523156

   @yahoNanJing if you have a minute, I'd appreciate your feedback as well.




[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1018208929


##########
ballista/client/src/context.rs:
##########
@@ -137,13 +139,17 @@ impl BallistaContext {
     pub async fn standalone(
         config: &BallistaConfig,
         concurrent_tasks: usize,
+        table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,

Review Comment:
   > can the table_factories be determined by the ballista config?
   
   Unfortunately, I don't think so:
   
   ```
   #[derive(Debug, Clone, PartialEq, Eq)]
   pub struct BallistaConfig {
   ```
   
   I don't think it is logical for `TableProviderFactory` implementations to implement `PartialEq` or `Eq`, and without another DataFusion PR they don't implement `Debug` at present either. Everything else in the `BallistaConfig` is a key-value pair, and extending it to include stateful runtime objects seems like a questionable choice.
   
   I can however swap out the Map for an `Arc<dyn SessionBuilder>`, but then `client` needs to have a new dependency on `core`.





[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1018315474


##########
ballista/client/src/context.rs:
##########
@@ -137,13 +139,17 @@ impl BallistaContext {
     pub async fn standalone(
         config: &BallistaConfig,
         concurrent_tasks: usize,
+        table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,

Review Comment:
   Another option I just thought of: create a new struct (i.e. `BallistaSettings`) which wraps `BallistaConfig` but also adds `concurrent_tasks` and `table_factories` fields.
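
The wrapper idea can be sketched as follows. Everything here is a simplified stand-in: the local `BallistaConfig` and `TableProviderFactory` only mimic the real types closely enough to compile on their own, and `NoopFactory`, `new` and `with_table_factory` are hypothetical. The point is that the key-value config keeps its `Debug`, `PartialEq` and `Eq` derives while the wrapper, which holds trait objects, derives only `Clone`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in: plain key-value settings, so the derives still apply.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct BallistaConfig {
    settings: HashMap<String, String>,
}

/// Simplified stand-in for DataFusion's trait (no `Debug` or `Eq` bound).
pub trait TableProviderFactory: Send + Sync {}

/// Hypothetical factory used in the usage example.
pub struct NoopFactory;

impl TableProviderFactory for NoopFactory {}

/// Hypothetical wrapper bundling the comparable config with runtime state.
#[derive(Clone)]
pub struct BallistaSettings {
    pub config: BallistaConfig,
    pub concurrent_tasks: usize,
    pub table_factories: HashMap<String, Arc<dyn TableProviderFactory>>,
}

impl BallistaSettings {
    /// Start with no table factories registered.
    pub fn new(config: BallistaConfig, concurrent_tasks: usize) -> Self {
        Self {
            config,
            concurrent_tasks,
            table_factories: HashMap::new(),
        }
    }

    /// Builder-style registration of one factory under `name`.
    pub fn with_table_factory(
        mut self,
        name: &str,
        factory: Arc<dyn TableProviderFactory>,
    ) -> Self {
        self.table_factories.insert(name.to_string(), factory);
        self
    }
}
```

A `standalone`-style entry point could then take a single `BallistaSettings` argument instead of three separate parameters.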





[GitHub] [arrow-ballista] r4ntix commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
r4ntix commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1312965483

   > > is it possible to support CatalogProvider in SessionBuilder
   > 
   > I don't see why not. You could follow the patterns in this PR but with a CatalogProvider instead.
   
   OK, I will try to submit a PR for this, thanks.




[GitHub] [arrow-ballista] andygrove commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1307643339

   I'm excited about this feature! I will test this out later this week.




[GitHub] [arrow-ballista] yahoNanJing commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1308562263

   Thanks @avantgardnerio. I'll read this PR carefully tomorrow.




[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#discussion_r1018532142


##########
ballista/core/src/config.rs:
##########
@@ -87,6 +87,17 @@ impl BallistaConfigBuilder {
         Self { settings }
     }
 
+    pub fn load_env(&self) -> Self {
+        let mut settings = self.settings.clone();
+        if let Ok(it) = env::var("DATAFUSION_CATALOG_LOCATION") {
+            settings.insert("datafusion.catalog.location".to_string(), it);

Review Comment:
   The Flight SQL page in the user guide should be updated with information about these env vars, and the configs.md page should be updated with these DataFusion settings.





[GitHub] [arrow-ballista] avantgardnerio commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1310457826

   > support custom `CatalogProvider`
   
   @r4ntix could you explain your use-case? It may already be covered. For example, when this gets merged, [deltatables](https://github.com/spaceandtimelabs/delta-rs/blob/cb553eac000ab491f8b553b8f3e20bd791dc10bf/rust/src/delta_datafusion.rs#L834) should work out of the box, as long as there are `TableProviderFactory`s registered for them.




[GitHub] [arrow-ballista] r4ntix commented on pull request #501: Automatically register tables if env var specified

Posted by GitBox <gi...@apache.org>.
r4ntix commented on PR #501:
URL: https://github.com/apache/arrow-ballista/pull/501#issuecomment-1311298682

   > > support custom `CatalogProvider`
   > 
   > @r4ntix could you explain your use-case? It may already be covered. For example, when this gets merged, [deltatables](https://github.com/spaceandtimelabs/delta-rs/blob/cb553eac000ab491f8b553b8f3e20bd791dc10bf/rust/src/delta_datafusion.rs#L834) should work out of the box, as long as there are `TableProviderFactory`s registered for them.
   
   @avantgardnerio In the current implementation, the automatically registered tables are put into the `datafusion.default.{table}` DataFusion schema.
   In my case, for tenant isolation, I need to be able to dynamically register tables under a different schema or catalog, e.g. `datafusion.{tenant}.{table}` or `{tenant}.default.{table}`.
   So is it possible to support `CatalogProvider` in `SessionBuilder` to support such a case?
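
The two naming schemes mentioned above can be written down as a small helper. `TenantIsolation` and `qualified_table_name` are hypothetical names for illustration; the catalog and schema literals are taken from the comment as written and are not checked against DataFusion's actual defaults.

```rust
/// Where the tenant appears in the three-part table name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenantIsolation {
    /// One schema per tenant inside the shared `datafusion` catalog.
    Schema,
    /// One catalog per tenant, each with a `default` schema.
    Catalog,
}

/// Build the `catalog.schema.table` name for a tenant's table.
pub fn qualified_table_name(mode: TenantIsolation, tenant: &str, table: &str) -> String {
    match mode {
        TenantIsolation::Schema => format!("datafusion.{}.{}", tenant, table),
        TenantIsolation::Catalog => format!("{}.default.{}", tenant, table),
    }
}
```

Either scheme needs the session to know about more than one schema or catalog, which is why the question is about plugging a `CatalogProvider` into the session builder.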

