Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/08 18:06:46 UTC

[GitHub] [arrow-datafusion] matthewmturner opened a new pull request #1959: Add Create Schema functionality in SQL

matthewmturner opened a new pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Task 1 for #1877 
   
   # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#issuecomment-1064464409


   This has been surprisingly painful. I'm currently having trouble getting the schema registration to work, and it's not yet clear to me what the issue is.





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#issuecomment-1069113290


   @alamb I believe this is ready now.





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#issuecomment-1067090119


   Ah, I still need to do the Ballista update.





[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#discussion_r829069418



##########
File path: datafusion/src/execution/context.rs
##########
@@ -2907,6 +2947,29 @@ mod tests {
         assert_eq!(Weak::strong_count(&catalog_weak), 0);
     }
 
+    #[tokio::test]
+    async fn sql_create_schema() -> Result<()> {
+        // the information schema used to introduce cyclic Arcs

Review comment:
       not sure what this comment means

##########
File path: datafusion/src/execution/context.rs
##########
@@ -286,6 +285,39 @@ impl ExecutionContext {
                     Ok(Arc::new(DataFrame::new(self.state.clone(), &plan)))
                 }
             }
+            LogicalPlan::CreateCatalogSchema(CreateCatalogSchema {
+                schema_name,
+                if_not_exists,
+                ..
+            }) => {
+                // sqlparser doesnt accept database / catalog as parameter to CREATE SCHEMA
+                // so for now, we default to "datafusion" catalog
+                let default_catalog = "datafusion";
+                let catalog = self.catalog(default_catalog).ok_or_else(|| {
+                    DataFusionError::Execution(String::from(
+                        "Missing 'datafusion' catalog",
+                    ))
+                })?;
+
+                let schema = catalog.schema(&schema_name);
+
+                match (if_not_exists, schema) {

Review comment:
       👍 

##########
File path: datafusion/src/execution/context.rs
##########
@@ -286,6 +285,39 @@ impl ExecutionContext {
                     Ok(Arc::new(DataFrame::new(self.state.clone(), &plan)))
                 }
             }
+            LogicalPlan::CreateCatalogSchema(CreateCatalogSchema {
+                schema_name,
+                if_not_exists,
+                ..
+            }) => {
+                // sqlparser doesnt accept database / catalog as parameter to CREATE SCHEMA

Review comment:
       maybe worth a ticket to sqlparser to support this? Or maybe just a PR :)
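
The `match (if_not_exists, schema)` decision in the hunk above can be sketched in isolation. This is a minimal illustration, not the PR's code: `create_schema`, `CreateOutcome`, and the `HashSet` standing in for the catalog are hypothetical names, and treating a plain `CREATE SCHEMA` on an existing schema as an error is an assumption, since that arm is not shown in the quoted diff.

```rust
use std::collections::HashSet;

/// Hypothetical outcome type for a CREATE SCHEMA request.
#[derive(Debug, PartialEq)]
pub enum CreateOutcome {
    Created,
    AlreadyExists,
}

/// Decide what to do for `CREATE SCHEMA [IF NOT EXISTS] name`.
/// `existing` is a toy stand-in for the schemas registered in a catalog.
pub fn create_schema(
    existing: &mut HashSet<String>,
    name: &str,
    if_not_exists: bool,
) -> Result<CreateOutcome, String> {
    match (if_not_exists, existing.contains(name)) {
        // IF NOT EXISTS and the schema is present: succeed without changes.
        (true, true) => Ok(CreateOutcome::AlreadyExists),
        // The schema is absent: create it regardless of the flag.
        (_, false) => {
            existing.insert(name.to_string());
            Ok(CreateOutcome::Created)
        }
        // Assumption: a plain CREATE SCHEMA on an existing schema is an error.
        (false, true) => Err(format!("Schema '{}' already exists", name)),
    }
}
```

Matching on the tuple keeps all three cases visible in one place, and the compiler checks that no combination is left unhandled.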







[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#issuecomment-1063111000


   I'm working on this and I'm wondering why `register_schema` isn't a trait method on `CatalogProvider`. For example, `register_table` is part of the `SchemaProvider` trait, and I consider a schema to be to a catalog what a table is to a schema.
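
To illustrate the question, here is a rough sketch of what `register_schema` could look like as a trait method. The traits and types below (`ToySchemaProvider`, `ToyCatalogProvider`, `EmptySchema`, `MemoryCatalog`) are hypothetical, simplified stand-ins and not DataFusion's actual definitions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified schema trait (not DataFusion's `SchemaProvider`).
pub trait ToySchemaProvider: Send + Sync {
    fn table_names(&self) -> Vec<String>;
}

/// Hypothetical catalog trait with `register_schema` as a trait method,
/// mirroring how `register_table` lives on the schema trait.
pub trait ToyCatalogProvider: Send + Sync {
    fn schema_names(&self) -> Vec<String>;
    fn schema(&self, name: &str) -> Option<Arc<dyn ToySchemaProvider>>;
    /// Takes `&self` so it can be called through a shared `Arc<dyn ...>`.
    /// Returns the schema previously registered under `name`, if any.
    fn register_schema(
        &self,
        name: &str,
        schema: Arc<dyn ToySchemaProvider>,
    ) -> Option<Arc<dyn ToySchemaProvider>>;
}

/// A schema with no tables, used only for the example.
pub struct EmptySchema;

impl ToySchemaProvider for EmptySchema {
    fn table_names(&self) -> Vec<String> {
        Vec::new()
    }
}

/// In-memory catalog; the lock provides the interior mutability
/// that a `&self` registration method needs.
#[derive(Default)]
pub struct MemoryCatalog {
    schemas: RwLock<HashMap<String, Arc<dyn ToySchemaProvider>>>,
}

impl ToyCatalogProvider for MemoryCatalog {
    fn schema_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.schemas.read().unwrap().keys().cloned().collect();
        names.sort(); // HashMap iteration order is unspecified
        names
    }

    fn schema(&self, name: &str) -> Option<Arc<dyn ToySchemaProvider>> {
        self.schemas.read().unwrap().get(name).cloned()
    }

    fn register_schema(
        &self,
        name: &str,
        schema: Arc<dyn ToySchemaProvider>,
    ) -> Option<Arc<dyn ToySchemaProvider>> {
        self.schemas.write().unwrap().insert(name.to_string(), schema)
    }
}
```

With the method on the trait, code that only holds an `Arc<dyn ToyCatalogProvider>` can register a schema without knowing the concrete catalog type.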





[GitHub] [arrow-datafusion] doki23 commented on a change in pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
doki23 commented on a change in pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#discussion_r828675617



##########
File path: datafusion/src/catalog/mod.rs
##########
@@ -105,7 +105,21 @@ impl<'a> TableReference<'a> {
 
 impl<'a> From<&'a str> for TableReference<'a> {
     fn from(s: &'a str) -> Self {
-        Self::Bare { table: s }
+        let parts: Vec<&str> = s.split('.').collect();
+
+        match parts.len() {
+            1 => Self::Bare { table: s },
+            2 => Self::Partial {
+                schema: parts[0],
+                table: parts[1],
+            },
+            3 => Self::Full {
+                catalog: parts[0],
+                schema: parts[1],
+                table: parts[2],
+            },
+            _ => Self::Bare { table: s },
+        }
     }

Review comment:
       👍🏻! When I implemented the qualified wildcard, I realized I wanted this function to work like this.
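
The conversion in the hunk above can be tried out in isolation. The sketch below uses a hypothetical `TableRef` enum that mirrors the three variants named in the diff; it is a self-contained illustration, not the DataFusion type.

```rust
/// Hypothetical mirror of the `Bare` / `Partial` / `Full` variants in the diff.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum TableRef<'a> {
    Bare { table: &'a str },
    Partial { schema: &'a str, table: &'a str },
    Full { catalog: &'a str, schema: &'a str, table: &'a str },
}

impl<'a> From<&'a str> for TableRef<'a> {
    fn from(s: &'a str) -> Self {
        // Split on '.' and classify by the number of parts.
        let parts: Vec<&'a str> = s.split('.').collect();
        match parts.as_slice() {
            [schema, table] => Self::Partial {
                schema: *schema,
                table: *table,
            },
            [catalog, schema, table] => Self::Full {
                catalog: *catalog,
                schema: *schema,
                table: *table,
            },
            // One part, or more than three: keep the whole string as the table name.
            _ => Self::Bare { table: s },
        }
    }
}
```

Note that a plain `split('.')` does not handle quoted identifiers that contain a dot, and a string with more than three parts falls back to a bare reference holding the entire input.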







[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#discussion_r829127891



##########
File path: datafusion/src/execution/context.rs
##########
@@ -2907,6 +2947,29 @@ mod tests {
         assert_eq!(Weak::strong_count(&catalog_weak), 0);
     }
 
+    #[tokio::test]
+    async fn sql_create_schema() -> Result<()> {
+        // the information schema used to introduce cyclic Arcs

Review comment:
       Oops, that may have been a straggling comment from a copy-paste, sorry about that. I can remove it in a subsequent PR.







[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#discussion_r824179001



##########
File path: datafusion/src/execution/context.rs
##########
@@ -283,6 +282,61 @@ impl ExecutionContext {
                     Ok(Arc::new(DataFrameImpl::new(self.state.clone(), &plan)))
                 }
             }
+            LogicalPlan::CreateCatalogSchema(CreateCatalogSchema {
+                schema_name,
+                if_not_exists,
+                ..
+            }) => {
+                // sqlparser doesnt accept database / catalog as parameter to CREATE SCHEMA
+                // so for now, we default to "datafusion" catalog
+                let default_catalog = "datafusion";
+                let catalog = self.catalog(default_catalog).ok_or_else(|| {
+                    DataFusionError::Execution(String::from(
+                        "Missing 'datafusion' catalog",
+                    ))
+                })?;
+
+                let schema = catalog.schema(&schema_name);
+
+                match (if_not_exists, schema) {
+                    //
+                    (true, Some(_)) => {
+                        println!("Schema '{:?}' already exists", &schema_name);
+                        let plan = LogicalPlanBuilder::empty(false).build()?;
+                        Ok(Arc::new(DataFrameImpl::new(self.state.clone(), &plan)))
+                    }
+                    (true, None) | (false, None) => {
+                        println!("Creating schema {:?}", schema_name);
+                        let schema = Arc::new(MemorySchemaProvider::new());
+                        let plan = LogicalPlanBuilder::empty(false).build()?;
+                        schema.register_table(
+                            "test".into(),
+                            Arc::new(DataFrameImpl::new(self.state.clone(), &plan)),
+                        )?;
+                        let schem_reg_res = catalog.register_schema(&schema_name, schema);

Review comment:
       It's not clear to me why registering the schema here isn't working. I re-register the catalog below just in case, but it still has no effect. I wasn't expecting to have to re-register. I'm still digging into the implementations to get a better idea of what's going on.







[GitHub] [arrow-datafusion] alamb commented on pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#issuecomment-1067325547


   > @alamb do you have a preference on whether to leave this PR as is, or whether I can also add CREATE CATALOG here?
   
   I suggest we do it in a follow-on PR to keep this one smaller.
   
   > I see sql-parser already has a CreateDatabase statement that I was thinking I could use. Or do you think that would get confusing? i.e. CREATE DATABASE would call ctx.register_catalog and then be visible as a catalog
   
   Seems reasonable.
   
   (sorry for the delayed replies -- I am only online intermittently this week)
   
   





[GitHub] [arrow-datafusion] alamb merged pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959


   





[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#discussion_r824183466



##########
File path: datafusion/src/execution/context.rs
##########
@@ -283,6 +282,61 @@ impl ExecutionContext {
                     Ok(Arc::new(DataFrameImpl::new(self.state.clone(), &plan)))
                 }
             }
+            LogicalPlan::CreateCatalogSchema(CreateCatalogSchema {
+                schema_name,
+                if_not_exists,
+                ..
+            }) => {
+                // sqlparser doesnt accept database / catalog as parameter to CREATE SCHEMA
+                // so for now, we default to "datafusion" catalog
+                let default_catalog = "datafusion";
+                let catalog = self.catalog(default_catalog).ok_or_else(|| {
+                    DataFusionError::Execution(String::from(
+                        "Missing 'datafusion' catalog",
+                    ))
+                })?;
+
+                let schema = catalog.schema(&schema_name);
+
+                match (if_not_exists, schema) {
+                    //
+                    (true, Some(_)) => {
+                        println!("Schema '{:?}' already exists", &schema_name);
+                        let plan = LogicalPlanBuilder::empty(false).build()?;
+                        Ok(Arc::new(DataFrameImpl::new(self.state.clone(), &plan)))
+                    }
+                    (true, None) | (false, None) => {
+                        println!("Creating schema {:?}", schema_name);
+                        let schema = Arc::new(MemorySchemaProvider::new());
+                        let plan = LogicalPlanBuilder::empty(false).build()?;
+                        schema.register_table(
+                            "test".into(),
+                            Arc::new(DataFrameImpl::new(self.state.clone(), &plan)),
+                        )?;
+                        let schem_reg_res = catalog.register_schema(&schema_name, schema);

Review comment:
       (I know there's a lot of fluff right now that I'm using for debugging purposes.)







[GitHub] [arrow-datafusion] doki23 commented on a change in pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
doki23 commented on a change in pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#discussion_r828675617



##########
File path: datafusion/src/catalog/mod.rs
##########
@@ -105,7 +105,21 @@ impl<'a> TableReference<'a> {
 
 impl<'a> From<&'a str> for TableReference<'a> {
     fn from(s: &'a str) -> Self {
-        Self::Bare { table: s }
+        let parts: Vec<&str> = s.split('.').collect();
+
+        match parts.len() {
+            1 => Self::Bare { table: s },
+            2 => Self::Partial {
+                schema: parts[0],
+                table: parts[1],
+            },
+            3 => Self::Full {
+                catalog: parts[0],
+                schema: parts[1],
+                table: parts[2],
+            },
+            _ => Self::Bare { table: s },
+        }
     }

Review comment:
       👍🏻! It's helpful to me!







[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1959: Add Create Schema functionality in SQL

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1959:
URL: https://github.com/apache/arrow-datafusion/pull/1959#issuecomment-1067011394


   Got this working. I just need to resolve conflicts.
   
   @alamb do you have a preference on whether to leave this PR as is, or whether I can also add `CREATE CATALOG` here? I see sql-parser already has a `CreateDatabase` statement that I was thinking I could use. Or do you think that would get confusing? i.e. `CREATE DATABASE` would call `ctx.register_catalog` and then be visible as a catalog.

