You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/25 17:42:47 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6096: Support DROP SCHEMA statements

alamb commented on code in PR #6096:
URL: https://github.com/apache/arrow-datafusion/pull/6096#discussion_r1176689360


##########
datafusion/core/tests/sqllogictests/test_files/information_schema.slt:
##########
@@ -375,7 +375,7 @@ SHOW CREATE TABLE test.xyz
 ----
 datafusion test xyz CREATE VIEW test.xyz AS SELECT * FROM abc
 
-statement error DataFusion error: This feature is not implemented: Only `DROP TABLE/VIEW
+statement error DataFusion error: Execution error: Cannot drop schema test because other tables depend on it: xyz

Review Comment:
   👍 



##########
datafusion/core/src/execution/context.rs:
##########
@@ -676,6 +701,53 @@ impl SessionContext {
         Ok(false)
     }
 
+    /// Attempts to find a schema and deregister it. Returns a tuple of the schema and a
+    /// flag indicating whether dereg was performed (e.g if schema is found but has tables
+    /// then `cascade` must be set)
+    fn find_and_deregister_schema<'a>(

Review Comment:
   I wonder if different catalogs might want to implement the "is the schema empty" check differently  🤔  
   
   If so, it might make sense to put logic for "is the schema empty so can we drop it" in the `CatalogProvider`
   
   Perhaps if `CatalogProvider` had a signature like:
   ```
       fn deregister_schema(&self, name: &str, cascade: bool) -> Result<Option<Arc<dyn SchemaProvider>>> {
   ```
   
   I think this code would get significantly simpler as well as the `debug_assert`s to reassert what had just been validated would be unnecessary.
   
   That being said, we could also do such a change as a follow on PR



##########
datafusion/core/src/catalog/catalog.rs:
##########
@@ -124,6 +124,17 @@ pub trait CatalogProvider: Sync + Send {
             "Registering new schemas is not supported".to_string(),
         ))
     }
+
+    /// Remove schema from this catalog if it exists.
+    ///
+    /// By default returns a "Not Implemented" error
+    fn deregister_schema(&self, name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> {
+        // use variables to avoid unused variable warnings

Review Comment:
   An alternative pattern is to name the variable with a prefix. Like`_`
   
   ```
       fn deregister_schema(&self, _name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> {
   ```



##########
datafusion/core/src/execution/context.rs:
##########
@@ -2617,4 +2690,96 @@ mod tests {
                 .unwrap()
         }
     }
+
+    /// helper for the following drop schema tests

Review Comment:
   I think we could use sqllogic tests https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/ to write this more concisely
   
   For example, perhaps we could extend:
   
   https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/test_files/ddl.slt



##########
datafusion/common/src/table_reference.rs:
##########
@@ -450,3 +450,39 @@ mod tests {
         );
     }
 }
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]

Review Comment:
   This is somewhat strange / non standard to have code below a `mod test` -- maybe it would be cleaner in its own module (e.g. `datafusion/common/src/schema_reference.rs`) perhaps



##########
datafusion/core/tests/sql/create_drop.rs:
##########
@@ -73,3 +73,40 @@ async fn create_external_table_with_ddl() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]

Review Comment:
   I recommend putting these tests in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/test_files/ddl.slt rather than a rs test



##########
datafusion/core/src/execution/context.rs:
##########
@@ -447,6 +447,31 @@ impl SessionContext {
                 }
             }
 
+            LogicalPlan::DropCatalogSchema(DropCatalogSchema {
+                name,
+                if_exists,
+                cascade,
+                ..
+            }) => {
+                let (dereg, schema) = self.find_and_deregister_schema(&name, cascade)?;
+                match (if_exists, schema, dereg) {
+                    // successful deregsiter
+                    (_, _, true) => self.return_empty_dataframe(),
+                    // schema found but failed to deregister
+                    (_, Some(schema), false) => Err(DataFusionError::Execution(format!(
+                        "Cannot drop schema {} because other tables depend on it: {}",
+                        name,
+                        itertools::join(schema.table_names().iter(), ", ")
+                    ))),
+                    // no schema found
+                    (false, None, false) => Err(DataFusionError::Execution(format!(
+                        "Schema '{name}' doesn't exist."
+                    ))),
+                    // no schema found but dont return error

Review Comment:
   The is like `DROP SCHEMA foo IF EXISTS`
   
   This PR I think is consistent with what postgres does in this case
   
   ```
   postgres=# drop schema if exists foo;
   NOTICE:  schema "foo" does not exist, skipping
   DROP SCHEMA
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org