You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/25 17:42:47 UTC
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6096: Support DROP SCHEMA statements
alamb commented on code in PR #6096:
URL: https://github.com/apache/arrow-datafusion/pull/6096#discussion_r1176689360
##########
datafusion/core/tests/sqllogictests/test_files/information_schema.slt:
##########
@@ -375,7 +375,7 @@ SHOW CREATE TABLE test.xyz
----
datafusion test xyz CREATE VIEW test.xyz AS SELECT * FROM abc
-statement error DataFusion error: This feature is not implemented: Only `DROP TABLE/VIEW
+statement error DataFusion error: Execution error: Cannot drop schema test because other tables depend on it: xyz
Review Comment:
👍
##########
datafusion/core/src/execution/context.rs:
##########
@@ -676,6 +701,53 @@ impl SessionContext {
Ok(false)
}
+ /// Attempts to find a schema and deregister it. Returns a tuple of the schema and a
+ /// flag indicating whether dereg was performed (e.g if schema is found but has tables
+ /// then `cascade` must be set)
+ fn find_and_deregister_schema<'a>(
Review Comment:
I wonder if different catalogs might want to implement the "is the schema empty" check differently 🤔
If so, it might make sense to put logic for "is the schema empty so can we drop it" in the `CatalogProvider`
Perhaps if `CatalogProvider` had a signature like:
```
fn deregister_schema(&self, name: &str, cascade: bool) -> Result<Option<Arc<dyn SchemaProvider>>> {
```
I think this code would get significantly simpler as well as the `debug_assert`s to reassert what had just been validated would be unnecessary.
That being said, we could also do such a change as a follow on PR
##########
datafusion/core/src/catalog/catalog.rs:
##########
@@ -124,6 +124,17 @@ pub trait CatalogProvider: Sync + Send {
"Registering new schemas is not supported".to_string(),
))
}
+
+ /// Remove schema from this catalog if it exists.
+ ///
+ /// By default returns a "Not Implemented" error
+ fn deregister_schema(&self, name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> {
+ // use variables to avoid unused variable warnings
Review Comment:
An alternative pattern is to name the variable with a prefix. Like`_`
```
fn deregister_schema(&self, _name: &str) -> Result<Option<Arc<dyn SchemaProvider>>> {
```
##########
datafusion/core/src/execution/context.rs:
##########
@@ -2617,4 +2690,96 @@ mod tests {
.unwrap()
}
}
+
+ /// helper for the following drop schema tests
Review Comment:
I think we could use sqllogic tests https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/ to write this more concisely
For example, perhaps we could extend:
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/test_files/ddl.slt
##########
datafusion/common/src/table_reference.rs:
##########
@@ -450,3 +450,39 @@ mod tests {
);
}
}
+
+#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
Review Comment:
This is somewhat strange / non standard to have code below a `mod test` -- maybe it would be cleaner in its own module (e.g. `datafusion/common/src/schema_reference.rs`) perhaps
##########
datafusion/core/tests/sql/create_drop.rs:
##########
@@ -73,3 +73,40 @@ async fn create_external_table_with_ddl() -> Result<()> {
Ok(())
}
+
+#[tokio::test]
Review Comment:
I recommend putting these tests in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sqllogictests/test_files/ddl.slt rather than a rs test
##########
datafusion/core/src/execution/context.rs:
##########
@@ -447,6 +447,31 @@ impl SessionContext {
}
}
+ LogicalPlan::DropCatalogSchema(DropCatalogSchema {
+ name,
+ if_exists,
+ cascade,
+ ..
+ }) => {
+ let (dereg, schema) = self.find_and_deregister_schema(&name, cascade)?;
+ match (if_exists, schema, dereg) {
+ // successful deregsiter
+ (_, _, true) => self.return_empty_dataframe(),
+ // schema found but failed to deregister
+ (_, Some(schema), false) => Err(DataFusionError::Execution(format!(
+ "Cannot drop schema {} because other tables depend on it: {}",
+ name,
+ itertools::join(schema.table_names().iter(), ", ")
+ ))),
+ // no schema found
+ (false, None, false) => Err(DataFusionError::Execution(format!(
+ "Schema '{name}' doesn't exist."
+ ))),
+ // no schema found but dont return error
Review Comment:
The is like `DROP SCHEMA foo IF EXISTS`
This PR I think is consistent with what postgres does in this case
```
postgres=# drop schema if exists foo;
NOTICE: schema "foo" does not exist, skipping
DROP SCHEMA
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org