You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "avantgardnerio (via GitHub)" <gi...@apache.org> on 2023/04/03 00:12:48 UTC

[GitHub] [arrow-datafusion] avantgardnerio opened a new pull request, #5835: Bg pks

avantgardnerio opened a new pull request, #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835

   # Which issue does this PR close?
   
   Closes #5834.
   
   # Rationale for this change
   
   As a developer building a database atop DataFusion, it would be useful to have knowledge of primary keys retained in the LogicalPlan.
   
   # What changes are included in this PR?
   
   A new `primary_key` field on `LogicalPlan::CreateMemoryTable`
   
   # Are these changes tested?
   
   Sort of - there's a test that broke before, but I couldn't figure out what prints this LogicalPlan node, so I wasn't able to add the pk info to the `Debug` trait and assert that.
   
   # Are there any user-facing changes?
   
   `create table` statements that used to fail (e.g. with `primary key` clauses) will now not fail. I'm not sure if that's an issue for anyone.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1156013972


##########
datafusion/sql/src/statement.rs:
##########
@@ -128,69 +128,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 if_not_exists,
                 or_replace,
                 ..
-            } if constraints.is_empty()
-                && table_properties.is_empty()
-                && with_options.is_empty() =>
-            {
-                match query {
-                    Some(query) => {
-                        let plan = self.query_to_plan(*query, planner_context)?;
-                        let input_schema = plan.schema();
-
-                        let plan = if !columns.is_empty() {
-                            let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                            if schema.fields().len() != input_schema.fields().len() {
-                                return Err(DataFusionError::Plan(format!(
+            } if table_properties.is_empty() && with_options.is_empty() => match query {
+                Some(query) => {
+                    let plan = self.query_to_plan(*query, planner_context)?;
+                    let input_schema = plan.schema();
+
+                    let plan = if !columns.is_empty() {
+                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
+                        if schema.fields().len() != input_schema.fields().len() {
+                            return Err(DataFusionError::Plan(format!(
                             "Mismatch: {} columns specified, but result has {} columns",
                             schema.fields().len(),
                             input_schema.fields().len()
                         )));
-                            }
-                            let input_fields = input_schema.fields();
-                            let project_exprs = schema
-                                .fields()
-                                .iter()
-                                .zip(input_fields)
-                                .map(|(field, input_field)| {
-                                    cast(
-                                        col(input_field.name()),
-                                        field.data_type().clone(),
-                                    )
+                        }
+                        let input_fields = input_schema.fields();
+                        let project_exprs = schema
+                            .fields()
+                            .iter()
+                            .zip(input_fields)
+                            .map(|(field, input_field)| {
+                                cast(col(input_field.name()), field.data_type().clone())
                                     .alias(field.name())
-                                })
-                                .collect::<Vec<_>>();
-                            LogicalPlanBuilder::from(plan.clone())
-                                .project(project_exprs)?
-                                .build()?
-                        } else {
-                            plan
-                        };
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                            })
+                            .collect::<Vec<_>>();
+                        LogicalPlanBuilder::from(plan.clone())
+                            .project(project_exprs)?
+                            .build()?
+                    } else {
+                        plan
+                    };
+
+                    Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {

Review Comment:
   I'll ignore for this PR, and assume that the larger discussions will result in consensus about what to name it and where it goes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1156106254


##########
datafusion/sql/tests/integration_test.rs:
##########
@@ -199,6 +199,17 @@ fn cast_to_invalid_decimal_type() {
     }
 }
 
+#[test]
+fn plan_create_table_with_pk() {
+    let sql = "create table person (id int, name string, primary key(id))";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" }

Review Comment:
   That worked, ty!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] yjshen commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "yjshen (via GitHub)" <gi...@apache.org>.
yjshen commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1155614711


##########
datafusion/sql/src/statement.rs:
##########
@@ -128,69 +128,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 if_not_exists,
                 or_replace,
                 ..
-            } if constraints.is_empty()
-                && table_properties.is_empty()
-                && with_options.is_empty() =>
-            {
-                match query {
-                    Some(query) => {
-                        let plan = self.query_to_plan(*query, planner_context)?;
-                        let input_schema = plan.schema();
-
-                        let plan = if !columns.is_empty() {
-                            let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                            if schema.fields().len() != input_schema.fields().len() {
-                                return Err(DataFusionError::Plan(format!(
+            } if table_properties.is_empty() && with_options.is_empty() => match query {
+                Some(query) => {
+                    let plan = self.query_to_plan(*query, planner_context)?;
+                    let input_schema = plan.schema();
+
+                    let plan = if !columns.is_empty() {
+                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
+                        if schema.fields().len() != input_schema.fields().len() {
+                            return Err(DataFusionError::Plan(format!(
                             "Mismatch: {} columns specified, but result has {} columns",
                             schema.fields().len(),
                             input_schema.fields().len()
                         )));
-                            }
-                            let input_fields = input_schema.fields();
-                            let project_exprs = schema
-                                .fields()
-                                .iter()
-                                .zip(input_fields)
-                                .map(|(field, input_field)| {
-                                    cast(
-                                        col(input_field.name()),
-                                        field.data_type().clone(),
-                                    )
+                        }
+                        let input_fields = input_schema.fields();
+                        let project_exprs = schema
+                            .fields()
+                            .iter()
+                            .zip(input_fields)
+                            .map(|(field, input_field)| {
+                                cast(col(input_field.name()), field.data_type().clone())
                                     .alias(field.name())
-                                })
-                                .collect::<Vec<_>>();
-                            LogicalPlanBuilder::from(plan.clone())
-                                .project(project_exprs)?
-                                .build()?
-                        } else {
-                            plan
-                        };
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                            })
+                            .collect::<Vec<_>>();
+                        LogicalPlanBuilder::from(plan.clone())
+                            .project(project_exprs)?
+                            .build()?
+                    } else {
+                        plan
+                    };
+
+                    Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
+                        name: self.object_name_to_table_reference(name)?,
+                        primary_key: vec![],

Review Comment:
   Should we also consider the `TableContraint` here?



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1490,6 +1490,8 @@ pub struct Union {
 pub struct CreateMemoryTable {
     /// The table name
     pub name: OwnedTableReference,
+    /// The ordered list of columns in the primary key, or an empty vector if none
+    pub primary_key: Vec<Column>,

Review Comment:
   I think we might not be able to get the PK index ordering from AST.



##########
datafusion/sql/tests/integration_test.rs:
##########
@@ -199,6 +199,17 @@ fn cast_to_invalid_decimal_type() {
     }
 }
 
+#[test]
+fn plan_create_table_with_pk() {
+    let sql = "create table person (id int, name string, primary key(id))";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" }

Review Comment:
   I think it's https://github.com/apache/arrow-datafusion/blob/main/datafusion/expr/src/logical_plan/plan.rs#L1051
   and 
   https://github.com/apache/arrow-datafusion/blob/main/datafusion/common/src/table_reference.rs#L110 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1155825377


##########
datafusion/sql/src/statement.rs:
##########
@@ -128,69 +128,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 if_not_exists,
                 or_replace,
                 ..
-            } if constraints.is_empty()
-                && table_properties.is_empty()
-                && with_options.is_empty() =>
-            {
-                match query {
-                    Some(query) => {
-                        let plan = self.query_to_plan(*query, planner_context)?;
-                        let input_schema = plan.schema();
-
-                        let plan = if !columns.is_empty() {
-                            let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                            if schema.fields().len() != input_schema.fields().len() {
-                                return Err(DataFusionError::Plan(format!(
+            } if table_properties.is_empty() && with_options.is_empty() => match query {
+                Some(query) => {
+                    let plan = self.query_to_plan(*query, planner_context)?;
+                    let input_schema = plan.schema();
+
+                    let plan = if !columns.is_empty() {
+                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
+                        if schema.fields().len() != input_schema.fields().len() {
+                            return Err(DataFusionError::Plan(format!(
                             "Mismatch: {} columns specified, but result has {} columns",
                             schema.fields().len(),
                             input_schema.fields().len()
                         )));
-                            }
-                            let input_fields = input_schema.fields();
-                            let project_exprs = schema
-                                .fields()
-                                .iter()
-                                .zip(input_fields)
-                                .map(|(field, input_field)| {
-                                    cast(
-                                        col(input_field.name()),
-                                        field.data_type().clone(),
-                                    )
+                        }
+                        let input_fields = input_schema.fields();
+                        let project_exprs = schema
+                            .fields()
+                            .iter()
+                            .zip(input_fields)
+                            .map(|(field, input_field)| {
+                                cast(col(input_field.name()), field.data_type().clone())
                                     .alias(field.name())
-                                })
-                                .collect::<Vec<_>>();
-                            LogicalPlanBuilder::from(plan.clone())
-                                .project(project_exprs)?
-                                .build()?
-                        } else {
-                            plan
-                        };
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                            })
+                            .collect::<Vec<_>>();
+                        LogicalPlanBuilder::from(plan.clone())
+                            .project(project_exprs)?
+                            .build()?
+                    } else {
+                        plan
+                    };
+
+                    Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {

Review Comment:
   I agree -- it may be time to do https://github.com/apache/arrow-datafusion/issues/3349 more seriously (aka extract the DDL statements out) . I started with https://github.com/apache/arrow-datafusion/pull/5842



##########
datafusion/sql/src/statement.rs:
##########
@@ -128,69 +128,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 if_not_exists,
                 or_replace,
                 ..
-            } if constraints.is_empty()
-                && table_properties.is_empty()
-                && with_options.is_empty() =>
-            {
-                match query {
-                    Some(query) => {
-                        let plan = self.query_to_plan(*query, planner_context)?;
-                        let input_schema = plan.schema();
-
-                        let plan = if !columns.is_empty() {
-                            let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                            if schema.fields().len() != input_schema.fields().len() {
-                                return Err(DataFusionError::Plan(format!(
+            } if table_properties.is_empty() && with_options.is_empty() => match query {
+                Some(query) => {
+                    let plan = self.query_to_plan(*query, planner_context)?;
+                    let input_schema = plan.schema();
+
+                    let plan = if !columns.is_empty() {
+                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
+                        if schema.fields().len() != input_schema.fields().len() {
+                            return Err(DataFusionError::Plan(format!(
                             "Mismatch: {} columns specified, but result has {} columns",
                             schema.fields().len(),
                             input_schema.fields().len()
                         )));
-                            }
-                            let input_fields = input_schema.fields();
-                            let project_exprs = schema
-                                .fields()
-                                .iter()
-                                .zip(input_fields)
-                                .map(|(field, input_field)| {
-                                    cast(
-                                        col(input_field.name()),
-                                        field.data_type().clone(),
-                                    )
+                        }
+                        let input_fields = input_schema.fields();
+                        let project_exprs = schema
+                            .fields()
+                            .iter()
+                            .zip(input_fields)
+                            .map(|(field, input_field)| {
+                                cast(col(input_field.name()), field.data_type().clone())
                                     .alias(field.name())
-                                })
-                                .collect::<Vec<_>>();
-                            LogicalPlanBuilder::from(plan.clone())
-                                .project(project_exprs)?
-                                .build()?
-                        } else {
-                            plan
-                        };
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                            })
+                            .collect::<Vec<_>>();
+                        LogicalPlanBuilder::from(plan.clone())
+                            .project(project_exprs)?
+                            .build()?
+                    } else {
+                        plan
+                    };
+
+                    Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
+                        name: self.object_name_to_table_reference(name)?,
+                        primary_key: vec![],
+                        input: Arc::new(plan),
+                        if_not_exists,
+                        or_replace,
+                    }))
+                }
 
-                    None => {
-                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                        let plan = EmptyRelation {
-                            produce_one_row: false,
-                            schema,
-                        };
-                        let plan = LogicalPlan::EmptyRelation(plan);
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                None => {
+                    let pk: Option<Vec<Ident>> = constraints
+                        .iter()
+                        .filter_map(|c: &TableConstraint| match c {
+                            TableConstraint::Unique {
+                                columns,
+                                is_primary,
+                                ..
+                            } => match is_primary {
+                                true => Some(columns),
+                                false => None,
+                            },
+                            TableConstraint::ForeignKey { .. } => None,
+                            TableConstraint::Check { .. } => None,
+                            TableConstraint::Index { .. } => None,
+                            TableConstraint::FulltextOrSpatial { .. } => None,

Review Comment:
   Yes I think throwing an error is better than silently ignoring them



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1155399820


##########
datafusion/sql/src/statement.rs:
##########
@@ -128,69 +128,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 if_not_exists,
                 or_replace,
                 ..
-            } if constraints.is_empty()
-                && table_properties.is_empty()
-                && with_options.is_empty() =>
-            {
-                match query {
-                    Some(query) => {
-                        let plan = self.query_to_plan(*query, planner_context)?;
-                        let input_schema = plan.schema();
-
-                        let plan = if !columns.is_empty() {
-                            let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                            if schema.fields().len() != input_schema.fields().len() {
-                                return Err(DataFusionError::Plan(format!(
+            } if table_properties.is_empty() && with_options.is_empty() => match query {
+                Some(query) => {
+                    let plan = self.query_to_plan(*query, planner_context)?;
+                    let input_schema = plan.schema();
+
+                    let plan = if !columns.is_empty() {
+                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
+                        if schema.fields().len() != input_schema.fields().len() {
+                            return Err(DataFusionError::Plan(format!(
                             "Mismatch: {} columns specified, but result has {} columns",
                             schema.fields().len(),
                             input_schema.fields().len()
                         )));
-                            }
-                            let input_fields = input_schema.fields();
-                            let project_exprs = schema
-                                .fields()
-                                .iter()
-                                .zip(input_fields)
-                                .map(|(field, input_field)| {
-                                    cast(
-                                        col(input_field.name()),
-                                        field.data_type().clone(),
-                                    )
+                        }
+                        let input_fields = input_schema.fields();
+                        let project_exprs = schema
+                            .fields()
+                            .iter()
+                            .zip(input_fields)
+                            .map(|(field, input_field)| {
+                                cast(col(input_field.name()), field.data_type().clone())
                                     .alias(field.name())
-                                })
-                                .collect::<Vec<_>>();
-                            LogicalPlanBuilder::from(plan.clone())
-                                .project(project_exprs)?
-                                .build()?
-                        } else {
-                            plan
-                        };
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                            })
+                            .collect::<Vec<_>>();
+                        LogicalPlanBuilder::from(plan.clone())
+                            .project(project_exprs)?
+                            .build()?
+                    } else {
+                        plan
+                    };
+
+                    Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {

Review Comment:
   Should we rename `CreateMemoryTable` to just `CreateTable`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1156107453


##########
datafusion/sql/tests/integration_test.rs:
##########
@@ -199,6 +199,36 @@ fn cast_to_invalid_decimal_type() {
     }
 }
 
+#[test]
+fn plan_create_table_with_pk() {
+    let sql = "create table person (id int, name string, primary key(id))";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" } primary_key=[id]

Review Comment:
   Should show PK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1156107738


##########
datafusion/sql/tests/integration_test.rs:
##########
@@ -199,6 +199,36 @@ fn cast_to_invalid_decimal_type() {
     }
 }
 
+#[test]
+fn plan_create_table_with_pk() {
+    let sql = "create table person (id int, name string, primary key(id))";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" } primary_key=[id]
+  EmptyRelation
+    "#
+    .trim();
+    quick_test(sql, plan);
+}
+
+#[test]
+fn plan_create_table_no_pk() {
+    let sql = "create table person (id int, name string)";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" }

Review Comment:
   Does not change existing behavior if no PK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1155399901


##########
datafusion/sql/src/statement.rs:
##########
@@ -128,69 +128,91 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 if_not_exists,
                 or_replace,
                 ..
-            } if constraints.is_empty()
-                && table_properties.is_empty()
-                && with_options.is_empty() =>
-            {
-                match query {
-                    Some(query) => {
-                        let plan = self.query_to_plan(*query, planner_context)?;
-                        let input_schema = plan.schema();
-
-                        let plan = if !columns.is_empty() {
-                            let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                            if schema.fields().len() != input_schema.fields().len() {
-                                return Err(DataFusionError::Plan(format!(
+            } if table_properties.is_empty() && with_options.is_empty() => match query {
+                Some(query) => {
+                    let plan = self.query_to_plan(*query, planner_context)?;
+                    let input_schema = plan.schema();
+
+                    let plan = if !columns.is_empty() {
+                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
+                        if schema.fields().len() != input_schema.fields().len() {
+                            return Err(DataFusionError::Plan(format!(
                             "Mismatch: {} columns specified, but result has {} columns",
                             schema.fields().len(),
                             input_schema.fields().len()
                         )));
-                            }
-                            let input_fields = input_schema.fields();
-                            let project_exprs = schema
-                                .fields()
-                                .iter()
-                                .zip(input_fields)
-                                .map(|(field, input_field)| {
-                                    cast(
-                                        col(input_field.name()),
-                                        field.data_type().clone(),
-                                    )
+                        }
+                        let input_fields = input_schema.fields();
+                        let project_exprs = schema
+                            .fields()
+                            .iter()
+                            .zip(input_fields)
+                            .map(|(field, input_field)| {
+                                cast(col(input_field.name()), field.data_type().clone())
                                     .alias(field.name())
-                                })
-                                .collect::<Vec<_>>();
-                            LogicalPlanBuilder::from(plan.clone())
-                                .project(project_exprs)?
-                                .build()?
-                        } else {
-                            plan
-                        };
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                            })
+                            .collect::<Vec<_>>();
+                        LogicalPlanBuilder::from(plan.clone())
+                            .project(project_exprs)?
+                            .build()?
+                    } else {
+                        plan
+                    };
+
+                    Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
+                        name: self.object_name_to_table_reference(name)?,
+                        primary_key: vec![],
+                        input: Arc::new(plan),
+                        if_not_exists,
+                        or_replace,
+                    }))
+                }
 
-                    None => {
-                        let schema = self.build_schema(columns)?.to_dfschema_ref()?;
-                        let plan = EmptyRelation {
-                            produce_one_row: false,
-                            schema,
-                        };
-                        let plan = LogicalPlan::EmptyRelation(plan);
-
-                        Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable {
-                            name: self.object_name_to_table_reference(name)?,
-                            input: Arc::new(plan),
-                            if_not_exists,
-                            or_replace,
-                        }))
-                    }
+                None => {
+                    let pk: Option<Vec<Ident>> = constraints
+                        .iter()
+                        .filter_map(|c: &TableConstraint| match c {
+                            TableConstraint::Unique {
+                                columns,
+                                is_primary,
+                                ..
+                            } => match is_primary {
+                                true => Some(columns),
+                                false => None,
+                            },
+                            TableConstraint::ForeignKey { .. } => None,
+                            TableConstraint::Check { .. } => None,
+                            TableConstraint::Index { .. } => None,
+                            TableConstraint::FulltextOrSpatial { .. } => None,

Review Comment:
   Should I throw the original error if these are set?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1156108242


##########
datafusion/sql/tests/integration_test.rs:
##########
@@ -199,6 +199,36 @@ fn cast_to_invalid_decimal_type() {
     }
 }
 
+#[test]
+fn plan_create_table_with_pk() {
+    let sql = "create table person (id int, name string, primary key(id))";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" } primary_key=[id]
+  EmptyRelation
+    "#
+    .trim();
+    quick_test(sql, plan);
+}
+
+#[test]
+fn plan_create_table_no_pk() {
+    let sql = "create table person (id int, name string)";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" }
+  EmptyRelation
+    "#
+    .trim();
+    quick_test(sql, plan);
+}
+
+#[test]
+#[should_panic(expected = "Non-primary unique constraints are not supported")]

Review Comment:
   Unhanded syntax causes errors.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1155399682


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1490,6 +1490,8 @@ pub struct Union {
 pub struct CreateMemoryTable {
     /// The table name
     pub name: OwnedTableReference,
+    /// The ordered list of columns in the primary key, or an empty vector if none
+    pub primary_key: Vec<Column>,

Review Comment:
   TODO: asc, desc



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1155400105


##########
datafusion/sql/tests/integration_test.rs:
##########
@@ -199,6 +199,17 @@ fn cast_to_invalid_decimal_type() {
     }
 }
 
+#[test]
+fn plan_create_table_with_pk() {
+    let sql = "create table person (id int, name string, primary key(id))";
+    let plan = r#"
+CreateMemoryTable: Bare { table: "person" }

Review Comment:
   I'd like to change this to `CreateMemoryTable: Bare { table: "person", primary_key="x, y" }`, but I couldn't figure out where `Debug` is `impl`ed for `CreateMemoryTable`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio merged pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio merged PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#discussion_r1156012986


##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1490,6 +1490,8 @@ pub struct Union {
 pub struct CreateMemoryTable {
     /// The table name
     pub name: OwnedTableReference,
+    /// The ordered list of columns in the primary key, or an empty vector if none
+    pub primary_key: Vec<Column>,

Review Comment:
   I did some research: sqlparser is correct in this case, as `CreateTable` has columns that define the `unique constraint` that is the `primary key`. Constraints don't have ordering, indexes do. 
   
   I assume that if the users does a `create index` we will receive information about sort order of each column, but either way it's proper not to have it here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on pull request #5835: Add primary key information to CreateMemoryTable LogicalPlan node

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on PR #5835:
URL: https://github.com/apache/arrow-datafusion/pull/5835#issuecomment-1494518266

   Okay fellas, I think this is ready...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org