Posted to github@arrow.apache.org by "Jefffrey (via GitHub)" <gi...@apache.org> on 2023/02/05 00:21:51 UTC

[GitHub] [arrow-datafusion] Jefffrey opened a new pull request, #5183: Parse identifiers properly for TableReferences

Jefffrey opened a new pull request, #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183

# Which issue does this PR close?

Closes #4532

# Rationale for this change

Parse identifiers properly for `TableReference`:

- normalize unquoted identifiers
- parse quoted identifiers that contain special characters (e.g. a period)

# What changes are included in this PR?

Change `TableReference` to hold `Cow<str>` fields instead of `&str`, since parsing (and normalization) may need to allocate a new `String`
- usages need to be fixed, since `TableReference` can no longer derive `Copy`

Use the SQL parser to parse the possibly multipart identifier, since it handles quoted identifiers correctly
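
For readers who want to see the intended semantics without pulling in a SQL parser, here is a minimal hand-written sketch. It is not the PR's implementation (the PR delegates to the SQL parser); the function name `parse_parts` and its error handling are assumptions made for illustration, and the rules for what an unquoted identifier may contain are deliberately simplified.

```rust
/// Hypothetical helper: splits `s` into identifier parts on unquoted periods.
/// Unquoted parts are lowercased; quoted parts keep their case, and `""`
/// inside quotes stands for one literal `"`.
/// Returns `None` for malformed input (unterminated quote, empty part, ...).
fn parse_parts(s: &str) -> Option<Vec<String>> {
    let mut parts = Vec::new();
    let mut chars = s.chars().peekable();
    loop {
        let mut part = String::new();
        if chars.peek() == Some(&'"') {
            chars.next(); // consume the opening quote
            loop {
                // `?` bails out with `None` on an unterminated quote
                match chars.next()? {
                    '"' => {
                        if chars.peek() == Some(&'"') {
                            chars.next(); // doubled quote: keep one
                            part.push('"');
                        } else {
                            break; // closing quote
                        }
                    }
                    c => part.push(c),
                }
            }
        } else {
            while let Some(&c) = chars.peek() {
                if c == '.' {
                    break;
                }
                if c == '"' {
                    return None; // quote in the middle of an unquoted part
                }
                part.push(c.to_ascii_lowercase());
                chars.next();
            }
            if part.is_empty() {
                return None;
            }
        }
        parts.push(part);
        match chars.next() {
            None => return Some(parts),
            Some('.') => continue,
            Some(_) => return None, // junk after a closing quote
        }
    }
}

fn main() {
    // Unquoted identifiers are normalized to lowercase.
    assert_eq!(parse_parts("Foo"), Some(vec!["foo".to_string()]));
    // A quoted identifier may contain a period and an escaped double quote.
    assert_eq!(
        parse_parts(r#""Foo"".bar""#),
        Some(vec![r#"Foo".bar"#.to_string()])
    );
    println!("{:?}", parse_parts(r#"cat.Sch."my.table""#));
}
```

One, two, or three parts would then map to the bare, partial, and full forms of a table reference.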

# Are these changes tested?

New unit test added

# Are there any user-facing changes?

TableReference public API is changed (`Cow<str>` instead of raw `&str`)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb merged pull request #5183: Parse identifiers properly for TableReferences

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb merged PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183



[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5183: Parse identifiers properly for TableReferences

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.

Jefffrey commented on code in PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#discussion_r1096603617


##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent
+    /// a single double quote in the identifier)
     pub fn parse_str(s: &'a str) -> Self {
-        let parts: Vec<&str> = s.split('.').collect();
+        let mut parts = parse_identifiers(s)
+            .unwrap_or(vec![])
+            .into_iter()
+            .map(|id| match id.quote_style {
+                Some(_) => id.value,
+                // TODO: someway to be able to toggle this functionality?
+                None => id.value.to_ascii_lowercase(),
+            })
+            .collect::<Vec<_>>();
 
         match parts.len() {
-            1 => Self::Bare { table: s },
+            1 => Self::Bare {
+                table: parts.remove(0).into(),
+            },
             2 => Self::Partial {
-                schema: parts[0],
-                table: parts[1],
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
             3 => Self::Full {
-                catalog: parts[0],
-                schema: parts[1],
-                table: parts[2],
+                catalog: parts.remove(0).into(),
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
-            _ => Self::Bare { table: s },
+            // TODO: should normalize?
+            _ => Self::Bare { table: s.into() },
         }
     }
 }
 
-/// Parse a string into a TableReference, by splittig on `.`
+fn parse_identifiers(s: &str) -> Result<Vec<Ident>> {

Review Comment:
   Could we maybe upstream this whole function into sqlparser-rs instead?




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5183: Parse identifiers properly for TableReferences

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#discussion_r1097919282


##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent

Review Comment:
   No, I think I was confused. Sorry about that




[GitHub] [arrow-datafusion] alamb commented on pull request #5183: Parse identifiers properly for TableReferences

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#issuecomment-1417458033

   > However, even though `TableReference` now uses `Cow` and so could own its strings, it still carries lifetime requirements. Replacing usages of `OwnedTableReference` with `TableReference` would mean satisfying those lifetimes everywhere, which would be tedious, so I will keep it around for now (unless there's some other solution).
   
   I think that makes sense for this PR. Thanks @Jefffrey 



[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5183: Parse identifiers properly for TableReferences

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.

Jefffrey commented on code in PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#discussion_r1096603578


##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent
+    /// a single double quote in the identifier)
     pub fn parse_str(s: &'a str) -> Self {
-        let parts: Vec<&str> = s.split('.').collect();
+        let mut parts = parse_identifiers(s)
+            .unwrap_or(vec![])
+            .into_iter()
+            .map(|id| match id.quote_style {
+                Some(_) => id.value,
+                // TODO: someway to be able to toggle this functionality?
+                None => id.value.to_ascii_lowercase(),
+            })

Review Comment:
   Compare with the datafusion-sql package, where some of this functionality is duplicated:
   
   https://github.com/apache/arrow-datafusion/blob/ac876dbc9729b16e272e00496c51e53d9f649173/datafusion/sql/src/utils.rs#L540-L546
   
   I'm unsure whether this should also have a way to toggle normalization, since the datafusion-sql package allows it:
   
   https://github.com/apache/arrow-datafusion/blob/c323721192ba2733a56c4201b2255a36b1eaa859/datafusion/sql/src/planner.rs#L424-L430
   
   But since this `parse_str` is meant to be used to implement `From<&str>`, there wouldn't really be a way to pass in whether to normalize or not.
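
To illustrate the constraint described above: `From::from` takes exactly one argument, so any normalization policy has to be fixed inside the impl. The sketch below uses a hypothetical `Name` type and a hypothetical `parse_with` constructor (neither is part of DataFusion) to show how an explicit switch could sit next to a `From<&str>` impl that always normalizes.

```rust
/// Hypothetical stand-in for an identifier type; not a DataFusion API.
#[derive(Debug, PartialEq)]
struct Name(String);

impl Name {
    /// Hypothetical constructor with an explicit normalization switch.
    fn parse_with(s: &str, normalize: bool) -> Self {
        if normalize {
            Name(s.to_ascii_lowercase())
        } else {
            Name(s.to_string())
        }
    }
}

impl From<&str> for Name {
    // `from` has no room for a flag, so the policy is hard-coded here.
    fn from(s: &str) -> Self {
        Name::parse_with(s, true)
    }
}

fn main() {
    let normalized: Name = "Foo".into();
    let verbatim = Name::parse_with("Foo", false);
    println!("{:?} vs {:?}", normalized, verbatim);
}
```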




[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5183: Parse identifiers properly for TableReferences

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.

Jefffrey commented on code in PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#discussion_r1096603289


##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent
+    /// a single double quote in the identifier)
     pub fn parse_str(s: &'a str) -> Self {
-        let parts: Vec<&str> = s.split('.').collect();
+        let mut parts = parse_identifiers(s)
+            .unwrap_or(vec![])
+            .into_iter()
+            .map(|id| match id.quote_style {
+                Some(_) => id.value,
+                // TODO: someway to be able to toggle this functionality?
+                None => id.value.to_ascii_lowercase(),
+            })
+            .collect::<Vec<_>>();
 
         match parts.len() {
-            1 => Self::Bare { table: s },
+            1 => Self::Bare {
+                table: parts.remove(0).into(),
+            },
             2 => Self::Partial {
-                schema: parts[0],
-                table: parts[1],
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
             3 => Self::Full {
-                catalog: parts[0],
-                schema: parts[1],
-                table: parts[2],
+                catalog: parts.remove(0).into(),
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
-            _ => Self::Bare { table: s },
+            // TODO: should normalize?
+            _ => Self::Bare { table: s.into() },

Review Comment:
   `parse_identifiers` returns a `String` anyway (wrapped in `Ident`), so only this last case could actually be `Cow::Borrowed(...)`.
   
   I'm also unsure how to handle this last case. I followed what was done before (just taking the entire string), since the alternative would be to panic, which seems undesirable. I'm also unsure whether it should be normalized.
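
The borrowed-versus-owned point can be seen with the standard library alone. In this sketch, `table_name` is a hypothetical helper (not from the PR): a successfully parsed name arrives as a fresh `String` and so becomes `Cow::Owned`, while the fallback reuses the input slice as `Cow::Borrowed` without allocating.

```rust
use std::borrow::Cow;

/// Hypothetical helper: `parsed` models the output of a parser that
/// always allocates; `None` models the fallback case.
fn table_name<'a>(s: &'a str, parsed: Option<String>) -> Cow<'a, str> {
    match parsed {
        // The parser produced a new String, so the Cow must own it.
        Some(name) => Cow::Owned(name),
        // Fallback: take the entire input as-is, borrowing it.
        None => Cow::Borrowed(s),
    }
}

fn main() {
    let owned = table_name("Foo", Some("foo".to_string()));
    let borrowed = table_name("a.b.c.d", None);
    assert!(matches!(owned, Cow::Owned(_)));
    assert!(matches!(borrowed, Cow::Borrowed(_)));

    // `.into()` on a `&str` also yields the borrowed variant.
    let via_into: Cow<str> = "t".into();
    assert!(matches!(via_into, Cow::Borrowed(_)));
}
```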




[GitHub] [arrow-datafusion] Jefffrey commented on pull request #5183: Parse identifiers properly for TableReferences

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.

Jefffrey commented on PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#issuecomment-1416885998

   @alamb you mentioned the possibility of removing `OwnedTableReference`: https://github.com/apache/arrow-datafusion/issues/4532#issuecomment-1415727576
   
   However, even though `TableReference` now uses `Cow` and so could own its strings, it still carries lifetime requirements. Replacing usages of `OwnedTableReference` with `TableReference` would mean satisfying those lifetimes everywhere, which would be tedious, so I will keep it around for now (unless there's some other solution).
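
A small sketch of the lifetime point, using a hypothetical `Ref` struct and `into_static` method (assumptions for illustration, not the DataFusion types): the type parameter `'a` remains on every signature even when the data is owned, and getting a value that outlives its source means converting each field to an owned `Cow`.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified stand-in for a Cow-based table reference.
#[derive(Debug, Clone, PartialEq)]
struct Ref<'a> {
    table: Cow<'a, str>,
}

impl<'a> Ref<'a> {
    /// Copies any borrowed data so the result no longer depends on `'a`.
    fn into_static(self) -> Ref<'static> {
        Ref {
            table: Cow::Owned(self.table.into_owned()),
        }
    }
}

/// Returning a `Ref` built from a local String only compiles because
/// the borrowed data is converted to owned data first.
fn make_static() -> Ref<'static> {
    let local = String::from("t");
    let borrowed = Ref {
        table: Cow::Borrowed(local.as_str()),
    };
    borrowed.into_static()
}

fn main() {
    let r = make_static();
    println!("{:?}", r);
}
```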



[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5183: Parse identifiers properly for TableReferences

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#discussion_r1097765851


##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent
+    /// a single double quote in the identifier)
     pub fn parse_str(s: &'a str) -> Self {
-        let parts: Vec<&str> = s.split('.').collect();
+        let mut parts = parse_identifiers(s)
+            .unwrap_or(vec![])
+            .into_iter()
+            .map(|id| match id.quote_style {
+                Some(_) => id.value,
+                // TODO: someway to be able to toggle this functionality?
+                None => id.value.to_ascii_lowercase(),
+            })
+            .collect::<Vec<_>>();
 
         match parts.len() {
-            1 => Self::Bare { table: s },
+            1 => Self::Bare {
+                table: parts.remove(0).into(),
+            },
             2 => Self::Partial {
-                schema: parts[0],
-                table: parts[1],
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
             3 => Self::Full {
-                catalog: parts[0],
-                schema: parts[1],
-                table: parts[2],
+                catalog: parts.remove(0).into(),
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
-            _ => Self::Bare { table: s },
+            // TODO: should normalize?
+            _ => Self::Bare { table: s.into() },

Review Comment:
   I think documenting the behavior is about the best we can do in this case. 



##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent
+    /// a single double quote in the identifier)
     pub fn parse_str(s: &'a str) -> Self {
-        let parts: Vec<&str> = s.split('.').collect();
+        let mut parts = parse_identifiers(s)
+            .unwrap_or(vec![])
+            .into_iter()
+            .map(|id| match id.quote_style {
+                Some(_) => id.value,
+                // TODO: someway to be able to toggle this functionality?
+                None => id.value.to_ascii_lowercase(),
+            })
+            .collect::<Vec<_>>();
 
         match parts.len() {
-            1 => Self::Bare { table: s },
+            1 => Self::Bare {
+                table: parts.remove(0).into(),
+            },
             2 => Self::Partial {
-                schema: parts[0],
-                table: parts[1],
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
             3 => Self::Full {
-                catalog: parts[0],
-                schema: parts[1],
-                table: parts[2],
+                catalog: parts.remove(0).into(),
+                schema: parts.remove(0).into(),
+                table: parts.remove(0).into(),
             },
-            _ => Self::Bare { table: s },
+            // TODO: should normalize?
+            _ => Self::Bare { table: s.into() },
         }
     }
 }
 
-/// Parse a string into a TableReference, by splittig on `.`
+fn parse_identifiers(s: &str) -> Result<Vec<Ident>> {

Review Comment:
   That certainly seems like a good idea (at least to file a ticket to track)



##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent

Review Comment:
   ```suggestion
       /// `Foo.bar` (note the preserved case and requiring two double quotes to represent
   ```



##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent
+    /// a single double quote in the identifier)
     pub fn parse_str(s: &'a str) -> Self {
-        let parts: Vec<&str> = s.split('.').collect();
+        let mut parts = parse_identifiers(s)
+            .unwrap_or(vec![])
+            .into_iter()
+            .map(|id| match id.quote_style {
+                Some(_) => id.value,
+                // TODO: someway to be able to toggle this functionality?
+                None => id.value.to_ascii_lowercase(),
+            })

Review Comment:
   I agree -- if users want to normalize their identifiers I think they'll have to do it manually. 




[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5183: Parse identifiers properly for TableReferences

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.

Jefffrey commented on code in PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#discussion_r1097860599


##########
datafusion/common/src/table_reference.rs:
##########
@@ -155,50 +178,110 @@ impl<'a> TableReference<'a> {
                 table,
             },
             Self::Partial { schema, table } => ResolvedTableReference {
-                catalog: default_catalog,
+                catalog: default_catalog.into(),
                 schema,
                 table,
             },
             Self::Bare { table } => ResolvedTableReference {
-                catalog: default_catalog,
-                schema: default_schema,
+                catalog: default_catalog.into(),
+                schema: default_schema.into(),
                 table,
             },
         }
     }
 
-    /// Forms a [`TableReference`] by splitting `s` on periods `.`.
-    ///
-    /// Note that this function does NOT handle periods or name
-    /// normalization correctly (e.g. `"foo.bar"` will be parsed as
-    /// `"foo`.`bar"`. and `Foo` will be parsed as `Foo` (not `foo`).
-    ///
-    /// If you need to handle such identifiers correctly, you should
-    /// use a SQL parser or form the [`OwnedTableReference`] directly.
+    /// Forms a [`TableReference`] by attempting to parse `s` as a multipart identifier,
+    /// failing that then taking the entire input as the identifier itself.
     ///
-    /// See more detail in <https://github.com/apache/arrow-datafusion/issues/4532>
+    /// Will normalize (convert to lowercase) any unquoted identifiers.
+    /// e.g. `Foo` will be parsed as `foo`, and `"Foo"".bar"` will be parsed as
+    /// `Foo".bar` (note the preserved case and requiring two double quotes to represent

Review Comment:
   It needs to contain the single double quote since it was escaped by prepending it with another double quote in the original, unless you mean to change the example?




[GitHub] [arrow-datafusion] alamb commented on pull request #5183: Parse identifiers properly for TableReferences

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#issuecomment-1417417399

   I think CI is failing due to an issue fixed in https://github.com/apache/arrow-datafusion/pull/5177



[GitHub] [arrow-datafusion] ursabot commented on pull request #5183: Parse identifiers properly for TableReferences

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #5183:
URL: https://github.com/apache/arrow-datafusion/pull/5183#issuecomment-1419761943

   Benchmark runs are scheduled for baseline = 0dfc66da3814616b3e7d7832d685e5b40ac2dc92 and contender = 3f9b99663ce202b16a95ba7baaa6665555aeefdb. 3f9b99663ce202b16a95ba7baaa6665555aeefdb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1799b72915dc4ea4a27f726e96c10b56...54fbd39bd8394a349fc8c2a223edb78d/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/4feae58260724b6ebb8b39e22f18a6c1...8d093f4d9e084eba8b7411cdc4f9472e/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bfdabdeebf78401e8ea37f25e3ea0bf8...d3eaaa7eff774a2c968fa567e08777b7/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/84d89b6074564d8c95a3fcee8cd4cc73...30dd68960e6b4591ac72a4c86ed95ecb/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   

