You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/02/28 10:10:25 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue, #4680: Make DfSchema wrap SchemaRef

tustvold opened a new issue, #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Currently DataFusion has two ways to represent a schema, `Schema` and `DFSchema`. The former is the representation used by arrow, and in most components. `DFSchema` appears to "enhance" an arrow schema with the notion of a qualifier.
   
   I'm not entirely sure of the history of this split, but to the uninitiated the split is confusing and frustrating. It also results in a non-trivial amount of schema munging logic to convert to/from the relevant representations
   
   **Describe the solution you'd like**
   
   I would like to change `DfSchema` to be
   
   ```
   struct DfSchema {
       schema: SchemaRef,
       fields: Vec<DfFieldMetadata>
   }
   
   struct DfFieldMetadata {
       qualifier: Option<String>,
   }
   ```
   
   We could then make `DfSchema` automatically deref to `SchemaRef`, or at the very least implement `AsRef<SchemaRef>`, avoiding a lot of code that ends up looking like
   
   ```
   let schema: Schema = self.plan.schema().as_ref().into();
   Arc::new(schema)
   ```
   
   Components wishing to combine the information can easily zip the two together, we could even assist this by adding
   
   ```
   struct DfField<'a> {
       field: &'a Field,
       metadata: &'a DfFieldMetadata
   }
   
   impl DfSchema {
   
       fn df_fields() -> impl Iterator<Item=DfField<'_>> + '_ {
           self.arrow_schema.fields().iter().zip(&self.fields).map(|(field, metadata)| DfField { field, metadata })
       }
   }
   ```
   
   **Describe alternatives you've considered**
   
   We could not do this
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold closed issue #4680: Make DfSchema wrap SchemaRef

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #4680: Make DfSchema wrap SchemaRef
URL: https://github.com/apache/arrow-datafusion/issues/4680


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make DfSchema wrap SchemaRef [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680#issuecomment-1889782344

   Attaching link to DFSchema struct for convenience https://github.com/apache/arrow-datafusion/blob/main/datafusion/common/src/dfschema.rs#L108


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #4680: Make DfSchema wrap SchemaRef

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680#issuecomment-1359920573

   > I'm not entirely sure of the history of this split, but to the uninitiated the split is confusing and frustrating. It also results in a non-trivial amount of schema munging logic to convert to/from the relevant representations
   
   I wrote the original `DFSchema` wrapping logic back in the day. If it can be improved that would be great.
   
   The key design goal is that DataFusion should *always* use a `DFSchema` (aka a qualified name) so there should not be an implicit conversion from `SchemaRef` --> (implicitly unqualified) `DFSchema` to avoid subtle bugs where the qualifiers are stripped off.
   
   Going the other way, `DFSchema` --> `Schema` is 👍 
   
   
   cc @Dandandan @jackwener @mingmwang @andygrove  in case you have additional thoughts
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on issue #4680: Make DfSchema wrap SchemaRef

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680#issuecomment-1362227110

   This mostly works but runs into issues with the handling of schemas in datafusion-expr. In particular they have a fairly hard-coded assumption that they can pass around DFField, concatenate them and munge them back into a DFSchema. It would be possible to workaround this, but it would be a fairly non-trivial amount of code-churn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make DfSchema wrap SchemaRef [arrow-datafusion]

Posted by "matthewmturner (via GitHub)" <gi...@apache.org>.
matthewmturner commented on issue #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680#issuecomment-1889823268

   Ive been doing some [work](https://github.com/apache/arrow-datafusion/issues/5309) on planning performance and in order to lay a good foundation for the next steps I think this is necessary.  So my plan is to start working on this shortly.
   
   CC @alamb @tustvold @comphead @avantgardnerio 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make DfSchema wrap SchemaRef [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #4680: Make DfSchema wrap SchemaRef
URL: https://github.com/apache/arrow-datafusion/issues/4680


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make DfSchema wrap SchemaRef [arrow-datafusion]

Posted by "matthewmturner (via GitHub)" <gi...@apache.org>.
matthewmturner commented on issue #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680#issuecomment-1889878393

   @alamb yes i saw, thanks. havent decided yet whether i will pick up on that PR or create a new one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make DfSchema wrap SchemaRef [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680#issuecomment-1889875164

   > Ive been doing some [work](https://github.com/apache/arrow-datafusion/issues/5309) on planning performance and in order to lay a good foundation for the next steps I think this is necessary. So my plan is to start working on this shortly.
   > 
   > CC @alamb @tustvold @comphead @avantgardnerio
   
   Thank you @matthewmturner 🙏 
   
   BTW I started hacking on some version of what this might look like  in https://github.com/apache/arrow-datafusion/pull/7944 but I haven't had a chance to finish it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make DfSchema wrap SchemaRef [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4680:
URL: https://github.com/apache/arrow-datafusion/issues/4680#issuecomment-1889890721

   Feel free to do anyting you want with the code / PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org