You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/20 16:04:21 UTC

[GitHub] [arrow] ch-sc commented on pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

ch-sc commented on pull request #8715:
URL: https://github.com/apache/arrow/pull/8715#issuecomment-731255930


   Thank you all for the quick feedback!
   
   > Is the only intended change here to not compare the name of fields? Or do you have other ideas in mind for the future?
   
   Well, I guess what approach to take when comparing depends on the scenario. When calling `try_new()` to create a new record batch I didn't expect this behavior. But I can see the benefits of including more information in the comparison logic; e.g. nullable. 
   In the implementation I followed what I thought would be the expected behavior. Also, it aligned with the comment in the code that I cited in the ticket: _"list types can have different names, but we only need the data types to be the same"_
   
   > Is it easy to explain in what circumstance you have DataTypes that are the same except for the field name?
   
   Keeping track of such metadata adds complexity to the user especially in nested scenarios. I would assume in the majority of cases this information is not relevant when comparing. BTW this is not restricted to field names only, but also dictionary ids and dictionary ordering.
   
   I would second the suggestion from @alamb:
   
   > 2. Update the PartialCmp implementation for DataType to ignore Field names for all DataType == DataType comparisons
   
   ...or have 2 implementations for including metadata or not. I saw a similar approach in C++ for RecordBatch: 
   ```c
   bool RecordBatch::Equals(const RecordBatch& other, bool check_metadata)
   ```
   
   However, we should be clear to the users what is actually compared and what is not. If we end up in a scenario where some attributes play a role and some do not this is really hard to understand for anyone.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org