You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/22 19:55:22 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request #1477: Fix SortExec discards field metadata on the output schema

alamb opened a new pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477


   # Which issue does this PR close?
   
   Closes #1476 
   
    # Rationale for this change
   
   This is a regression in https://github.com/apache/arrow-datafusion/pull/1455
   
   The following code in Sort skips the field level metadata when when it constructs the output batch.
   
   ```rust
       let schema = Arc::new(Schema::new(
           schema
               .fields()
               .iter()
               .zip(batch.columns().iter().map(|col| col.data_type()))
               .map(|(field, ty)| Field::new(field.name(), ty.clone(), field.is_nullable()))
               .collect::<Vec<_>>(),
       ));
   ```
   
   Thus the output schema is not correct. 
   
   It also seems to be unnecessary (at least all the tests pass without it, so maybe @maxburke  can comment on what problem it was solving). 
   
   @hntd187 has also added some functions working with https://github.com/apache/arrow-rs/pull/1033 PR to add some additional functions for creating fields and I will work on this as well
   
   # What changes are included in this PR?
   1. Remove offending code
   2. add a test
   
   # Are there any user-facing changes?
   Bug fix
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan merged pull request #1477: Fix SortExec discards field metadata on the output schema

Posted by GitBox <gi...@apache.org>.
Dandandan merged pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1477: Fix SortExec discards field metadata on the output schema

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477#discussion_r774136970



##########
File path: datafusion/src/physical_plan/sort.rs
##########
@@ -201,15 +201,6 @@ fn sort_batch(
         None,
     )?;
 
-    let schema = Arc::new(Schema::new(

Review comment:
       the fix is to delete this code, the rest of the file is a test

##########
File path: datafusion/src/physical_plan/sort.rs
##########
@@ -201,15 +201,6 @@ fn sort_batch(
         None,
     )?;
 
-    let schema = Arc::new(Schema::new(

Review comment:
       the fix is to delete this code, the rest of the PR is a test




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #1477: Fix SortExec discards field metadata on the output schema

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477#discussion_r774470776



##########
File path: datafusion/src/physical_plan/sort.rs
##########
@@ -201,15 +201,6 @@ fn sort_batch(
         None,
     )?;
 
-    let schema = Arc::new(Schema::new(

Review comment:
       Nice and clean 🧹🧼




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] hntd187 commented on pull request #1477: Fix SortExec discards field metadata on the output schema

Posted by GitBox <gi...@apache.org>.
hntd187 commented on pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477#issuecomment-999875537


   This is just to correct the bug correct? We will deal with carrying over metadata properly in another PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1477: Fix SortExec discards field metadata on the output schema

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477#issuecomment-999881951


   > This is just to correct the bug correct? We will deal with carrying over metadata properly in another PR?
   
   Yes -- I think DataFusion does carry over metadata now in many cases and this PR corrects a case where the metadata used to be carried over but is no longer
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] maxburke commented on pull request #1477: Fix SortExec discards field metadata on the output schema

Posted by GitBox <gi...@apache.org>.
maxburke commented on pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477#issuecomment-1004187409


   @alamb If I recall the reason this change was made was because in our cases the schema was stripped of the timezone information but the underlying record batches preserved it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1477: Fix SortExec discards field metadata on the output schema

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1477:
URL: https://github.com/apache/arrow-datafusion/pull/1477#issuecomment-1004741399


   > @alamb If I recall the reason this change was made was because in our cases the schema was stripped of the timezone information but the underlying record batches preserved it.
   
   I am sorry if I broke something for you @maxburke  -- if you help me understand / write a reproducer for what want to do I am happy to help try and make it work


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org