You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/19 18:03:00 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue #364: Compare GroupByScalar

tustvold opened a new issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364


   Whilst looking to implement #141 I couldn't work out a way to compare `ScalarValue` types.
   
   There is, however, `GroupByScalar` which seems pretty compelling:
   
   * It lacks an internal Option, allowing the caller to handle null-ordering how it deems fit
   * It uses OrderedFloat, establishing a standards compliant ordering of floats
   * It only provides the value variants that make sense to compare
   * The plumbing to convert from a `ScalarValue` already exists
   
   It therefore seems to be what I'm looking for, however, it doesn't currently have a `PartialOrd` or `Ord` implementation and is crate-local...
   
   Based on this I have a couple of questions:
   
   * Is there some other mechanism I should use to compare fields from different rows from different record batches (at potentially different row offsets)
   * Can we add `#[derive(PartialOrd, Ord)]` to GroupByScalar - note this would establish an ordering based on the order of enum variants between unrelated variants
   * Can we expose `GroupByScalar` publicly, potentially renamed to something to reflect its more generic utility, perhaps `ComparableScalar` or something


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on issue #364: Compare GroupByScalar

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844366872


   Would we change ScalarValue to store OrderedFloat and add `Eq` in addition to `PartialOrd` and `Ord` or keep using raw floats and only define `PartialOrd`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #364: Compare GroupByScalar

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844349894


   So maybe we can add compare to gby scalar (or ScalarValue) as part of this PR and then unify the two (remove GroupByScalar --> ScalarValue) as a follow on PR 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #364: Compare GroupByScalar

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844347323


   > I have also often wondered "why do we have `GroupByScalar` at all" (aka why can't we just use `ScalarValue`)
   
   I agree.
   
   AFAIK it is a historical artifact. First it came the `GroupByScalar`, when the group by was first implemented; then the `ScalarValue` emerged to nullable uniformise scalar operations, particularly aggregations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan closed issue #364: Add Compare to GroupByScalar

Posted by GitBox <gi...@apache.org>.
Dandandan closed issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on issue #364: Compare GroupByScalar

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844342993


   I have also often wondered "why do we have `GroupByScalar` at all" (aka why can't we just use `ScalarValue`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #364: Compare GroupByScalar

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844373242


   > Would we change ScalarValue to store OrderedFloat and add Eq in addition to PartialOrd and Ord or keep using raw floats and only define PartialOrd?
   
   I would suggest we don't change ScalarValue in this PR and use GroupByScalar, 
   
   When we get to unifying GroupByScalar and ScalarValue I would imagine we would have to use `ordered_float` in ScalarValue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #364: Compare GroupByScalar

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844342993


   I have also often wondered "why do we have `GroupByScalar` at all (aka why can't we just use `ScalarValue`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #364: Add Compare to GroupByScalar

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844401489


   I agree combining them makes a lot of sense :+1: 
   
   In aggregates I was also stumbling on some strange issues relating to evaluation of scalars while working on this: https://github.com/apache/arrow-datafusion/pull/339
   
   I think when we have a bit more unified code it makes it easier to solve these kind of issues.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #364: Add Compare to GroupByScalar

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844401489


   I agree combining them makes a lot of sense :+1: 
   
   In aggregates I was also stumbling on some strange issues relating to evaluation of scalars while working on this: https://github.com/apache/arrow-datafusion/pull/339
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #364: Compare GroupByScalar

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-844342479


   Perhaps @Dandandan  or @jorgecarleitao or @jhorstmann  have some thoughts in this area


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #364: Add Compare to GroupByScalar

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-845918101


   Filed https://github.com/apache/arrow-datafusion/issues/376 to track combining the vaulues


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on issue #364: Add Compare to GroupByScalar

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #364:
URL: https://github.com/apache/arrow-datafusion/issues/364#issuecomment-845918101


   Filed https://github.com/apache/arrow-datafusion/issues/376 to track combining the two structs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org