You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/16 07:21:45 UTC

[GitHub] [iceberg] findepi opened a new pull request, #5774: Clarify hash specification for date and boolean

findepi opened a new pull request, #5774:
URL: https://github.com/apache/iceberg/pull/5774

   It was using `hashInt` as a way of saying "hash like `int` type". However, this turned out to be confused with "invoke `hashInt` on a hasher", which yields a different result.
   
   Make it explicit how the hash should be calculated by avoiding delegation and mentioning `hashLong` explicitly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5774: Clarify hash specification for date and boolean

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1269796173

   @rdblue i am still looking for more guidance from you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5774: Clarify hash specification for date and boolean

Posted by "findepi (via GitHub)" <gi...@apache.org>.
findepi commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1618265579

   @rdblue @nastra @bitsondatadev should i rebase or close?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5774: Clarify hash specification for date and boolean

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1249012730

   The float hash also uses this delegation pattern
   
   > float | hashDouble(double(v)) [4] | 1.0F → -142385009
   > -- | -- | --
   
   however, it's not being confused by readers, because there is no other "`hashDouble`" operation available.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bitsondatadev commented on pull request #5774: Clarify hash specification for date and boolean

Posted by "bitsondatadev (via GitHub)" <gi...@apache.org>.
bitsondatadev commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1629912773

   @findepi 
   
   @rdblue says there's still some ambiguity. Rather than using `hashInt` that refers to the `hashLong(long(i))` you changed it to just `hashLong(...)`. But the cast is important.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5774: Clarify hash specification for date and boolean

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1263352085

   > The intent here was to refer to the function defined for the `int` type.
   
   I know. This PR isn't fixing anything, as nothing was wrong. It only tries to avoid potentially confusing wording.
   
   > t looks like people are confusing these functions with hash specific methods in the hash implementation. 
   
   Indeed.
   
   > this change makes the spec less specific because it removes the long(value) cast that is required for int and makes it implicit.
   
   Not sure where you see this. 
   Is it in `hashLong(daysFromUnixEpoch(v))`?
   The `daysFromUnixEpoch` it a hypothetic function, not defined anywhere, so a reader can just assume it returns a long number, if that's what `hashLong` expects.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5774: Clarify hash specification for date and boolean

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1250916468

   Addresses confusion as discussed here https://apache-iceberg.slack.com/archives/C03LG1D563F/p1662993090595709 and here https://apache-iceberg.slack.com/archives/C03LG1D563F/p1663206871379419 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5774: Clarify hash specification for date and boolean

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1262836879

   The intent here was to refer to the function defined for the `int` type. It looks like people are confusing these functions with hash specific methods in the hash implementation. I think it's fine to clarify, but this change makes the spec less specific because it removes the `long(value)` cast that is required for `int` and makes it implicit.
   
   Can we rephrase the change here so that it is simply clear that the hash for date depends on the hash for int?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5774: Clarify hash specification for date and boolean

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1263783052

   `daysFromUnixEpoch` is represented in most places as an int and I didn't want any confusion over how it should be represented. An int is perfectly fine. Since `hashInt` is also a hypothetical function, it's okay to use it.
   
   I think the fix is to name the functions in the spec as they are defined to make it clear that we're referring back to them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5774: Clarify hash specification for date and boolean

Posted by "findepi (via GitHub)" <gi...@apache.org>.
findepi commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1632745389

   @bitsondatadev you mean it should be `hashLong(long(daysFromUnixEpoch(v)))` instead of `hashLong(daysFromUnixEpoch(v))`? I don't see how the latter is ambiguous, but sure I can do that. I want confirmation though that I understood the right way


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #5774: Clarify hash specification for date and boolean

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1263891363

   > Since `hashInt` is also a hypothetical function, it's okay to use it.
   
   it's also a name of `org.apache.iceberg.relocated.com.google.common.hash.HashFunction#hashInt` available on the `Hashing.murmur3_32_fixed` object.
   
   > daysFromUnixEpoch is represented in most places as an int 
   
   ```
   iceberg master$ git grep daysFromUnixEpoch
   format/spec.md:| **`date`**         | `hashInt(daysFromUnixEpoch(v))`           | `2017-11-16` → `-653330422`                |
   ```
   
   >  I think the fix is to name the functions in the spec as they are defined to make it clear that we're referring back to them.
   
   i am not sure i am following. can you please exemplify? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bitsondatadev commented on pull request #5774: Clarify hash specification for date and boolean

Posted by "bitsondatadev (via GitHub)" <gi...@apache.org>.
bitsondatadev commented on PR #5774:
URL: https://github.com/apache/iceberg/pull/5774#issuecomment-1619555534

   @findepi just saw this, let me follow up with @rdblue after the holiday!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org