You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by "leerho (via GitHub)" <gi...@apache.org> on 2024/02/29 20:11:35 UTC

[PR] Improvements realized while analyzing Issue 514 (datasketches-java)

leerho opened a new pull request, #516:
URL: https://github.com/apache/datasketches-java/pull/516

   While analyzing Issue 514, I realized that adding a few new asserts and/or exceptions would be very helpful to users and to us when debugging.  Specifically, they consist of 3 different detections:
   - Throwing an exception if a self-merge is detected. This is an easy-to-make user error and warrants an exception.
   
   These next two should be very rare, but are also directly in the update path, so I think an _assert_ is more appropriate.
   - Throwing an AssertionError if N (the number of items input into the sketch) exceeds Long.MAX_VALUE.
   - Throwing an AssertionError if the number of levels exceed 60
   
   
   I also realized that our built-in shuffle class could be improved by allowing generic arrays to be shuffled and also providing the option of a user-provided Random generator instead of only the internal default. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] Improvements realized while analyzing Issue 514 (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on PR #516:
URL: https://github.com/apache/datasketches-java/pull/516#issuecomment-1977756056

   This is a tough one. I discovered these issues while running malformed test code.  In all three cases, the sketch crashes.  
   If the user just runs their environment with "-ea" enabled, the asserts will run.
   Changing the asserts to exceptions means running a check always for a situation that should be exceedingly rare and is in the direct path of every insertion. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] Improvements realized while analyzing Issue 514 (datasketches-java)

Posted by "jmalkin (via GitHub)" <gi...@apache.org>.
jmalkin commented on PR #516:
URL: https://github.com/apache/datasketches-java/pull/516#issuecomment-1975908971

   I'm not sure about the assert. Aren't those removed for non-debug builds? Meaning the use cases where we'd expect the asserts to possibly trigger are likely to be the situations where the assert is removed? I don't think we'd expect people to insert that many items into a debug-mode test.
   
   The changes are fine if we agree it's ok not to have those check in productions systems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] Improvements realized while analyzing Issue 514 (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho merged PR #516:
URL: https://github.com/apache/datasketches-java/pull/516


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] Improvements realized while analyzing Issue 514 (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on PR #516:
URL: https://github.com/apache/datasketches-java/pull/516#issuecomment-1980011529

   Removed two asserts and converted another one to throw ArithmeticException when the sum of two longs overflows.  Turns out that ` long Math.addExact(long,long) ` is the perfect solution. It tests for overflow and if true throws an ArithmeticException with the comment "long overflow", which is perfect.  And it is fast. I checked the Math source code.
   
   This is a new exception, but it is thrown just before the sketch crashes.  So I do not believe this, on its own warrants a major release. However, once we are all done, we may have other reasons for a major release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [PR] Improvements realized while analyzing Issue 514 (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on PR #516:
URL: https://github.com/apache/datasketches-java/pull/516#issuecomment-1977764690

   This is a tough one. I discovered these issues running buggy test code. In the two assert cases the sketch will crash without any checks.  If the user reruns the environment with"-ea" the asserts will display the meaningful error message.
   Converting to exceptions means the checks will always run for what should be extremely rare situations.   I hesitate to put too many checks directly in the update path.   There are many, many places in our code where we use longs and do not check for overflow, because longs are so huge.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org