Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/07/18 01:10:13 UTC

[GitHub] [incubator-datasketches-postgresql] leerho commented on issue #27: Should we treat `null` as empty sketch?

leerho commented on issue #27:
URL: https://github.com/apache/incubator-datasketches-postgresql/issues/27#issuecomment-660399175


   It is true that the Java library currently equates null with empty for union, intersection, and difference. However, we have been getting feedback that null=empty seriously affects the results of intersection and difference operations: by silently substituting an empty sketch, the user gets weird results with no warning or indication of why, and the problem is difficult to track down and debug. (Note: null=empty in union is rather harmless.) So when we move to datasketches-java version 2.0.0, we will change that behavior so that passing a null to an intersection or difference operation throws an exception.
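
   To make the asymmetry concrete, here is a minimal sketch that uses exact `java.util.Set` values as stand-ins for sketches. It does not use the DataSketches API: the class name, the method names, and the choice of `IllegalArgumentException` are all assumptions made for illustration. It contrasts a lenient null=empty policy with a strict policy that rejects null for intersection.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustration only: plain sets stand in for sketches.
// None of these names belong to the DataSketches API (hypothetical helpers).
public class NullPolicyDemo {

    // Lenient policy helper: a null operand is silently replaced by an empty set.
    private static Set<String> orEmpty(Set<String> s) {
        return s == null ? Collections.<String>emptySet() : s;
    }

    public static Set<String> lenientUnion(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(orEmpty(a));
        result.addAll(orEmpty(b));
        return result;
    }

    public static Set<String> lenientIntersect(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(orEmpty(a));
        result.retainAll(orEmpty(b));
        return result;
    }

    // Difference "a and not b" under the lenient policy.
    public static Set<String> lenientDifference(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(orEmpty(a));
        result.removeAll(orEmpty(b));
        return result;
    }

    // Strict policy: refuse a null operand instead of guessing what it means.
    // The exception type is an assumption; the comment only says "an exception".
    public static Set<String> strictIntersect(Set<String> a, Set<String> b) {
        if (a == null || b == null) {
            throw new IllegalArgumentException("intersection operand must not be null");
        }
        Set<String> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> visitors = new HashSet<>(Arrays.asList("u1", "u2", "u3"));
        Set<String> missing = null; // e.g. a row whose sketch column is NULL

        // Union: the null operand changes nothing, so the result is still the 3 visitors.
        System.out.println("union size        = " + lenientUnion(visitors, missing).size());
        // Intersection: the null operand silently wipes out the whole result.
        System.out.println("intersection size = " + lenientIntersect(visitors, missing).size());
        // Difference: nothing is subtracted, which looks like a valid answer.
        System.out.println("difference size   = " + lenientDifference(visitors, missing).size());

        try {
            strictIntersect(visitors, missing);
        } catch (IllegalArgumentException e) {
            System.out.println("strict intersect rejected null: " + e.getMessage());
        }
    }
}
```

   Under the lenient policy the intersection comes back empty and the difference comes back unchanged, and neither result hints that an input was missing. The strict variant surfaces the problem at the call site instead.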
   
   I realize that nulls occur quite frequently in all kinds of raw data, but the significance of a null clearly depends on the application, and how nulls are handled in one application could be quite different in another. Nonetheless, nulls propagate, and what you show above is an example of what could be a very undesirable propagation of a null. When doing numerical analysis of data, one needs to be very clear about how nulls are handled; otherwise, one could end up with garbage results without any warning.
   
   For a foundation library such as DataSketches, I think it is very risky for us to assume that null=empty everywhere. And just because BigQuery assumes that does not make it good policy.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


