Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/07/16 02:50:58 UTC

[GitHub] [incubator-datasketches-postgresql] phstudy opened a new issue #27: Should we treat `null` as empty sketch?

phstudy opened a new issue #27:
URL: https://github.com/apache/incubator-datasketches-postgresql/issues/27


   Union and intersection operations treat a null value as an empty sketch in datasketches-java. I think it is less meaningful to return `null` from `theta_sketch_union`, `theta_sketch_get_estimate` and `theta_sketch_intersection`.
   
   Should we follow the same rule in PostgreSQL?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-postgresql] AlexanderSaydakov commented on issue #27: Should we treat `null` as empty sketch?

Posted by GitBox <gi...@apache.org>.
AlexanderSaydakov commented on issue #27:
URL: https://github.com/apache/incubator-datasketches-postgresql/issues/27#issuecomment-659188103


   Could you give a specific example of a behavior you don't like?




[GitHub] [incubator-datasketches-postgresql] phstudy commented on issue #27: Should we treat `null` as empty sketch?

Posted by GitBox <gi...@apache.org>.
phstudy commented on issue #27:
URL: https://github.com/apache/incubator-datasketches-postgresql/issues/27#issuecomment-659317527


   I think the estimation function should return `0` rather than `null`. That is much more convenient; otherwise I have to write a `CASE WHEN` expression to handle the `null` value. The BigQuery estimation function `HLL_COUNT.EXTRACT` also returns `0` when the input value is `null`.
   
   **Estimation function**
   |Sketch| Function  | Return value |
   | ------------- | ------------- |------------- |
   | Theta Sketch  |   theta_sketch_get_estimate(null) |null|
   | [HLL_COUNT](https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions)  |  HLL_COUNT.EXTRACT(null) |0 |




[GitHub] [incubator-datasketches-postgresql] phstudy commented on issue #27: Should we treat `null` as empty sketch?

Posted by GitBox <gi...@apache.org>.
phstudy commented on issue #27:
URL: https://github.com/apache/incubator-datasketches-postgresql/issues/27#issuecomment-660430127


   I got it. Thanks!




[GitHub] [incubator-datasketches-postgresql] leerho commented on issue #27: Should we treat `null` as empty sketch?

Posted by GitBox <gi...@apache.org>.
leerho commented on issue #27:
URL: https://github.com/apache/incubator-datasketches-postgresql/issues/27#issuecomment-660399175


   It is true that Java currently equates null with empty for union, intersection and difference. However, we have been getting feedback that treating null as empty in intersection and difference operations seriously affects the result: by silently substituting an empty sketch, the user gets weird results without any warning or indication of why, and the problem is difficult to track down and debug. (Note: null=empty in union is rather harmless.) So when we move to datasketches-java version 2.0.0, we will change that behavior so that a null passed to an intersection or difference operation throws an exception.
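
The asymmetry described in the paragraph above can be illustrated with a minimal sketch. It assumes exact `java.util.Set` values as stand-ins for sketches; the class and method names (`NullPolicyDemo`, `unionLenient`, `intersectLenient`, `intersectStrict`) are hypothetical and are not the DataSketches API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustration only: exact sets stand in for sketches (assumption, not the DataSketches API).
final class NullPolicyDemo {

    // Lenient policy: a null operand is silently replaced by the empty set.
    static Set<Integer> orEmpty(Set<Integer> s) {
        return s == null ? Collections.<Integer>emptySet() : s;
    }

    static Set<Integer> unionLenient(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new HashSet<>(orEmpty(a));
        out.addAll(orEmpty(b));
        return out;
    }

    static Set<Integer> intersectLenient(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new HashSet<>(orEmpty(a));
        out.retainAll(orEmpty(b));
        return out;
    }

    // Strict policy: a null operand is reported instead of being guessed at.
    static Set<Integer> intersectStrict(Set<Integer> a, Set<Integer> b) {
        if (a == null || b == null) {
            throw new IllegalArgumentException("intersection operand must not be null");
        }
        Set<Integer> out = new HashSet<>(a);
        out.retainAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> known = new HashSet<>(Set.of(1, 2, 3));
        Set<Integer> missing = null; // e.g. a sketch that was never populated

        // Union: the missing operand leaves the known data intact.
        System.out.println(unionLenient(known, missing).size());     // 3

        // Intersection: the missing operand wipes out the result without any warning.
        System.out.println(intersectLenient(known, missing).size()); // 0

        // Strict intersection surfaces the problem at the call site.
        try {
            intersectStrict(known, missing);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the lenient policy a missing operand is invisible in a union but collapses an intersection to nothing, which is indistinguishable from a genuine "no overlap" answer. The strict variant makes the caller decide what a null means.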
   
   I realize that nulls occur quite frequently in all kinds of raw data, but the significance of a null clearly depends on the application. Nonetheless, nulls propagate, and what you show above is an example of what could be a very undesirable propagation of a null. When doing numerical analysis of data, one needs to be very clear about how nulls are handled; otherwise one could end up with garbage results without any warning. How nulls are handled in one application could be quite different from how they are handled in another.
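
As a small numeric illustration of the point above, the following sketch averages a list of estimates under two null policies. The class name `NullAggregationDemo`, its methods, and the sample values are hypothetical and chosen only for the example. Skipping nulls mirrors what SQL's `AVG` aggregate does; substituting zero is the null=0 convention.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Illustration only: hypothetical helper names and made-up sample values.
final class NullAggregationDemo {

    // Policy A: ignore nulls, as SQL's AVG aggregate does.
    static double meanSkippingNulls(List<Double> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    // Policy B: count every null as a real observation of 0.
    static double meanNullAsZero(List<Double> values) {
        return values.stream()
                .mapToDouble(v -> v == null ? 0.0 : v)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Double> estimates = Arrays.asList(10.0, null, 20.0);
        System.out.println(meanSkippingNulls(estimates)); // 15.0
        System.out.println(meanNullAsZero(estimates));    // 10.0
    }
}
```

Neither answer is wrong in itself. Which one is right depends on whether a null means "no data" or "nothing counted", and only the application knows that.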
   
   For a foundation library such as DataSketches, I think it is very risky for us to assume that null=empty everywhere. And just because BigQuery assumes that does not make it good policy.
   




[GitHub] [incubator-datasketches-postgresql] phstudy closed issue #27: Should we treat `null` as empty sketch?

Posted by GitBox <gi...@apache.org>.
phstudy closed issue #27:
URL: https://github.com/apache/incubator-datasketches-postgresql/issues/27


   

