Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/01/09 01:40:45 UTC

[GitHub] [incubator-datasketches-java] leerho commented on issue #288: getPMF of UpdateDoublesSketch is giving different results for same data

URL: https://github.com/apache/incubator-datasketches-java/issues/288#issuecomment-572341237
 
 
   Priyam,
   
   There are a number of problems with the code you provided:
   1.  First, it is filled with errors.  It took me a while to figure
   out what you *might* be trying to do.
   2.  As Alex points out, you leave out so much information that we are left
   guessing what you are up to.
   3.  Why are you using a tuple sketch at all?  If you have a stream of
   double values whose distribution you want to understand, why not just
   send them directly to one of the quantiles sketches?
   4.  Your code is effectively sampling your data twice: first with
   the tuple sketch (which by default keeps 4096 samples), and then sampling
   that with the quantiles sketch (which by default keeps only 128 values).
   This makes the error bounds reported by the quantiles sketch meaningless,
   and the actual error will be much worse.  You should at least increase the
   K value of the quantiles sketch to be much larger, preferably at least as
   large as the configured size of the tuple sketch.
   5.  The ArrayOfDoublesSketch is an aggregating sketch.  This means that if
   there are any duplicate keys in your stream, the value retained in the
   sketch for that key will be the sum of its values.  Is this what you want?
   This only makes sense if you want to obtain the "distribution of sums".
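
   To illustrate point 5, here is a minimal plain-Java sketch of the
   difference between the distribution of the raw values and the
   "distribution of sums".  It does not use the DataSketches API: the map
   below only stands in for the key-aggregating behavior described above, the
   PMF is computed exactly rather than approximately, and the class name,
   method names, interval convention, and sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DistributionOfSums {

    /**
     * Sums the values that share a key. This is a plain-Java stand-in
     * (an assumption for illustration) for what an aggregating sketch
     * retains when the stream contains duplicate keys.
     */
    public static List<Double> sumByKey(String[] keys, double[] values) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            sums.merge(keys[i], values[i], Double::sum);
        }
        return new ArrayList<>(sums.values());
    }

    /**
     * Exact PMF over the intervals defined by ascending split points.
     * m split points give m + 1 intervals. The convention used here
     * (left edge inclusive, right edge exclusive) is an assumption of
     * this example.
     */
    public static double[] exactPmf(List<Double> values, double[] splitPoints) {
        long[] counts = new long[splitPoints.length + 1];
        for (double v : values) {
            int bucket = 0;
            while (bucket < splitPoints.length && v >= splitPoints[bucket]) {
                bucket++;
            }
            counts[bucket]++;
        }
        double[] pmf = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            pmf[i] = counts[i] / (double) values.size();
        }
        return pmf;
    }

    public static void main(String[] args) {
        // Hypothetical stream: key "a" appears twice.
        String[] keys = {"a", "b", "a", "c", "d"};
        double[] values = {1.0, 2.0, 1.0, 3.0, 3.0};
        double[] splitPoints = {1.5, 2.5};

        // Distribution of the raw values, ignoring keys.
        List<Double> raw = new ArrayList<>();
        for (double v : values) {
            raw.add(v);
        }
        System.out.println(Arrays.toString(exactPmf(raw, splitPoints)));
        // prints [0.4, 0.2, 0.4]

        // Distribution of the per-key sums: the two 1.0 values for "a"
        // collapse into a single 2.0, so the lowest interval is now empty.
        List<Double> sums = sumByKey(keys, values);
        System.out.println(Arrays.toString(exactPmf(sums, splitPoints)));
        // prints [0.0, 0.5, 0.5]
    }
}
```

   Even with exact arithmetic the two PMFs differ, so if the raw-value
   distribution is the goal, aggregating by key first answers a different
   question before any sketch approximation error enters.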
   
   If you can be much clearer about what you are trying to do and the
   nature of your input stream, we can be more helpful.
   
   Cheers,
   
   Lee.
   
   
   
   On Wed, Jan 8, 2020 at 10:20 AM Alexander Saydakov <no...@github.com>
   wrote:
   
   > You are using approximate algorithms, so the results can be different
   > every time. The question is how different are they? What accuracy do you
   > expect? To answer this question you need to be more specific. What do you
   > mean by "oscillating between 2-3 values"? What is the true distribution of
   > your input data? What approximation are you getting? Why do you think it is
   > too far off?
   
