Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/01/09 01:40:45 UTC
[GitHub] [incubator-datasketches-java] leerho commented on issue #288:
getPMF of UpdateDoublesSketch is giving different results for same data
URL: https://github.com/apache/incubator-datasketches-java/issues/288#issuecomment-572341237
Priyam,
There are a number of problems with the code you provided:
1. First, it is filled with errors. It took me a while to figure
out what you *might* be trying to do.
2. As Alex points out, you leave out so much information that we are left
guessing what you are up to.
3. Why are you using a tuple sketch at all? If you have a stream of
double values whose distribution you want to understand, why not just
send them directly to one of the quantiles sketches?
4. Your code is effectively doing double sampling of your data, first by
the tuple sketch (which by default keeps 4096 samples), and then by sampling
that with the quantiles sketch (which by default keeps only 128 values).
This makes the error bounds on the quantiles sketch meaningless, and the
actual error will be much worse. You should at least increase the K value
of the quantiles sketch to be much larger, preferably at least as large as
the configured size of the tuple sketch.
5. The ArrayOfDoublesSketch is an aggregating sketch. This means that if
there are any duplicate keys in your stream, the value retained in the
sketch will be the sum. Is this what you want? Only if you want to obtain
the "distribution of sums" will this make any sense.
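To make point 5 concrete, here is a minimal plain-Java sketch of what
key-wise aggregation does to a stream. It uses a LinkedHashMap as a
stand-in and is NOT the ArrayOfDoublesSketch API; the class name, the
helper method, and the sample keys and values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for key-wise aggregation; NOT the ArrayOfDoublesSketch API.
public class DistributionOfSums {

    // Hypothetical helper: sums the values of duplicate keys,
    // keeping keys in first-seen order.
    public static Map<String, Double> aggregateByKey(String[] keys, double[] values) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            sums.merge(keys[i], values[i], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "c", "b"};
        double[] values = {1.0, 5.0, 2.0, 4.0, 0.5};
        // Five raw values go in, but only three per-key sums come out.
        System.out.println(aggregateByKey(keys, values)); // prints {a=3.0, b=5.5, c=4.0}
    }
}
```

A quantiles sketch fed from the aggregated values would see 3.0, 5.5 and
4.0 rather than the five raw values, so any PMF computed from it describes
the per-key sums, not the original stream.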
If you can be much clearer about what you are trying to do and the
nature of your input stream, we could be more helpful.
Cheers,
Lee.
On Wed, Jan 8, 2020 at 10:20 AM Alexander Saydakov <no...@github.com>
wrote:
> You are using approximate algorithms, so the results can be different
> every time. The question is how different are they? What accuracy do you
> expect? To answer this question you need to be more specific. What do you
> mean by "oscillating between 2-3 values"? What is the true distribution of
> your input data? What approximation are you getting? Why do you think it is
> too far off?