Posted to dev@samza.apache.org by Chris Riccomini <cr...@linkedin.com> on 2013/12/03 00:39:38 UTC

Computing Quantiles

Hey Guys,

I saw this floating around Twitter recently:

  https://github.com/tdunning/t-digest

Seems like it might be a good way to compute quantiles from a Samza task. Just throwing it out there in case anyone's interested.

One other thought would be to adapt this to a state store, so you could have predictable quantile computation (even in the face of failure). Keep in mind, though, that the algorithm is approximate, so after a failure you'd only get exactly the same approximate answer (hah!). The state store approach does, however, take advantage of local disk.
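To make the "same approximate answer after a failure" point concrete, here's a rough Python sketch. Note that this is NOT t-digest and doesn't touch any Samza API: CentroidSketch is a made-up toy that keeps a bounded list of centroids and merges the closest pair when it overflows, and a JSON string stands in for whatever the state store would persist. All names here (CentroidSketch, make_values, replay_with_failure) are hypothetical. The only thing it tries to show is that a deterministic sketch, restored from a checkpoint and fed the remaining messages, ends up with the same quantiles as an uninterrupted run.

```python
import bisect
import json
import random


class CentroidSketch:
    """Toy quantile sketch (not t-digest): bounded, sorted centroids."""

    def __init__(self, max_centroids=50):
        self.max_centroids = max_centroids
        self.means = []   # centroid means, kept in ascending order
        self.counts = []  # number of points folded into each centroid

    def add(self, x):
        # Insert the value as a new single-point centroid, preserving order.
        x = float(x)
        i = bisect.bisect_left(self.means, x)
        self.means.insert(i, x)
        self.counts.insert(i, 1)
        if len(self.means) > self.max_centroids:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        # Merge the two adjacent centroids whose means are closest together.
        m, n = self.means, self.counts
        i = min(range(len(m) - 1), key=lambda k: m[k + 1] - m[k])
        total = n[i] + n[i + 1]
        m[i:i + 2] = [(m[i] * n[i] + m[i + 1] * n[i + 1]) / total]
        n[i:i + 2] = [total]

    def quantile(self, q):
        # Approximate answer: the mean of the centroid covering rank q * total.
        if not self.means:
            raise ValueError("sketch is empty")
        target = q * sum(self.counts)
        seen = 0
        for mean, count in zip(self.means, self.counts):
            seen += count
            if seen >= target:
                return mean
        return self.means[-1]

    def to_json(self):
        # Stand-in for writing the sketch to a state store.
        return json.dumps({"max_centroids": self.max_centroids,
                           "means": self.means, "counts": self.counts})

    @classmethod
    def from_json(cls, blob):
        # Stand-in for restoring the sketch after a failure.
        data = json.loads(blob)
        sketch = cls(data["max_centroids"])
        sketch.means = data["means"]
        sketch.counts = data["counts"]
        return sketch


def make_values(n, seed=42):
    # Deterministic pseudo-random input so the run can be replayed.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def replay_with_failure(values, fail_at, max_centroids=50):
    # Run 1: process every value without interruption.
    clean = CentroidSketch(max_centroids)
    for v in values:
        clean.add(v)

    # Run 2: checkpoint at fail_at, "crash", restore, then finish the stream.
    before_crash = CentroidSketch(max_centroids)
    for v in values[:fail_at]:
        before_crash.add(v)
    recovered = CentroidSketch.from_json(before_crash.to_json())
    for v in values[fail_at:]:
        recovered.add(v)

    qs = (0.5, 0.9, 0.99)
    return ([clean.quantile(q) for q in qs],
            [recovered.quantile(q) for q in qs])


if __name__ == "__main__":
    uninterrupted, restored = replay_with_failure(make_values(5000), 3000)
    print(uninterrupted)
    print(restored)
```

The two printed lists should match each other, but neither is the exact quantile of the input. This only holds if messages are replayed in the same order after the restore, which is an assumption of the sketch.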

Cheers,
Chris