Posted to users@pulsar.apache.org by Sijie Guo <gu...@gmail.com> on 2020/04/27 07:09:26 UTC

Re: Apache Pulsar - Performance on Tiered storage

Performance typically varies between tiered storage backends. Tiered
storage is only used when scanning historical data, so the dominant
factor is sequential scan throughput. As far as I know, there are currently
no public articles about its performance. We usually recommend that
people run the measurements themselves, so that they get first-hand
results that are not biased.

Currently, the Spark and Flink integrations don't read directly from tiered
storage. Only Presto supports reading directly from tiered storage. So if
you want to see the performance, I recommend testing with Presto, which
will give you a better sense of how tiered storage behaves.
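
As a rough illustration of measuring sequential scan throughput yourself,
here is a minimal Python sketch. It is not tied to Pulsar or Presto:
`read_messages` is a hypothetical callable that you would wire to whatever
client you are testing, and `fake_source` is only a stand-in so the sketch
runs on its own.

```python
import time
from typing import Callable, Iterable


def scan_throughput(read_messages: Callable[[], Iterable[bytes]]) -> dict:
    """Time one full sequential scan and report its throughput.

    `read_messages` is a hypothetical callable (an assumption, not a
    Pulsar or Presto API) that yields raw message payloads in order.
    """
    start = time.perf_counter()
    count = 0
    total_bytes = 0
    for payload in read_messages():
        count += 1
        total_bytes += len(payload)
    elapsed = time.perf_counter() - start
    return {
        "messages": count,
        "bytes": total_bytes,
        "seconds": elapsed,
        # Megabytes (10^6 bytes) per second; guard against a zero-length interval.
        "mb_per_sec": (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf"),
    }


def fake_source():
    # Stand-in data source: 1000 payloads of 1 KiB each.
    for _ in range(1000):
        yield b"x" * 1024


if __name__ == "__main__":
    print(scan_throughput(fake_source))
```

Running the same scan once while the data is still on the bookies and once
after it has been offloaded, and comparing the two `mb_per_sec` values,
would give the kind of first-hand comparison described above.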

- Sijie

On Thu, Apr 23, 2020 at 11:00 AM Qiu, Min-1 <mi...@novartis.com> wrote:

> Hello
>
> I am very interested in Apache Pulsar but have not tried it yet.  I searched
> the internet, but it seems nobody has talked about the read performance on
> data in the cold tiered storage together with data in the hot
> bookie.
>
> Do you have any articles or data on the performance of reading the
> S3 data?
> For example, a comparison of reading only from the hot bookie vs. reading
> from the hot bookie + S3, or a comparison to other frameworks like Spark,
> Flink, etc.
>
>
> Looking forward to hearing from you.
>
> Thanks
>
> Min
>
>