Posted to dev@kafka.apache.org by Sumant Tambe <su...@gmail.com> on 2021/07/12 20:55:00 UTC

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Hi Israel,

LinkedIn is interested in evaluating KIP-405 for HDFS and S3 in the short
term and Azure Blob Storage in the long run. You may already know that LinkedIn
is migrating to Azure
<https://engineering.linkedin.com/blog/2019/building-next-infra>. We think
that Blobs will provide us with the optimal cost/availability/operability
trade-offs for Kafka in Azure.
What's the context behind your interest in KIP-405 and Azure Blobs? Do you
have any data/experience of using Azure blobs at scale?

Satish, any word on KIP-405 and the RSM implementations?

Regards,
Sumant
Kafka Dev, LinkedIn

On Mon, 15 Mar 2021 at 09:08, Satish Duggana <sa...@gmail.com>
wrote:

> Hi Israel,
> Thanks for your interest in tiered storage. As mentioned by Jun earlier, we
> decided not to host any implementations in the Apache Kafka repo, as is the
> case with Kafka connectors. We plan to have RSM implementations for HDFS, S3,
> GCP, and Azure storage in a separate repo. We will let you know once they are
> ready for review.
>
> Best,
> Satish.
>
> On Sat, 13 Mar 2021 at 01:27, Israel Ekpo <is...@gmail.com> wrote:
>
> > Thanks @Jun for the prompt response.
> >
> > That's ok and I think it is a great strategy just like the Connect
> > ecosystem.
> >
> > However, I am still in search of repos and samples that demonstrate
> > an implementation of the KIP.
> >
> > I will keep searching but was just wondering if there were sample
> > implementations for S3 or HDFS I could take a look at.
> >
> > Thanks.
> >
> > On Fri, Mar 12, 2021 at 2:19 PM Jun Rao <ju...@confluent.io.invalid> wrote:
> >
> > > Hi, Israel,
> > >
> > > Thanks for your interest. As part of KIP-405, we have made the decision
> > > not to host any plugins for external remote storage directly in Apache
> > > Kafka. Those plugins could be hosted outside of Apache Kafka.
> > >
> > > Jun
> > >
> > > On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo <is...@gmail.com> wrote:
> > >
> > > > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and
> > > > thanks to everyone that participated in the review and discussion to
> > > > take it to where it is today.
> > > >
> > > > I would like to contribute by working on integrating Azure Storage
> > > > (Blob and ADLS) with Tiered Storage for this KIP.
> > > >
> > > > I have created this issue to track this work
> > > > https://issues.apache.org/jira/browse/KAFKA-12458
> > > >
> > > > Are there any sample implementations for HDFS/S3 that I can reference
> > > > to get started?
> > > >
> > > > When you have a moment, please share.
> > > >
> > > > Thanks.
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Israel Ekpo <is...@gmail.com>.
Hi Sumant,

I have resumed work on the RSM implementations for Azure Storage. If you
are interested in discussing the implementation details, please reach out
to me directly and we can set up a call to discuss further.

Thanks.



On Mon, Jul 12, 2021 at 11:12 PM Satish Duggana <sa...@gmail.com>
wrote:

> Hi Sumant,
> We are tracking HDFS and S3 implementations as part of this project to
> see whether the proposed APIs address file/object stores. As Jun
> mentioned in the earlier mail thread, no RSM implementation will be
> part of Apache Kafka. This is in line with how Kafka connectors are
> developed. Devs/users can build their own implementations and host
> them to share with others.
>
> We have HDFS/S3 RSMs which can be treated as reference
> implementations. These can be referenced to build your own
> implementations if needed.
>
> Thanks,
> Satish.

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Satish Duggana <sa...@gmail.com>.
Hi Sumant,
We are tracking HDFS and S3 implementations as part of this project to
see whether the proposed APIs address file/object stores. As Jun
mentioned in the earlier mail thread, no RSM implementation will be
part of Apache Kafka. This is in line with how Kafka connectors are
developed. Devs/users can build their own implementations and host
them to share with others.

We have HDFS/S3 RSMs which can be treated as reference
implementations. These can be referenced to build your own
implementations if needed.

Thanks,
Satish.

