Posted to dev@kafka.apache.org by Israel Ekpo <is...@gmail.com> on 2021/03/12 01:15:16 UTC

[DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and thanks
to everyone that participated in the review and discussion to take it to
where it is today.

I would like to contribute by working on integrating Azure Storage (Blob
and ADLS) with Tiered Storage for this KIP.

I have created this issue to track this work:
https://issues.apache.org/jira/browse/KAFKA-12458

Are there any sample implementations for HDFS/S3 that I can reference to
get started?

When you have a moment, please share.

Thanks.

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Israel Ekpo <is...@gmail.com>.
Hi Sumant,

I have resumed work on the RSM implementations for Azure Storage. If you
are interested in discussing the implementation details, please reach out
to me directly and we can set up a call to discuss further.

Thanks.



On Mon, Jul 12, 2021 at 11:12 PM Satish Duggana <sa...@gmail.com>
wrote:

> Hi Sumant,
> We are tracking HDFS and S3 implementations as part of this project to
> see whether the proposed APIs address file/object stores. As Jun
> mentioned in the earlier mail thread, no RSM implementation will be
> part of Apache Kafka. This is in line with Kafka Connectors
> development. Devs/Users can build their own implementations and host
> them to share with others.
>
> We have HDFS/S3 RSMs which can be treated as reference
> implementations. These can be referenced to build your own
> implementations if needed.
>
> Thanks,
> Satish.
>
>
> On Tue, 13 Jul 2021 at 02:25, Sumant Tambe <su...@gmail.com> wrote:
> >
> > Hi Israel,
> >
> > LinkedIn is interested in evaluating KIP-405 for HDFS and S3 in the short
> > term and Azure Blob Storage in the long run. You may already know that
> > LinkedIn is migrating to Azure
> > <https://engineering.linkedin.com/blog/2019/building-next-infra>. We
> > think that Blobs will provide us with the optimal
> > cost/availability/operability trade-offs for Kafka in Azure.
> > What's the context behind your interest in KIP-405 and Azure Blobs? Do
> > you have any data/experience of using Azure blobs at scale?
> >
> > Satish, any word on KIP-405 and the RSM implementations?
> >
> > Regards,
> > Sumant
> > Kafka Dev, LinkedIn
> >
> > On Mon, 15 Mar 2021 at 09:08, Satish Duggana <sa...@gmail.com>
> > wrote:
> >
> > > Hi Israel,
> > > Thanks for your interest in tiered storage. As mentioned by Jun earlier,
> > > we decided not to host any implementations in the Apache Kafka repo, as
> > > with Kafka connectors. We plan to have RSM implementations for HDFS, S3,
> > > GCP, and Azure storage in a separate repo. We will let you know once they
> > > are ready for review.
> > >
> > > Best,
> > > Satish.
> > >
> > > On Sat, 13 Mar 2021 at 01:27, Israel Ekpo <is...@gmail.com> wrote:
> > >
> > > > Thanks @Jun for the prompt response.
> > > >
> > > > That's OK, and I think it is a great strategy, just like the Connect
> > > > ecosystem.
> > > >
> > > > However, I am still in search of repos and samples that demonstrate
> > > > an implementation of the KIP.
> > > >
> > > > I will keep searching but was just wondering if there were sample
> > > > implementations for S3 or HDFS I could take a look at.
> > > >
> > > > Thanks.
> > > >
> > > > On Fri, Mar 12, 2021 at 2:19 PM Jun Rao <ju...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hi, Israel,
> > > > >
> > > > > Thanks for your interest. As part of KIP-405, we have made the
> > > > > decision not to host any plugins for external remote storage directly
> > > > > in Apache Kafka. Those plugins could be hosted outside of Apache Kafka.
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo <is...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP
> > > > > > and thanks to everyone that participated in the review and
> > > > > > discussion to take it to where it is today.
> > > > > >
> > > > > > I would like to contribute by working on integrating Azure
> > > > > > Storage (Blob and ADLS) with Tiered Storage for this KIP
> > > > > >
> > > > > > I have created this issue to track this work
> > > > > > https://issues.apache.org/jira/browse/KAFKA-12458
> > > > > >
> > > > > > Are there any sample implementations for HDFS/S3 that I can
> > > > > > reference to get started?
> > > > > >
> > > > > > When you have a moment, please share.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Satish Duggana <sa...@gmail.com>.
Hi Sumant,
We are tracking HDFS and S3 implementations as part of this project to
see whether the proposed APIs address file/object stores. As Jun
mentioned in the earlier mail thread, no RSM implementation will be
part of Apache Kafka. This is in line with Kafka Connectors
development. Devs/Users can build their own implementations and host
them to share with others.

We have HDFS/S3 RSMs which can be treated as reference
implementations. These can be referenced to build your own
implementations if needed.

Thanks,
Satish.


On Tue, 13 Jul 2021 at 02:25, Sumant Tambe <su...@gmail.com> wrote:
>
> Hi Israel,
>
> LinkedIn is interested in evaluating KIP-405 for HDFS and S3 in the short
> term and Azure Blob Storage in the long run. You may already know that LinkedIn
> is migrating to Azure
> <https://engineering.linkedin.com/blog/2019/building-next-infra>. We think
> that Blobs will provide us with the optimal cost/availability/operability
> trade-offs for Kafka in Azure.
> What's the context behind your interest in KIP-405 and Azure Blobs? Do you
> have any data/experience of using Azure blobs at scale?
>
> Satish, any word on KIP-405 and the RSM implementations?
>
> Regards,
> Sumant
> Kafka Dev, LinkedIn
>
> On Mon, 15 Mar 2021 at 09:08, Satish Duggana <sa...@gmail.com>
> wrote:
>
> > Hi Israel,
> > Thanks for your interest in tiered storage. As mentioned by Jun earlier, we
> > decided not to host any implementations in the Apache Kafka repo, as with
> > Kafka connectors. We plan to have RSM implementations for HDFS, S3, GCP, and
> > Azure storage in a separate repo. We will let you know once they are ready
> > for review.
> >
> > Best,
> > Satish.
> >
> > On Sat, 13 Mar 2021 at 01:27, Israel Ekpo <is...@gmail.com> wrote:
> >
> > > Thanks @Jun for the prompt response.
> > >
> > > That's OK, and I think it is a great strategy, just like the Connect
> > > ecosystem.
> > >
> > > However, I am still in search of repos and samples that demonstrate
> > > an implementation of the KIP.
> > >
> > > I will keep searching but was just wondering if there were sample
> > > implementations for S3 or HDFS I could take a look at.
> > >
> > > Thanks.
> > >
> > > On Fri, Mar 12, 2021 at 2:19 PM Jun Rao <ju...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hi, Israel,
> > > >
> > > > Thanks for your interest. As part of KIP-405, we have made the decision
> > > > not to host any plugins for external remote storage directly in Apache
> > > > Kafka. Those plugins could be hosted outside of Apache Kafka.
> > > >
> > > > Jun
> > > >
> > > > On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo <is...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and
> > > > > thanks to everyone that participated in the review and discussion to
> > > > > take it to where it is today.
> > > > >
> > > > > I would like to contribute by working on integrating Azure Storage
> > > > > (Blob and ADLS) with Tiered Storage for this KIP
> > > > >
> > > > > I have created this issue to track this work
> > > > > https://issues.apache.org/jira/browse/KAFKA-12458
> > > > >
> > > > > Are there any sample implementations for HDFS/S3 that I can reference
> > > > > to get started?
> > > > >
> > > > > When you have a moment, please share.
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Sumant Tambe <su...@gmail.com>.
Hi Israel,

LinkedIn is interested in evaluating KIP-405 for HDFS and S3 in the short
term and Azure Blob Storage in the long run. You may already know that LinkedIn
is migrating to Azure
<https://engineering.linkedin.com/blog/2019/building-next-infra>. We think
that Blobs will provide us with the optimal cost/availability/operability
trade-offs for Kafka in Azure.
What's the context behind your interest in KIP-405 and Azure Blobs? Do you
have any data/experience of using Azure blobs at scale?

Satish, any word on KIP-405 and the RSM implementations?

Regards,
Sumant
Kafka Dev, LinkedIn

On Mon, 15 Mar 2021 at 09:08, Satish Duggana <sa...@gmail.com>
wrote:

> Hi Israel,
> Thanks for your interest in tiered storage. As mentioned by Jun earlier, we
> decided not to host any implementations in the Apache Kafka repo, as with
> Kafka connectors. We plan to have RSM implementations for HDFS, S3, GCP, and
> Azure storage in a separate repo. We will let you know once they are ready
> for review.
>
> Best,
> Satish.
>
> On Sat, 13 Mar 2021 at 01:27, Israel Ekpo <is...@gmail.com> wrote:
>
> > Thanks @Jun for the prompt response.
> >
> > That's OK, and I think it is a great strategy, just like the Connect
> > ecosystem.
> >
> > However, I am still in search of repos and samples that demonstrate
> > an implementation of the KIP.
> >
> > I will keep searching but was just wondering if there were sample
> > implementations for S3 or HDFS I could take a look at.
> >
> > Thanks.
> >
> > On Fri, Mar 12, 2021 at 2:19 PM Jun Rao <ju...@confluent.io.invalid>
> > wrote:
> >
> > > Hi, Israel,
> > >
> > > Thanks for your interest. As part of KIP-405, we have made the decision
> > > not to host any plugins for external remote storage directly in Apache
> > > Kafka. Those plugins could be hosted outside of Apache Kafka.
> > >
> > > Jun
> > >
> > > On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo <is...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and
> > > > thanks to everyone that participated in the review and discussion to
> > > > take it to where it is today.
> > > >
> > > > I would like to contribute by working on integrating Azure Storage
> > > > (Blob and ADLS) with Tiered Storage for this KIP
> > > >
> > > > I have created this issue to track this work
> > > > https://issues.apache.org/jira/browse/KAFKA-12458
> > > >
> > > > Are there any sample implementations for HDFS/S3 that I can reference
> > > > to get started?
> > > >
> > > > When you have a moment, please share.
> > > >
> > > > Thanks.
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Satish Duggana <sa...@gmail.com>.
Hi Israel,
Thanks for your interest in tiered storage. As mentioned by Jun earlier, we
decided not to host any implementations in the Apache Kafka repo, as with
Kafka connectors. We plan to have RSM implementations for HDFS, S3, GCP, and
Azure storage in a separate repo. We will let you know once they are ready
for review.

Best,
Satish.

On Sat, 13 Mar 2021 at 01:27, Israel Ekpo <is...@gmail.com> wrote:

> Thanks @Jun for the prompt response.
>
> That's OK, and I think it is a great strategy, just like the Connect
> ecosystem.
>
> However, I am still in search of repos and samples that demonstrate
> an implementation of the KIP.
>
> I will keep searching but was just wondering if there were sample
> implementations for S3 or HDFS I could take a look at.
>
> Thanks.
>
> On Fri, Mar 12, 2021 at 2:19 PM Jun Rao <ju...@confluent.io.invalid> wrote:
>
> > Hi, Israel,
> >
> > Thanks for your interest. As part of KIP-405, we have made the decision
> > not to host any plugins for external remote storage directly in Apache
> > Kafka. Those plugins could be hosted outside of Apache Kafka.
> >
> > Jun
> >
> > On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo <is...@gmail.com>
> > wrote:
> >
> > > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and
> > > thanks to everyone that participated in the review and discussion to
> > > take it to where it is today.
> > >
> > > I would like to contribute by working on integrating Azure Storage
> > > (Blob and ADLS) with Tiered Storage for this KIP
> > >
> > > I have created this issue to track this work
> > > https://issues.apache.org/jira/browse/KAFKA-12458
> > >
> > > Are there any sample implementations for HDFS/S3 that I can reference
> > > to get started?
> > >
> > > When you have a moment, please share.
> > >
> > > Thanks.
> > >
> >
>

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Israel Ekpo <is...@gmail.com>.
Thanks @Jun for the prompt response.

That's OK, and I think it is a great strategy, just like the Connect
ecosystem.

However, I am still in search of repos and samples that demonstrate
an implementation of the KIP.

I will keep searching but was just wondering if there were sample
implementations for S3 or HDFS I could take a look at.

Thanks.

On Fri, Mar 12, 2021 at 2:19 PM Jun Rao <ju...@confluent.io.invalid> wrote:

> Hi, Israel,
>
> Thanks for your interest. As part of KIP-405, we have made the decision not
> to host any plugins for external remote storage directly in Apache Kafka.
> Those plugins could be hosted outside of Apache Kafka.
>
> Jun
>
> On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo <is...@gmail.com> wrote:
>
> > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and
> > thanks to everyone that participated in the review and discussion to take
> > it to where it is today.
> >
> > I would like to contribute by working on integrating Azure Storage (Blob
> > and ADLS) with Tiered Storage for this KIP
> >
> > I have created this issue to track this work
> > https://issues.apache.org/jira/browse/KAFKA-12458
> >
> > Are there any sample implementations for HDFS/S3 that I can reference to
> > get started?
> >
> > When you have a moment, please share.
> >
> > Thanks.
> >
>

Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Israel,

Thanks for your interest. As part of KIP-405, we have made the decision not
to host any plugins for external remote storage directly in Apache Kafka.
Those plugins could be hosted outside of Apache Kafka.

Jun

On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo <is...@gmail.com> wrote:

> Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and thanks
> to everyone that participated in the review and discussion to take it to
> where it is today.
>
> I would like to contribute by working on integrating Azure Storage (Blob
> and ADLS) with Tiered Storage for this KIP
>
> I have created this issue to track this work
> https://issues.apache.org/jira/browse/KAFKA-12458
>
> Are there any sample implementations for HDFS/S3 that I can reference to
> get started?
>
> When you have a moment, please share.
>
> Thanks.
>