
Posted to user@spark.apache.org by Siva Samraj <sa...@gmail.com> on 2020/09/30 07:44:33 UTC

Offset Management in Spark

Hi all,

I am using Spark Structured Streaming (version 2.3.2). I need to read from a
Kafka cluster and write into a Kerberized Kafka cluster.
I would like Kafka to serve as the offset checkpoint store, with offsets
saved only after the records have been written into the Kerberized Kafka.
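A minimal sketch of the pipeline described above, assuming PySpark and the spark-sql-kafka-0-10 package are available. The host names, topics, checkpoint path, SASL_PLAINTEXT protocol and "kafka" service name are all placeholders. The Kerberos principal and keytab are assumed to come from a JAAS file passed to the driver and executor JVMs (for example via -Djava.security.auth.login.config in spark.driver.extraJavaOptions and spark.executor.extraJavaOptions), which is not shown here.

```python
# Minimal sketch (hypothetical hosts, topics, and paths): copy records from a
# plain Kafka cluster to a Kerberized one with Spark Structured Streaming.


def source_options(bootstrap_servers, topic):
    """Options for spark.readStream.format("kafka") on the unsecured cluster."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only used the first time the query starts; on restart Spark resumes
        # from the offsets stored in the checkpoint location.
        "startingOffsets": "earliest",
    }


def sink_options(bootstrap_servers, topic, checkpoint_location,
                 security_protocol="SASL_PLAINTEXT", service_name="kafka"):
    """Options for writeStream.format("kafka") on the Kerberized cluster."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
        # "kafka."-prefixed keys are passed through to the Kafka producer.
        # Assumed values: adjust them to the broker's listener configuration.
        "kafka.security.protocol": security_protocol,
        "kafka.sasl.kerberos.service.name": service_name,
        # Offsets are tracked here by Spark, not committed back to Kafka.
        "checkpointLocation": checkpoint_location,
    }


def run(src_servers, src_topic, dst_servers, dst_topic, checkpoint_location):
    # Imported here so the option helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-kerberized-kafka").getOrCreate()
    records = (
        spark.readStream.format("kafka")
        .options(**source_options(src_servers, src_topic))
        .load()
        # The Kafka sink expects a "value" column and optionally a "key" column.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )
    query = (
        records.writeStream.format("kafka")
        .options(**sink_options(dst_servers, dst_topic, checkpoint_location))
        .start()
    )
    query.awaitTermination()
```

For example, run("plain-broker:9092", "input-topic", "secure-broker:9093", "output-topic", "hdfs:///user/spark/checkpoints/kafka-copy") would start the query (all arguments are placeholders). Restarting with the same checkpointLocation resumes from the recorded offsets. The Kafka sink only guarantees at-least-once delivery, so a restart can write some records twice.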

Questions:

1. Can we use Kafka for checkpointing to manage offsets, or do we have to use
HDFS/S3?

Please help.

Thanks

Re: Offset Management in Spark

Posted by Gabor Somogyi <ga...@gmail.com>.

Hi,

Structured Streaming stores offsets only in HDFS-compatible filesystems (in the
query's checkpoint location). Kafka and S3 are not such filesystems.
Custom offset storage was an option only with DStreams.
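To illustrate where the offsets end up: with a checkpointLocation set, Spark writes one file per micro-batch under <checkpoint>/offsets/<batchId>. For a Kafka source, each file holds the per-partition offsets as JSON. The sketch below parses the text of such a file (obtained, for example, with hdfs dfs -cat). This layout is an internal, undocumented format and may differ between Spark versions. The parser also assumes every source in the query is a Kafka source.

```python
# Sketch: parse one Structured Streaming offset log file. The format (version
# line, metadata JSON line, one line per source) is an internal detail of
# Spark and is assumed here, not guaranteed.
import json


def parse_offset_log(text):
    lines = text.strip().splitlines()
    version = lines[0]
    if not version.startswith("v"):
        raise ValueError("unexpected offset log version line: %r" % version)
    # Batch metadata such as batchWatermarkMs, batchTimestampMs and conf.
    metadata = json.loads(lines[1])
    sources = []
    for line in lines[2:]:
        if line == "-":
            # "-" marks a source that had no offset recorded for this batch.
            sources.append(None)
        else:
            # Kafka source offsets: {"topic": {"partition": offset, ...}}.
            sources.append({
                topic: {int(partition): offset for partition, offset in parts.items()}
                for topic, parts in json.loads(line).items()
            })
    return version, metadata, sources
```

Because these files live only in the checkpoint directory, the offsets are not visible to Kafka consumer-group tooling.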

G
