Posted to users@kafka.apache.org by Debraj Manna <su...@gmail.com> on 2020/01/16 17:44:21 UTC

Sort data across partitions and put it in another topic

Hi

I have a Kafka topic with X partitions. Each message has a timestamp, ts.
Can someone suggest a way to sort all the messages (by ts) across all
partitions and write them to a new topic with Y partitions (Y < X) using
the Kafka Java client?

Thanks

Re: Sort data across partitions and put it in another topic

Posted by Daniyar Kulakhmetov <dk...@liftoff.io>.
I'm not familiar with the Kafka Streams API, but I guess it could be used,
since the data needs to be consumed from a source, processed, and the
results produced to another destination.

The main point is that you would need to specify which source partitions
each stream processor/consumer reads and which single destination
partition its results are produced to. The processor can either read all
the data from its assigned partitions and sort it or, maybe, read the data
in portions (I think the best way is to set the offsets to the same
timestamp) and then produce the sorted result to the destination topic.

As for merging everything into a single destination partition, that is
just the special case where you have only one large group.
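
A minimal sketch of the merge step for one group, under a strong assumption
that the thread does not confirm: every source partition is already ordered
by ts. If that holds, a k-way merge needs to keep only one pending message
per source partition in memory. The in-memory lists below stand in for
per-partition consumers, and TimestampMerge, Msg and mergeByTimestamp are
hypothetical names, not part of any Kafka API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, for illustration only.
final class TimestampMerge {
    // Stand-in for a consumed record: just the timestamp and a payload.
    static final class Msg {
        final long ts;
        final String value;
        Msg(long ts, String value) { this.ts = ts; this.value = value; }
    }

    // The next unread message of one partition plus the rest of that partition.
    private static final class Head {
        final Msg msg;
        final Iterator<Msg> rest;
        Head(Msg msg, Iterator<Msg> rest) { this.msg = msg; this.rest = rest; }
    }

    // Merges the partitions of one group into a single list ordered by ts.
    // ASSUMPTION: each input list is already sorted by ts.
    static List<Msg> mergeByTimestamp(List<List<Msg>> partitions) {
        PriorityQueue<Head> heap =
            new PriorityQueue<>(Comparator.comparingLong((Head h) -> h.msg.ts));
        for (List<Msg> partition : partitions) {
            Iterator<Msg> it = partition.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Msg> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.msg); // a real job would send this to the destination partition
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

If the source partitions are not ordered by ts, this merge does not produce
a sorted result, and each group would have to be sorted as a whole instead.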

On Thu, Jan 16, 2020 at 10:38 PM Debraj Manna <su...@gmail.com>
wrote:

> Thanks Daniyar for replying.
>
> Does Kafka Streams have any APIs for the partitioning and grouping that you
> are suggesting?
>
> Also, if I have to merge everything into a single partition, what would be
> an efficient way to do this?

Re: Sort data across partitions and put it in another topic

Posted by Debraj Manna <su...@gmail.com>.
Thanks Daniyar for replying.

Does Kafka Streams have any APIs for the partitioning and grouping that you
are suggesting?

Also, if I have to merge everything into a single partition, what would be
an efficient way to do this?

On Fri, Jan 17, 2020 at 6:03 AM Daniyar Kulakhmetov <dk...@liftoff.io>
wrote:

> Since you are not going to merge everything into one partition, you don't
> need to sort all messages across all partitions (because messages are
> ordered only within a partition).
> I'd suggest splitting the X partitions into Y groups and then merging the
> source partitions within each group into their destination partition.

Re: Sort data across partitions and put it in another topic

Posted by Daniyar Kulakhmetov <dk...@liftoff.io>.
Since you are not going to merge everything into one partition, you don't
need to sort all messages across all partitions (because messages are
ordered only within a partition).
I'd suggest splitting the X partitions into Y groups and then merging the
source partitions within each group into their destination partition.
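
One possible way to form the groups is sketched below. It assigns source
partition p to destination partition p % Y, which is only an assumption:
any fixed mapping from source to destination partitions would serve.
PartitionGroups and group are hypothetical names, not part of the Kafka
client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class PartitionGroups {
    // Returns one list per destination partition; list g holds the source
    // partition numbers whose messages are merged into destination partition g.
    // ASSUMPTION: round-robin assignment (source partition modulo Y).
    static List<List<Integer>> group(int sourcePartitions, int destPartitions) {
        if (destPartitions <= 0 || destPartitions > sourcePartitions) {
            throw new IllegalArgumentException("need 0 < Y <= X");
        }
        List<List<Integer>> groups = new ArrayList<>();
        for (int g = 0; g < destPartitions; g++) {
            groups.add(new ArrayList<>());
        }
        for (int p = 0; p < sourcePartitions; p++) {
            groups.get(p % destPartitions).add(p);
        }
        return groups;
    }
}
```

With the Java client, each group could then be handed to its own consumer
through KafkaConsumer.assign, so that the consumer reads exactly those
source partitions.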


On Thu, Jan 16, 2020 at 10:20 AM Debraj Manna <su...@gmail.com>
wrote:

> Just to add: while this operation is running, no new data will be added to
> the original Kafka topic. I am trying to avoid buffering all the data in a
> temporary datastore in order to sort it.

Re: Sort data across partitions and put it in another topic

Posted by Debraj Manna <su...@gmail.com>.
Just to add: while this operation is running, no new data will be added to
the original Kafka topic. I am trying to avoid buffering all the data in a
temporary datastore in order to sort it.

On Thu, 16 Jan 2020, 23:14 Debraj Manna, <su...@gmail.com> wrote:

> Hi
>
> I have a Kafka topic with X partitions. Each message has a timestamp, ts.
> Can someone suggest a way to sort all the messages (by ts) across all
> partitions and write them to a new topic with Y partitions (Y < X) using
> the Kafka Java client?
>
> Thanks
>
>