
Posted to users@kafka.apache.org by 杰杨 <fu...@live.com> on 2018/03/02 02:33:06 UTC

Re: which Kafka StateStore could I use ?

Yes, but the DB's concurrency is the limitation.
Right now I can process 600 records/second,
and I want to improve that.


From: Guozhang Wang<ma...@gmail.com>
Sent: March 2, 2018, 2:59
To: users@kafka.apache.org<ma...@kafka.apache.org>
Subject: Re: which Kafka StateStore could I use ?

Hello Jie,

Just to understand your problem better: by "db", are you referring to an
external storage engine outside Kafka Streams, and are you asking how to
send only one record per aggregation key (assuming you are doing some
aggregations with Streams' state store) to that storage engine?

Guozhang

On Wed, Feb 28, 2018 at 7:53 PM, 杰 杨 <fu...@live.com> wrote:

>
> Hi,
> I use Kafka Streams for real-time data analysis
> and I have run into a problem.
> Currently I process each record from Kafka, compute a result, and send it to the DB,
> but the DB's concurrency level is not sufficient for my needs.
> So what I want is:
> 1) when there is no data in Kafka, the state store holds no results;
> 2) when there are a lot of records in Kafka, the state store saves the
> computed result and I only need to send it to the DB once.
> Which StateStore can I use to do the above?
> ________________________________
> funkyyj@live.com
>

--
-- Guozhang

Re: Re: which Kafka StateStore could I use ?

Posted by 杰杨 <fu...@live.com>.

Can you give me some tips for this?

---Original Message---
From: "Guozhang Wang "<wa...@gmail.com>
Sent: March 3, 2018, 01:32:55
To: "users"<us...@kafka.apache.org>;
Subject: Re: Re: which Kafka StateStore could I use ?


Hello Jie,

By default, Kafka Streams uses caching on top of its internal state stores
to de-duplicate the output stream to the final destination (in your case the
DB), so that fewer updates are generated for a single key, given a small
working set. If your aggregation logic follows such a key distribution, you
can try enlarging the cache size (by default it is only 50MB) and see if it
helps reduce the downstream traffic to your DB.


Guozhang
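
As a rough illustration of the de-duplication idea described above, the
sketch below keeps only the latest value per key and hands everything to the
DB in one batch per flush. It is not how Kafka Streams implements its cache;
the class name LatestPerKeyBuffer and the sink callback are made up for this
example and stand in for whatever bulk write the DB client offers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Kafka Streams): remembers only the most
 * recent value per key and passes all pending entries to a sink in one batch.
 */
public final class LatestPerKeyBuffer<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>();
    private final Consumer<Map<K, V>> sink;

    /** @param sink stand-in for a bulk write to the external DB */
    public LatestPerKeyBuffer(Consumer<Map<K, V>> sink) {
        this.sink = sink;
    }

    /** Records an update; an earlier pending value for the same key is overwritten. */
    public void put(K key, V value) {
        pending.put(key, value);
    }

    /**
     * Sends all pending entries to the sink as a single batch.
     * Returns the number of entries written; if nothing is pending,
     * the sink is not called at all.
     */
    public int flush() {
        if (pending.isEmpty()) {
            return 0;
        }
        Map<K, V> batch = new LinkedHashMap<>(pending);
        pending.clear();
        sink.accept(batch);
        return batch.size();
    }
}
```

With this shape, many updates to the same key between two flushes cost one
DB write, and a quiet period with no input produces no DB traffic.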



Re: Re: which Kafka StateStore could I use ?

Posted by 杰杨 <fu...@live.com>.

I am using MongoDB, but I need to perform 10 update operations for each record.
Processing one record takes 20 ms on one thread.

---Original Message---
From: "Ted Yu "<yu...@gmail.com>
Sent: March 3, 2018, 01:37:13
To: "users"<us...@kafka.apache.org>;
Subject: Re: Re: which Kafka StateStore could I use ?


Jie:
Which DB are you using?

600 records/second is a very low rate.

Your DB probably needs some tuning.

Cheers


Re: Re: which Kafka StateStore could I use ?

Posted by Ted Yu <yu...@gmail.com>.

Jie:
Which DB are you using?

600 records/second is a very low rate.

Your DB probably needs some tuning.

Cheers


Re: Re: which Kafka StateStore could I use ?

Posted by Guozhang Wang <wa...@gmail.com>.

Hello Jie,

By default, Kafka Streams uses caching on top of its internal state stores
to de-duplicate the output stream to the final destination (in your case the
DB), so that fewer updates are generated for a single key, given a small
working set. If your aggregation logic follows such a key distribution, you
can try enlarging the cache size (by default it is only 50MB) and see if it
helps reduce the downstream traffic to your DB.


Guozhang
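
For reference, a minimal sketch of how the cache size might be raised,
assuming a Kafka Streams version that still uses the
cache.max.bytes.buffering setting (the total record cache across all threads
of an instance). The class name StreamsCacheSettings and the values used in
main are placeholders for illustration, not recommendations, and the required
application.id and bootstrap.servers settings are left out. Cached updates are
also forwarded downstream on every commit, so commit.interval.ms is included
here as a second knob worth trying. The default values depend on the Kafka
version, so check the StreamsConfig documentation for yours.

```java
import java.util.Properties;

/** Hypothetical helper that builds only the cache-related Streams settings. */
public final class StreamsCacheSettings {
    private StreamsCacheSettings() {}

    /**
     * @param cacheBytes       total record cache size in bytes
     * @param commitIntervalMs how often state is committed and the cache flushed
     */
    public static Properties withLargerCache(long cacheBytes, long commitIntervalMs) {
        Properties props = new Properties();
        // Plain string keys are used so this sketch compiles without the Kafka jars.
        props.setProperty("cache.max.bytes.buffering", Long.toString(cacheBytes));
        props.setProperty("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values: 200 MB of cache, commit every 30 seconds.
        Properties props = withLargerCache(200L * 1024 * 1024, 30_000L);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be merged with the rest of the application's
configuration before being passed to the KafkaStreams constructor. A larger
cache and a longer commit interval trade higher output latency for fewer
writes to the DB.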



-- 
-- Guozhang