Posted to users@kafka.apache.org by Susheel Kumar <su...@gmail.com> on 2016/12/15 18:00:05 UTC

Kafka as a database/repository question

Hello Folks,

I am going through an existing design in which Kafka is planned to be
utilised in the following manner:


   1. Messages will be pushed to Kafka by producers.
   2. Existing messages will be updated on an ongoing basis.  The
   expectation is that all updates are consolidated in Kafka and the
   latest version/copy is kept.
   3. Consumers will read the messages from Kafka and push them to Solr
   for ingestion.
   4. There will be no purging/removal of messages, since we expect to
   replay the messages in the future and perform full re-ingestion.  So
   messages will be kept in Kafka indefinitely, similar to a database
   where data, once stored, remains there and can be used later.


Do you see any pitfalls or issues with this design, especially with
respect to storing the messages indefinitely?


Thanks,
Susheel

Re: Kafka as a database/repository question

Posted by Susheel Kumar <su...@gmail.com>.
Sorry, I do not have any info on the backup and recovery plan at this point
in time.  Please consider both cases (no backup AND backup).

On Thu, Dec 15, 2016 at 1:06 PM, Tauzell, Dave <Dave.Tauzell@surescripts.com
> wrote:

> What is the plan for backup and recovery of the kafka data?
>
> -Dave

RE: Kafka as a database/repository question

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
What is the plan for backup and recovery of the kafka data?

-Dave


Re: Kafka as a database/repository question

Posted by Susheel Kumar <su...@gmail.com>.
Thanks, Hans, for the insight. I will use a compacted topic.


Re: Kafka as a database/repository question

Posted by Hans Jespersen <ha...@confluent.io>.
For #2, definitely use a compacted topic. Compaction removes old
messages and keeps the last update for each key. To use this feature you
will need to publish messages as key/value pairs. Apache Kafka 0.10.1 has
some important fixes that make compacted topics more reliable when scaling
to large numbers of keys, so make sure to use the latest release if this
grows into a large amount of data.
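The compaction behaviour described above can be sketched in a few lines of
self-contained Python (no broker involved; the log is modelled as a plain
list of key/value records, and a record whose value is None stands in for a
Kafka tombstone):

```python
def compact(log):
    """Simulate Kafka log compaction: keep only the latest value per key.

    `log` is a list of (key, value) records in offset order. A record with
    value None acts as a tombstone and deletes the key, which is how
    deletions work in a real compacted topic.
    """
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)   # tombstone: remove the key entirely
        else:
            latest[key] = value     # newer record supersedes the old one
    return latest

# Three updates for "person-1": only the last survives compaction,
# and "person-2" is removed by its tombstone.
log = [
    ("person-1", "v1"),
    ("person-2", "v1"),
    ("person-1", "v2"),
    ("person-2", None),   # tombstone
    ("person-1", "v3"),
]
print(compact(log))   # {'person-1': 'v3'}
```

In a real cluster the log cleaner runs asynchronously per segment, so older
duplicates may linger until cleaning catches up; the dictionary above shows
only the end state compaction converges to.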

#3 sounds like a job for a Kafka sink connector for Solr (something like
https://github.com/jcustenborder/kafka-connect-solr).
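For #3, a Kafka Connect sink is driven by a small JSON config. A minimal
skeleton might look like the following; the `connector.class` and
`solr.url` values here are assumptions based on the linked project, so
check its README for the actual class name and option names:

```json
{
  "name": "solr-sink",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.solr.HttpSolrSinkConnector",
    "tasks.max": "1",
    "topics": "documents",
    "solr.url": "http://localhost:8983/solr"
  }
}
```

`name`, `connector.class`, `tasks.max`, and `topics` are standard Kafka
Connect keys; everything else is connector-specific.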

#4: messages in compacted topics do not expire; they are removed only when
superseded by a newer message with the same key.

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * hans@confluent.io (650)924-2670
 */


Re: Kafka as a database/repository question

Posted by Susheel Kumar <su...@gmail.com>.
Thanks, Kenny, for confirming.  By message updates I mean that updates will
keep coming in for the same document/message (e.g. a person's details may
change).  As you mentioned, using a proper key should make that work, so we
are good on that.


Re: Kafka as a database/repository question

Posted by Kenny Gorman <ke...@eventador.io>.
A couple of thoughts...

- If you plan on fetching old messages in a non-contiguous manner, this may not be the best design. For instance, “give me messages from Mondays for the last 3 quarters” is better served by a database, but “give me messages from the last month until now” works great.
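That distinction can be made concrete with a small, self-contained Python
sketch: a Kafka partition is an offset-ordered sequence you can replay
cheaply from any starting point, whereas a non-contiguous selection forces
a full scan and filter (the access pattern a database index handles for
you). The log layout here is illustrative, not a real Kafka API:

```python
# Model a partition as an offset-ordered list of (offset, payload) records.
log = [(i, f"msg-{i}") for i in range(10)]

def replay_from(log, start_offset):
    """Contiguous replay: everything from start_offset onward.
    Cheap -- this is essentially what a consumer seek() gives you."""
    return [rec for rec in log if rec[0] >= start_offset]

def select_where(log, predicate):
    """Non-contiguous selection: must scan the whole log and filter.
    This is the access pattern better served by a database."""
    return [rec for rec in log if predicate(rec)]

print(len(replay_from(log, 7)))                          # 3
print(len(select_where(log, lambda r: r[0] % 2 == 0)))   # 5
```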

- I am not sure what you mean by updating messages. You would need some sort of key and would push new messages with that key; when you read by key, the application should understand that the latest message is the version to use.

- Alternatively, you can consume into something like a database and use regular SQL to select what you want. We see this pattern a lot.

- For storing messages indefinitely, it’s mostly a matter of making sure the config options are set appropriately and that you have enough storage space. Set replication to a level that makes you comfortable, and maybe take backups as was mentioned.
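The config options in question mostly come down to per-topic retention and
replication settings. A sketch of the overrides for a keep-forever topic
(values are illustrative; `min.insync.replicas=2` assumes a replication
factor of 3):

```
# Per-topic overrides for a keep-forever topic
cleanup.policy=compact    # keep the latest record per key instead of deleting by age
retention.ms=-1           # no time-based expiry
retention.bytes=-1        # no size-based expiry
min.insync.replicas=2     # with replication factor 3, tolerate one replica down
```

With `cleanup.policy=compact` the age/size limits are not the primary
deletion mechanism anyway, but pinning them to -1 makes the keep-forever
intent explicit.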

Hope this helps some

Kenny Gorman
Founder
www.eventador.io

