Posted to users@kafka.apache.org by Aris Alexis <sn...@gmail.com> on 2014/09/04 20:35:29 UTC

Use case

Hello,

I am building a big web application that I want to be massively scalable (I
am using Cassandra and Titan as a general DB).

I want to implement the following:

- real-time web chat that is persisted, so that user A can later recall
  their chats with users B, C, D, much like Facebook
- mail-like messages in the web application (not sure about this, as it is
  somewhat covered by the first one)
- user activity streams
- users subscribing to topics, for example florida/musicevents

Could I use Kafka for this? Or can you recommend another technology?

Re: Use case

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.
Agreed. I can't see this being a good use for Kafka. 

Philip 

-----------------------------------------
http://www.philipotoole.com 


On Thursday, September 4, 2014 9:57 PM, Sharninder <sh...@gmail.com> wrote:
 


Since you want all chats and mail history persisted all the time, I
personally wouldn't recommend kafka for your requirement. Kafka is more
suitable as a streaming system where events expire after a certain time.
Look at something more general purpose like hbase for persisting data
indefinitely.

So, for example all activity streams can go into kafka from where consumers
will pick up messages to parse and put them to hbase or other clients.

--
Sharninder







Re: Use case

Posted by Gwen Shapira <gs...@cloudera.com>.
Especially since the same code will need to process all these messages.
Topics are typically used to separate messages between apps, modules,
etc. If all your apps consume all the topics, there's something wrong
in the design.

Gwen

On Fri, Sep 5, 2014 at 9:31 AM, Philip O'Toole
<ph...@yahoo.com.invalid> wrote:
> Yes, IMHO, that is going to be way too many topics. Use a smaller number of topics, and embed attributes like "tag" and "user" in the messages written to Kafka.
>
> Philip
>
>
> -----------------------------------------
> http://www.philipotoole.com
>
>

Re: Use case

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.
Yes, IMHO, that is going to be way too many topics. Use a smaller number of topics, and embed attributes like "tag" and "user" in the messages written to Kafka.
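
To illustrate the suggestion above, here is a minimal Python sketch of events that carry "user" and "tag" attributes in the payload, with a consumer-side filter on those attributes. The field names and helper functions are illustrative assumptions, not part of any Kafka API, and no broker is involved; the list simply stands in for messages read from one shared topic.

```python
import json

def encode_event(user, tag, body):
    """Serialize an event with its routing attributes embedded in the payload."""
    return json.dumps({"user": user, "tag": tag, "body": body}).encode("utf-8")

def decode_event(raw):
    """Parse the bytes a consumer would read from the shared topic."""
    return json.loads(raw.decode("utf-8"))

def matches(event, user=None, tag=None):
    """Consumer-side filter: keep only events with the requested attributes."""
    if user is not None and event["user"] != user:
        return False
    if tag is not None and event["tag"] != tag:
        return False
    return True

# Stand-in for messages read from a single shared topic (hypothetical data).
events = [
    encode_event("aris", "florida/musicevents", "concert tonight"),
    encode_event("gwen", "florida/musicevents", "new venue"),
    encode_event("aris", "nyc/food", "pizza"),
]

florida = [e["body"] for e in map(decode_event, events)
           if matches(e, tag="florida/musicevents")]
print(florida)
```

The trade-off is that every consumer of the shared topic reads events it may discard, in exchange for a topic count that does not grow with the number of users or tags.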

Philip

 
-----------------------------------------
http://www.philipotoole.com 


On Friday, September 5, 2014 4:21 AM, Sharninder <sh...@gmail.com> wrote:
 


I'm not really sure about your exact use-case but I don't think having a
topic per user is very efficient. Deleting topics in kafka, at the moment,
isn't really straightforward. You should rethink your data pipeline a bit.

Also, just because kafka has the ability to store messages for a certain
time, don't think of it as a data store. Kafka is a streaming system, think
of it as a fast queue that gives you the ability to move your pointer back.

--
Sharninder





Re: Use case

Posted by Aris Alexis <sn...@gmail.com>.
Dear Christian,

You seem to have spent considerable time thinking about this.
Thank you very much; I will study it for ideas.

If you didn't, then wow, congrats, you have a fast CPU there :)

Aris
On Sep 5, 2014 7:25 PM, "Christian Csar" <ca...@gmail.com> wrote:

> The thought experiment I did ended up having a set of front end servers
> corresponding to a given chunk of the user id space, each of which was a
> separate subscriber to the same set of partitions. Then you have one or
> more partitions corresponding to that same chunk of users. You want the
> chunk/set of partitions to be of a size where each of those front end
> servers can process all the messages in it and send out the
> chats/notifications/status change notifications perhaps/read receipts,
> to those users who happen to be connected to the particular front end node.
>
> You would need to handle some deduplication on the consumers/FE servers
> and would need to decide where to produce. Producing from every front
> end server to potentially every broker could be expensive in terms of
> connections and you might want to first relay the messages to the
> corresponding front end cluster, but since we don't use large numbers of
> producers it's hard for me to say.
>
> For persistence and offline delivery you can probably accept a delay in
> user receipt so you can use another set of consumers that persist the
> messages to the longer latency datastore on the backend and then get the
> last 50 or so messages with a bit of lag when the user first looks at
> history (see hipchat and hangouts lag).
>
> This gives you a smaller number of partitions and avoids the issue of
> having to keep too much history on the Kafka brokers. There are
> obviously a significant number of complexities to deal with. For example
> if you are using default consumer code that commits offsets into
> zookeeper it may be inadvisable at large scales you probably don't need
> to worry about reaching. And remember I had done this only as a thought
> experiment not a proper technical evaluation. I expect Kafka, used
> correctly, can make aspects of building such a chat system much much
> easier (you can avoid writing your own message replication system) but
> it is definitely not plug and play using topics for users.
>
> Christian
>
>

Re: Use case

Posted by Christian Csar <ca...@gmail.com>.
The thought experiment I did ended up having a set of front end servers
corresponding to a given chunk of the user id space, each of which was a
separate subscriber to the same set of partitions. Then you have one or
more partitions corresponding to that same chunk of users. You want the
chunk/set of partitions to be of a size where each of those front end
servers can process all the messages in it and send out the
chats/notifications/status change notifications perhaps/read receipts,
to those users who happen to be connected to the particular front end node.
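
A rough Python sketch of that layout, under assumptions of my own: the partition count, the chunk size, the CRC32 hash and all function names are hypothetical choices for illustration, not anything Kafka prescribes. It maps a user id to a partition, groups contiguous partitions into a chunk served by one front-end cluster, and has a server forward only to users connected to it.

```python
import zlib

NUM_PARTITIONS = 64        # assumed partition count for the chat topic
PARTITIONS_PER_CHUNK = 8   # assumed number of partitions per front-end cluster

def partition_for_user(user_id):
    """Stable hash, so the same user always maps to the same partition."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_PARTITIONS

def chunk_for_partition(partition):
    """Contiguous blocks of partitions form one chunk of the user id space."""
    return partition // PARTITIONS_PER_CHUNK

def partitions_for_chunk(chunk):
    """Every front-end server in a chunk subscribes to all of these partitions."""
    start = chunk * PARTITIONS_PER_CHUNK
    return list(range(start, start + PARTITIONS_PER_CHUNK))

def deliver(chunk, connected_users, messages):
    """A server sees every message in its chunk but forwards only to local users.

    messages is a list of (recipient, text) pairs.
    """
    owned = set(partitions_for_chunk(chunk))
    return [(to, text) for (to, text) in messages
            if partition_for_user(to) in owned and to in connected_users]
```

Sizing PARTITIONS_PER_CHUNK is the knob the paragraph above describes: it has to be small enough that a single front-end server can keep up with all traffic in its chunk.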

You would need to handle some deduplication on the consumers/FE servers
and would need to decide where to produce. Producing from every front
end server to potentially every broker could be expensive in terms of
connections and you might want to first relay the messages to the
corresponding front end cluster, but since we don't use large numbers of
producers it's hard for me to say.
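
For the deduplication part, one possible approach (a sketch only, assuming each message carries a unique id, which is something the application would have to add) is a bounded window of recently seen ids on each consumer. The class and method names here are made up for illustration.

```python
from collections import OrderedDict

class RecentIds:
    """Remembers the most recent message ids, evicting the oldest beyond capacity."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def first_time(self, message_id):
        """Return True only the first time an id is seen within the window."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # refresh its position
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # drop the oldest id
        return True
```

A consumer would call first_time(msg_id) before forwarding a message and skip it when the result is False. Duplicates that arrive after their id has been evicted would slip through, so the capacity has to cover the redelivery window you expect.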

For persistence and offline delivery you can probably accept a delay in
user receipt so you can use another set of consumers that persist the
messages to the longer latency datastore on the backend and then get the
last 50 or so messages with a bit of lag when the user first looks at
history (see hipchat and hangouts lag).
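
As a sketch of that second set of consumers: the class below is an in-memory stand-in for the backend datastore (for example Cassandra or HBase in a real system), and all names are hypothetical. It only shows the shape of the idea: persist each consumed message under its conversation, and serve the last 50 on request.

```python
from collections import defaultdict

HISTORY_PAGE = 50  # "last 50 or so messages" from the text above

class HistoryStore:
    """In-memory stand-in for the longer-latency backend datastore."""

    def __init__(self):
        self._by_conversation = defaultdict(list)

    def persist(self, sender, recipient, text):
        # Key on the unordered pair so both participants share one history.
        key = tuple(sorted((sender, recipient)))
        self._by_conversation[key].append((sender, text))

    def recent(self, user_a, user_b, limit=HISTORY_PAGE):
        """Return up to `limit` most recent messages, oldest first."""
        key = tuple(sorted((user_a, user_b)))
        return self._by_conversation[key][-limit:]

def run_persister(batch, store):
    """What a persistence consumer would do with each batch it reads.

    batch is a list of (sender, recipient, text) tuples.
    """
    for sender, recipient, text in batch:
        store.persist(sender, recipient, text)
```

Because this consumer group is separate from the front-end servers, it can lag behind live delivery without affecting users who are online.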

This gives you a smaller number of partitions and avoids the issue of
having to keep too much history on the Kafka brokers. There are
obviously a significant number of complexities to deal with. For example,
the default consumer code commits offsets into ZooKeeper, which may be
inadvisable at very large scales (scales you probably don't need to
worry about reaching). And remember, I did this only as a thought
experiment, not a proper technical evaluation. I expect Kafka, used
correctly, can make aspects of building such a chat system much, much
easier (you can avoid writing your own message replication system), but
it is definitely not plug and play using topics for users.

Christian


On 09/05/2014 09:46 AM, Jonathan Weeks wrote:
> +1
> 
> Topic deletion in 0.8.1.1 is extremely problematic. In addition, rebalances and broker membership changes pay a cost per partition today, so excessive partitions extend downtime in the case of a failure. Together, this means that keeping to fewer topics (e.g. hundreds or thousands) is a best practice in the published version of Kafka.
> 
> There are also secondary impacts of topic count: useful operational tools such as http://quantifind.com/KafkaOffsetMonitor/ start to become problematic in terms of UX with a massive number of topics.
> 
> Once topic deletion is a supported feature, the use-case outlined might be more tenable.
> 
> Best Regards,
> 
> -Jonathan
> 



Re: Use case

Posted by Jonathan Weeks <jo...@gmail.com>.
+1

Topic deletion in 0.8.1.1 is extremely problematic. In addition, rebalances and broker membership changes pay a cost per partition today, so excessive partitions extend downtime in the case of a failure. Together, this means that keeping to fewer topics (e.g. hundreds or thousands) is a best practice in the published version of Kafka.

There are also secondary impacts of topic count: useful operational tools such as http://quantifind.com/KafkaOffsetMonitor/ start to become problematic in terms of UX with a massive number of topics.

Once topic deletion is a supported feature, the use-case outlined might be more tenable.

Best Regards,

-Jonathan

On Sep 5, 2014, at 4:20 AM, Sharninder <sh...@gmail.com> wrote:

> I'm not really sure about your exact use-case but I don't think having a
> topic per user is very efficient. Deleting topics in kafka, at the moment,
> isn't really straightforward. You should rethink your data pipeline a bit.
> 
> Also, just because kafka has the ability to store messages for a certain
> time, don't think of it as a data store. Kafka is a streaming system, think
> of it as a fast queue that gives you the ability to move your pointer back.
> 
> --
> Sharninder
> 
> 
> 


Re: Use case

Posted by Sharninder <sh...@gmail.com>.
I'm not really sure about your exact use case, but I don't think having a
topic per user is very efficient. Deleting topics in Kafka, at the moment,
isn't really straightforward. You should rethink your data pipeline a bit.

Also, just because Kafka has the ability to store messages for a certain
time, don't think of it as a data store. Kafka is a streaming system: think
of it as a fast queue that gives you the ability to move your pointer back.
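
A toy Python model of that "queue with a movable pointer" idea, purely for illustration: it is not how Kafka is implemented, the class name is made up, and the entry-count limit stands in for Kafka's time-based retention. The point is that a reader can rewind to an older offset, but only as far back as retention allows.

```python
class RetainedLog:
    """Toy append-only log: offsets only grow, and old entries expire."""

    def __init__(self, max_entries):
        self.max_entries = max_entries  # stand-in for time-based retention
        self.base_offset = 0            # offset of the oldest retained entry
        self.entries = []

    def append(self, value):
        """Append a value and return the offset it was written at."""
        self.entries.append(value)
        if len(self.entries) > self.max_entries:
            self.entries.pop(0)         # the oldest entry expires
            self.base_offset += 1
        return self.base_offset + len(self.entries) - 1

    def read(self, offset, count=1):
        """Read from an offset; expired offsets start at the oldest retained entry."""
        start = max(offset, self.base_offset) - self.base_offset
        return self.entries[start:start + count]
```

A consumer that keeps its own offset can call read() with an earlier value to replay recent messages, which is the "move your pointer back" part. Anything older than the retention limit is gone, which is why it should not be treated as the system of record.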

--
Sharninder



On Fri, Sep 5, 2014 at 4:27 PM, Aris Alexis <ar...@gmail.com>
wrote:

> Thanks for the reply. What if I use it only for activity streams, like Twitter?
>
> I would want a topic for each #tag, a topic for each user, and maybe one
> for each city. Would that be too many topics, or does it not matter, since
> most of them will be deleted after a specified interval?
>
>
>
> Best Regards,
> Aris Giachnis
>
>

Re: Use case

Posted by Aris Alexis <sn...@gmail.com>.
Thanks for the reply. What if I use it only for activity streams, like Twitter?

I would want a topic for each #tag, a topic for each user, and maybe one
for each city. Would that be too many topics, or does it not matter, since
most of them will be deleted after a specified interval?



Best Regards,
Aris Giachnis



Re: Use case

Posted by Aris Alexis <ar...@gmail.com>.
Thanks for the reply. What if I use it only for activity streams, like Twitter?

I would want a topic for each #tag, a topic for each user, and maybe one
for each city. Would that be too many topics, or does it not matter, since
most of them will be deleted after a specified interval?



Best Regards,
Aris Giachnis


On Fri, Sep 5, 2014 at 6:57 AM, Sharninder <sh...@gmail.com> wrote:

> Since you want all chats and mail history persisted all the time, I
> personally wouldn't recommend kafka for your requirement. Kafka is more
> suitable as a streaming system where events expire after a certain time.
> Look at something more general purpose like hbase for persisting data
> indefinitely.
>
> So, for example all activity streams can go into kafka from where consumers
> will pick up messages to parse and put them to hbase or other clients.
>
> --
> Sharninder
>
>
>
>
>

Re: Use case

Posted by Sharninder <sh...@gmail.com>.
Since you want all chats and mail history persisted all the time, I
personally wouldn't recommend Kafka for your requirement. Kafka is more
suitable as a streaming system where events expire after a certain time.
Look at something more general purpose, like HBase, for persisting data
indefinitely.

So, for example, all activity streams can go into Kafka, from where consumers
pick up messages, parse them, and write them to HBase or pass them to other
clients.

--
Sharninder





On Fri, Sep 5, 2014 at 12:05 AM, Aris Alexis <sn...@gmail.com> wrote:

> Hello,
>
> I am building a big web application that I want to be massively scalable (I
> am using cassandra and titan as a general db).
>
> I want to implement the following:
>
> real time web chat that is persisted so that user a in the future can
> recall his chat with user b,c,d much like facebook.
> mail like messages in the web application (not sure about this as it is
> somewhat covered by the first one)
> user activity streams
> users subscribing to topics for example florida/musicevents
>
> Could i use kafka for this? can you recommend another technology maybe?
>

Re: Use case

Posted by "cacsar@gmail.com" <ca...@gmail.com>.
I believe there are architectures for the chat system that can use Kafka in
a sensible fashion to handle some of the difficult aspects. However, a
partition per user would not be advisable, nor, I imagine, would relying on
Kafka's storage for checking for past or expired messages. (I've done the
thought exercise on how to build a scalable chat system with Kafka before.)

Christian


On Thu, Sep 4, 2014 at 11:35 AM, Aris Alexis <sn...@gmail.com> wrote:

> Hello,
>
> I am building a big web application that I want to be massively scalable (I
> am using cassandra and titan as a general db).
>
> I want to implement the following:
>
> real time web chat that is persisted so that user a in the future can
> recall his chat with user b,c,d much like facebook.
> mail like messages in the web application (not sure about this as it is
> somewhat covered by the first one)
> user activity streams
> users subscribing to topics for example florida/musicevents
>
> Could i use kafka for this? can you recommend another technology maybe?
>