Posted to users@kafka.apache.org by Matt Stone <ms...@nexeohr.com> on 2018/03/02 20:21:19 UTC

Consultant Help

We are looking for a consultant or contractor who can come onsite to our Ogden, Utah location in the US to help with a Kafka setup and maintenance project.  What we need is someone with the knowledge and experience to build out the Kafka environment from scratch.

We are thinking they would need to be onsite for 6-12 months to set it up and to mentor some of our team so they can get up to speed on the maintenance once the contractor is gone.  If you have experience setting up Kafka from scratch in a Linux environment, maintaining node clusters, and training others on the team to do the same, and you are interested in a long-term project working at the client site, I would love to start a discussion to see if we could use you for the role.

I would also be interested in hearing about any consulting firms that might have resources that could help with this role. 

Matt Stone


-----Original Message-----
From: Matt Daum [mailto:matt@setfive.com] 
Sent: Friday, March 2, 2018 1:11 PM
To: users@kafka.apache.org
Subject: Re: Kafka Setup for Daily counts on wide array of keys

Actually, it looks like the better way would be to output the counts to a new topic and then ingest that topic into the DB itself.  Is that the correct way?
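
Roughly what I'm picturing for the counting side, as a minimal, untested sketch (the topic names, store name, app id, broker address, and String serdes are all placeholders; the real values would be the Avro Impression records):

    import java.util.Properties;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.Serialized;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.state.WindowStore;

    public class DailyCounts {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "daily-counts-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // Stand-in: treat the record value as the attribute to count on.
            // With the real Avro Impression, the groupBy below would pull the
            // attribute out of the deserialized record instead.
            KStream<String, String> impressions =
                    builder.stream("impressions");

            // One-day tumbling windows, counted per attribute value.
            KTable<Windowed<String>, Long> counts = impressions
                    .groupBy((key, attribute) -> attribute,
                             Serialized.with(Serdes.String(), Serdes.String()))
                    .windowedBy(TimeWindows.of(TimeUnit.DAYS.toMillis(1)))
                    .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as(
                            "daily-counts-store"));

            // Write the running counts out to a topic; a plain consumer or a
            // Kafka Connect sink can then load that topic into the DB.
            counts.toStream()
                  .map((windowedKey, count) -> KeyValue.pair(
                          windowedKey.key() + "@" + windowedKey.window().start(),
                          count))
                  .to("daily-counts", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }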

On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum <ma...@setfive.com> wrote:

> I am new to Kafka but I think I have a good use case for it.  I am
> trying to build daily counts of requests based on a number of
> different attributes in a high-throughput system (~1 million
> requests/sec across all 8 servers).  The different attributes are
> unbounded in terms of values, and some will spread across hundreds
> of millions of values.  This is my current thought process; let me
> know where I could be more efficient or if there is a better way to
> do it.
>
> I'll create an Avro object "Impression" which has all the attributes
> of the inbound request.  My application servers will then, on each
> request, create and send this to a single Kafka topic.
>
> I'll then have a consumer which creates a stream from the topic.
> From there I'll use the windowed timeframes and groupBy to group by
> the attributes on each given day.  At the end of the day I'd need to
> read out the data store to an external system for storage.  Since I
> won't know all the values, I'd need something similar to
> KVStore.all() but for windowed KV stores.  It appears this would be
> possible in 1.1 with this commit:
> https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5
>
> Is this the best approach to doing this?  Or would I be better off
> using the stream to listen and then an external DB like Aerospike to
> store the counts, reading out of it directly at the end of the day?
>
> Thanks for the help!
> Daum
>
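
For the producer side described in the quoted message, a minimal, untested sketch of the per-request send (the topic name and configs are made up, and the String value stands in for the Avro-serialized Impression):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ImpressionProducer {
        private final KafkaProducer<String, String> producer;

        public ImpressionProducer(String bootstrapServers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("key.serializer", StringSerializer.class.getName());
            // Stand-in: an Avro serializer (e.g. Confluent's) would replace this.
            props.put("value.serializer", StringSerializer.class.getName());
            // At ~1M requests/sec across 8 servers, batching and compression matter.
            props.put("linger.ms", "5");
            props.put("compression.type", "lz4");
            this.producer = new KafkaProducer<>(props);
        }

        // Called once per inbound request by the application server; a null
        // key lets the producer spread records across partitions.
        public void send(String serializedImpression) {
            producer.send(new ProducerRecord<>("impressions", null, serializedImpression));
        }
    }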

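For the end-of-day read-out, the window-store additions in 1.1 (all() and fetchAll() on ReadOnlyWindowStore) allow iterating every key without knowing the values up front, which is the KVStore.all() analogue the message asks about. A sketch under the same assumptions, reusing the hypothetical "daily-counts-store" name from the sketch further up:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyWindowStore;

    public class DailyCountExporter {
        // Walks every key/window in the given time range and hands the counts
        // to the external system (stubbed here as stdout).
        static void exportDay(KafkaStreams streams, long dayStartMs, long dayEndMs) {
            ReadOnlyWindowStore<String, Long> store =
                    streams.store("daily-counts-store", QueryableStoreTypes.windowStore());
            try (KeyValueIterator<Windowed<String>, Long> it =
                         store.fetchAll(dayStartMs, dayEndMs)) {
                while (it.hasNext()) {
                    KeyValue<Windowed<String>, Long> entry = it.next();
                    System.out.printf("%s -> %d%n", entry.key.key(), entry.value);
                }
            }
        }
    }
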
Re: Consultant Help

Posted by Martin Gainty <mg...@hotmail.com>.



________________________________
From: Svante Karlsson <sv...@csi.se>
Sent: Friday, March 2, 2018 3:50 PM
To: users@kafka.apache.org
Subject: Re: Consultant Help

try https://www.confluent.io/ - that's what they do

/svante
mg> Svante, you and I know Kafka as well, but a relocation to SLC is a tall order indeed.
mg> Currently looking for an onsite resource from an LDS firm I used to work for in SLC.
mg> Please contact me offline for the appropriate details.


RE: Consultant Help

Posted by Matt Stone <ms...@nexeohr.com>.
Thank you, I will look into that.

-----Original Message-----
From: Svante Karlsson [mailto:svante.karlsson@csi.se] 
Sent: Friday, March 2, 2018 1:50 PM
To: users@kafka.apache.org
Subject: Re: Consultant Help

try https://www.confluent.io/ - that's what they do

/svante


Re: Consultant Help

Posted by Svante Karlsson <sv...@csi.se>.
try https://www.confluent.io/ - that's what they do

/svante

>