Posted to dev@kafka.apache.org by David Arthur <mu...@gmail.com> on 2012/09/10 15:49:09 UTC

Re: Kafka REST interface

Bump. 

Anyone have feedback on this approach?

-David

On Aug 24, 2012, at 12:37 PM, David Arthur wrote:

> Here is an initial pass at a Kafka REST proxy (in Scala)
> 
> https://github.com/mumrah/kafka/blob/rest/contrib/rest-proxy/src/main/scala/RESTServer.scala
> 
> The basic gist is:
> * Jetty for webserver
> * Messages are strings
> * GET /topic/group to get a message (timeout after 1s)
> * POST /topic, the request body is the message
> * One consumer thread per topic+group
> 
> Be wary, many things are hard-coded at this point (port numbers, etc.). Obviously, this will need to change. Also, I haven't the slightest idea how to set up or use sbt properly, so I just checked in the libs.
> 
> Feedback is welcome in this thread or on GitHub. Be gentle, please; this is my first go at Scala.
> 
> -David
> 
> On Aug 12, 2012, at 10:39 AM, Taylor Gautier wrote:
> 
>> Jay I agree with you 100%.
>> 
>> At Tagged we have implemented a proxy for various internal reasons
>> (primarily to act as a high-performance relay from PHP to Kafka). It's
>> implemented in Node.js (JavaScript).
>> 
>> Currently it services UDP packets encoded in binary, but it could
>> easily be modified to accept HTTP as well, since Node's support for HTTP
>> is pretty simple.
>> 
>> If others are interested in maintaining something like this, we could
>> consider adding it to the public domain alongside the already
>> existing Node.js client implementation.
>> 
>> 
>> 
>> On Aug 10, 2012, at 3:51 PM, Jay Kreps <ja...@gmail.com> wrote:
>> 
>>> My personal preference would be to have only a single protocol in Kafka
>>> core. I have been down the multiple-protocol route, and my experience was
>>> that it adds a lot of burden for each change that needs to be made and a
>>> lot of complexity to abstract over the different protocols. Users are
>>> generally fairly agnostic as to how bytes are sent back and forth,
>>> provided it is reliable and easily implementable in any language.
>>> Generally they care more about the quality of the client in their
>>> language of choice.
>>> 
>>> My belief is that the main benefit of REST is ease of implementing a
>>> client. But currently the biggest barrier is really the use of zk and
>>> the fairly thick consumer design. So I think the current thinking is that
>>> we should focus on thinning that out and removing the client-side zk
>>> dependency. I actually don't think TCP is a huge burden if the protocol is
>>> simple, and there are actually some advantages (for example, the consumer
>>> needs to consume from multiple servers, so select/poll/epoll is natural, but
>>> this is not always available from HTTP client libraries).
>>> 
>>> Basically this is an area where I think it is best to pick one way and
>>> make it really bulletproof rather than providing lots of options.
>>> In some sense each option tends to increase the complexity of testing
>>> (since now there are many combinations to try) and also of implementation
>>> (since a lot of things that were concrete now need to be abstracted away).
>>> 
>>> So from this perspective I would prefer a standalone proxy that could
>>> evolve independently rather than retro-fitting the current socket server to
>>> handle other protocols. There will be some overhead for the extra hop, but
>>> then there is some overhead for HTTP itself.
>>> 
>>> This is just my personal opinion; it would be great to hear what others
>>> think.
>>> 
>>> -Jay
>>> 
>>> On Mon, Aug 6, 2012 at 5:39 AM, David Arthur <mu...@gmail.com> wrote:
>>> 
>>>> I'd be happy to collaborate on this, though it's been a while since I've
>>>> used PHP.
>>>> 
>>>> From what it looks like, what you have is a true proxy that runs outside
>>>> of Kafka and translates some REST routes into Kafka client calls. This
>>>> sounds more in line with what the project page describes. What I have
>>>> proposed is more like a translation layer between some REST routes and
>>>> FetchRequests. In this case the client is responsible for managing offsets.
>>>> Using the consumer groups and ZooKeeper would be another nice way of
>>>> consuming messages (which is probably more like what you have).
>>>> 
>>>> Any maintainers have feedback on this?
>>>> 
>>>> On Aug 3, 2012, at 4:13 PM, Jonathan Creasy wrote:
>>>> 
>>>>> I have an internal one working and was hoping to have it open sourced in
>>>>> the next week. The one at Box is based on the CodeIgniter framework; we
>>>>> have about 45 RESTful interfaces built on this framework, so I just put
>>>>> together another one for Kafka.
>>>>> 
>>>>> 
>>>>> Here are my notes; these were pre-dev, so they may be a little different
>>>>> from what we ended up with.
>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Restful+API+Proposal
>>>>> 
>>>>> I will read yours later this afternoon; we should work together.
>>>>> 
>>>>> -Jonathan
>>>>> 
>>>>> 
>>>>> On Fri, Aug 3, 2012 at 7:41 AM, David Arthur <mu...@gmail.com> wrote:
>>>>> 
>>>>>> I'd like to tackle this project (assuming it hasn't been started yet).
>>>>>> 
>>>>>> I wrote up some initial thoughts here: https://gist.github.com/3248179
>>>>>> 
>>>>>> TL;DR: use the Range header for specifying offsets and simple URIs like
>>>>>> /kafka/topics/[topic]/[partition]; use it for a simple transport of bytes
>>>>>> and/or represent the messages as some media type (text, JSON, XML).
>>>>>> 
>>>>>> Feedback is most welcome (in the Gist or in this thread).
>>>>>> 
>>>>>> Cheers!
>>>>>> 
>>>>>> -David
>>>> 
>>>> 
> 
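
The endpoint semantics listed in the Aug 24 message (POST /topic stores the request body as a message, GET /topic/group returns one message or gives up after a timeout) can be sketched as a small in-memory model. This is only an illustration in Python, not the code in RESTServer.scala: the class and method names are hypothetical, a plain list with a per-group read position stands in for Kafka and ZooKeeper, and it assumes a new group starts reading from the beginning of the topic.

```python
import threading
from collections import defaultdict


class InMemoryProxy:
    """Toy stand-in for the proxy's semantics; no Kafka or HTTP involved."""

    def __init__(self):
        self._cond = threading.Condition()
        self._log = defaultdict(list)     # topic -> messages in arrival order
        self._offsets = defaultdict(int)  # (topic, group) -> next index to read

    def post(self, topic, body):
        """POST /topic: append the request body as one message."""
        with self._cond:
            self._log[topic].append(body)
            self._cond.notify_all()

    def get(self, topic, group, timeout=1.0):
        """GET /topic/group: next unread message for the group, or None on timeout."""
        key = (topic, group)
        with self._cond:
            def has_unread():
                return self._offsets[key] < len(self._log[topic])

            # Block until a message is available or the timeout expires.
            if not self._cond.wait_for(has_unread, timeout=timeout):
                return None
            message = self._log[topic][self._offsets[key]]
            self._offsets[key] += 1
            return message


proxy = InMemoryProxy()
proxy.post("my-topic", "Here is a message")
print(proxy.get("my-topic", "my-group"))                # Here is a message
print(proxy.get("my-topic", "my-group", timeout=0.05))  # None (nothing unread)
```

Each group keeps its own read position, so a second group reading the same topic would receive the same message again.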


Re: Kafka REST interface

Posted by David Arthur <mu...@gmail.com>.
I've opened KAFKA-639 to track this feature



Re: Kafka REST interface

Posted by David Arthur <mu...@gmail.com>.
Are you referring to the HTTP side of things? If not, I'm not sure exactly what you mean by threaded I/O in this context.

-David

On Nov 21, 2012, at 10:54 AM, Taylor Gautier wrote:

> It would make sense to use nio rather than threaded io. 


Re: Kafka REST interface

Posted by Taylor Gautier <tg...@gmail.com>.
It would make sense to use nio rather than threaded io. 
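
For readers unfamiliar with the distinction: with threaded I/O each connection is served by its own blocking thread, while non-blocking I/O (Java's NIO, or select/poll/epoll as mentioned earlier in the thread) lets a single thread wait on many connections at once. The sketch below illustrates the second style with Python's standard selectors module. It is not taken from the proxy, and the function name and connection labels are hypothetical.

```python
import selectors


def read_ready(connections, timeout=1.0):
    """Wait on all connections at once; return {label: bytes} for those with data.

    `connections` maps a label (for example a broker name) to a connected socket.
    """
    sel = selectors.DefaultSelector()
    try:
        for label, conn in connections.items():
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data=label)
        # One call covers every registered socket; no thread per connection.
        return {key.data: key.fileobj.recv(4096) for key, _ in sel.select(timeout)}
    finally:
        sel.close()  # closes the selector only, not the sockets
```

Only sockets that already have data appear in the result; the others are simply skipped once the timeout passes.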




Re: Kafka REST interface

Posted by David Arthur <mu...@gmail.com>.
BTW, here are some cURL calls from my test environment:

https://gist.github.com/e59b9c8ee4ae56dad44f
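
For clients that cannot shell out to curl, the two calls shown in this thread (POST to /my-topic, GET from /my-topic/my-group) could be built with Python's standard library along these lines. This is a sketch under assumptions: the helper names are hypothetical, the base URL is the localhost:8888 address used in the curl examples, and nothing is assumed about the format of the proxy's responses.

```python
from urllib.parse import quote
from urllib.request import Request


def produce_request(base_url, topic, message):
    """Build the POST /topic request; the body is the message itself."""
    url = f"{base_url}/{quote(topic, safe='')}"
    return Request(url, data=message.encode("utf-8"), method="POST")


def consume_request(base_url, topic, group):
    """Build the GET /topic/group request for one message."""
    url = f"{base_url}/{quote(topic, safe='')}/{quote(group, safe='')}"
    return Request(url, method="GET")
```

Either request can then be passed to urllib.request.urlopen with a timeout somewhat longer than the proxy's 1s consumer timeout, against a running instance of the proxy.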


On Nov 20, 2012, at 4:08 PM, David Arthur wrote:

> Another bump for this thread...
> 
> For those just joining, this prototype is a simple HTTP server that proxies the complex consumer code through two HTTP endpoints. 
> 
> https://github.com/mumrah/kafka/blob/rest/contrib/rest-proxy/src/main/scala/RESTServer.scala
> 
> E.g., 
>   
>     curl http://localhost:8888/my-topic -X POST -d 'Here is a message'
> 
> and 
> 
>     curl http://localhost:8888/my-topic/my-group -X GET
> 
> 
> This is not an attempt to expose the FetchRequest/ProduceRequest protocol over HTTP.
> 
> A few questions:
> 
> * Would including offsets be useful here? Since it is utilizing the ZK-backed consumer code, I would think not.
> * I have chosen to create one thread per topic+group (mostly for simplicity's sake). Multiple REST servers could be run and load balanced across to increase the consumer parallelism. Maybe it would make sense for an individual REST server to create more than one thread per topic+group?
> 
> Cheers
> -David
> 
> On Sep 10, 2012, at 9:49 AM, David Arthur wrote:
> 
>> Bump. 
>> 
>> Anyone have feedback on this approach?
>> 
>> -David
>> 
>> On Aug 24, 2012, at 12:37 PM, David Arthur wrote:
>> 
>>> Here is an initial pass at a Kafka REST proxy (in Scala)
>>> 
>>> https://github.com/mumrah/kafka/blob/rest/contrib/rest-proxy/src/main/scala/RESTServer.scala
>>> 
>>> The basic gist is:
>>> * Jetty for webserver
>>> * Messages are strings
>>> * GET /topic/group to get a message (timeout after 1s)
>>> * POST /topic, the request body is the message
>>> * One consumer thread per topic+group
>>> 
>>> Be wary, many things are hard coded at this point (port numbers, etc). Obviously, this will need to change. Also, I haven't the slightest idea how to setup/use sbt properly, so I just checked in the libs.
>>> 
>>> Feedback is welcome in this thread or on Github.  Be gentle please, this is my first go at Scala
>>> 
>>> -David
>>> 
>>> On Aug 12, 2012, at 10:39 AM, Taylor Gautier wrote:
>>> 
>>>> Jay I agree with you 100%.
>>>> 
>>>> At Tagged we have implemented a proxy for various internal reasons (
>>>> primarily to act as a high performance relay from PHP to Kafka). It's
>>>> implemented in Node.js (JavaScript)
>>>> 
>>>> Currently it services UDP packets encoded in binary but it could
>>>> easily be modified to accept http also since Node support for http is
>>>> pretty simple.
>>>> 
>>>> If others are interested in maintaining something like this we could
>>>> consider adding this to the public domain along side the already
>>>> existing Node.js client implementation.
>>>> 
>>>> 
>>>> 
>>>> On Aug 10, 2012, at 3:51 PM, Jay Kreps <ja...@gmail.com> wrote:
>>>> 
>>>>> My personal preference would be to have only a single protocol in kafka
>>>>> core. I have been down the multiple protocol route and my experience was
>>>>> that it adds a lot of burden for each change that needs to be made and a
>>>>> lot of complexity to abstract over the different protocols. From the point
>>>>> of view of a user they are generally a bit agnostic as to how bytes are
>>>>> sent back and forth provided it is reliable and easily implementable in any
>>>>> language. Generally they care more about the quality of the client in their
>>>>> language of choice.
>>>>> 
>>>>> My belief is that the main benefit of REST is ease of implementing a
>>>>> client. But currently the biggest barrier is really the use of zk and
>>>>> fairly thick consumer design. So I think the current thinking is that we
>>>>> should focus on thinning that out and removing the client-side zk
>>>>> dependency. I actually don't think TCP is a huge burden if the protocol is
>>>>> simple, and there are actually some advantages (for example the consumer
>>>>> needs to consume from multiple servers so select/poll/epoll is natural but
>>>>> this is not always available from HTTP client libraries).
>>>>> 
>>>>> Basically this is an area where I think it is best to pick one way and
>>>>> really make it bulletproof rather than providing lots of options.
>>>>> In some sense each option tends to increase the complexity of testing
>>>>> (since now there are many combinations to try) and also of implementation
>>>>> (since a lot of things that were concrete now need to be abstracted away).
>>>>> 
>>>>> So from this perspective I would prefer a standalone proxy that could
>>>>> evolve independently rather than retro-fitting the current socket server to
>>>>> handle other protocols. There will be some overhead for the extra hop, but
>>>>> then there is some overhead for HTTP itself.
>>>>> 
>>>>> This is just my personal opinion; it would be great to hear what others
>>>>> think.
>>>>> 
>>>>> -Jay
>>>>> 
>>>>> On Mon, Aug 6, 2012 at 5:39 AM, David Arthur <mu...@gmail.com> wrote:
>>>>> 
>>>>>> I'd be happy to collaborate on this, though it's been a while since I've
>>>>>> used PHP.
>>>>>> 
>>>>>> From what it looks like, what you have is a true proxy that runs outside
>>>>>> of Kafka and translates some REST routes into Kafka client calls. This
>>>>>> sounds more in line with what the project page describes. What I have
>>>>>> proposed is more like a translation layer between some REST routes and
>>>>>> FetchRequests. In this case the client is responsible for managing offsets.
>>>>>> Using the consumer groups and ZooKeeper would be another nice way of
>>>>>> consuming messages (which is probably more like what you have).
>>>>>> 
>>>>>> Any maintainers have feedback on this?
>>>>>> 
>>>>>> On Aug 3, 2012, at 4:13 PM, Jonathan Creasy wrote:
>>>>>> 
>>>>>>> I have an internal one working and was hoping to have it open sourced in
>>>>>>> the next week. The one at Box is based on the CodeIgniter framework, we
>>>>>>> have about 45 RESTful interfaces built on this framework so I just put
>>>>>>> together another one for Kafka.
>>>>>>> 
>>>>>>> 
>>>>>>> Here are my notes, these were pre-dev so may be a little different than
>>>>>>> what we ended up with.
>>>>>>> 
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Restful+API+Proposal
>>>>>>> 
>>>>>>> I will read yours later this afternoon, we should work together.
>>>>>>> 
>>>>>>> -Jonathan
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 3, 2012 at 7:41 AM, David Arthur <mu...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I'd like to tackle this project (assuming it hasn't been started yet).
>>>>>>>> 
>>>>>>>> I wrote up some initial thoughts here: https://gist.github.com/3248179
>>>>>>>> 
>>>>>>>> TL;DR: use the Range header for specifying offsets, simple URIs like
>>>>>>>> /kafka/topics/[topic]/[partition], use for a simple transport of bytes
>>>>>>>> and/or represent the messages as some media type (text, json, xml)
>>>>>>>> 
>>>>>>>> Feedback is most welcome (in the Gist or in this thread).
>>>>>>>> 
>>>>>>>> Cheers!
>>>>>>>> 
>>>>>>>> -David
>>>>>> 
>>>>>> 
>>> 
>> 
> 


Re: Kafka REST interface

Posted by David Arthur <mu...@gmail.com>.
Another bump for this thread...

For those just joining, this prototype is a simple HTTP server that puts the complex consumer code behind two HTTP endpoints.

https://github.com/mumrah/kafka/blob/rest/contrib/rest-proxy/src/main/scala/RESTServer.scala

For example, to produce a message:

    curl http://localhost:8888/my-topic -X POST -d 'Here is a message'

and to consume one as part of a consumer group:

    curl http://localhost:8888/my-topic/my-group -X GET


This is not an attempt to expose the FetchRequest/ProduceRequest protocol over HTTP.
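
To make the intended semantics of the two routes concrete, here is a rough sketch in Python (not the Scala code linked above). Everything in it is an assumption for illustration: the InMemoryProxy class and its handle method are hypothetical, a dict of lists stands in for Kafka and ZooKeeper, a new group starts reading from the beginning of the topic, and the 204 returned when nothing is available is just a guess at what a timed-out GET could look like.

```python
from collections import defaultdict


class InMemoryProxy:
    """Toy stand-in for the REST proxy; no Kafka or ZooKeeper involved."""

    def __init__(self):
        self.log = defaultdict(list)     # topic -> messages in arrival order
        self.offsets = defaultdict(int)  # (topic, group) -> next index to read

    def handle(self, method, path, body=None):
        parts = [p for p in path.split("/") if p]
        if method == "POST" and len(parts) == 1:
            # POST /topic: the request body is the message.
            self.log[parts[0]].append(body)
            return 200, ""
        if method == "GET" and len(parts) == 2:
            # GET /topic/group: hand out the next unread message for this group.
            topic, group = parts
            pos = self.offsets[(topic, group)]
            if pos < len(self.log[topic]):
                self.offsets[(topic, group)] = pos + 1
                return 200, self.log[topic][pos]
            # Assumption: an empty 204 stands in for the 1s timeout.
            return 204, ""
        return 404, ""
```

The point of the sketch is that the position in the topic is tracked per group on the server side, so the HTTP client never sees an offset.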

A few questions:

* Would including offsets be useful here? Since the proxy uses the ZK-backed consumer code, I would think not.
* I have chosen to create one thread per topic+group (mostly for simplicity's sake). Multiple REST servers could be run behind a load balancer to increase consumer parallelism. Maybe it would make sense for an individual REST server to create more than one thread per topic+group?
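
The one-thread-per-topic+group idea can be sketched roughly as follows, again in Python rather than the Scala of the prototype. The ConsumerRegistry class and the source callable are hypothetical; source stands in for whatever yields messages from the real consumer stream, and the 1s default mirrors the GET timeout mentioned earlier.

```python
import queue
import threading


class ConsumerRegistry:
    """Lazily starts one worker thread per (topic, group) pair."""

    def __init__(self, source):
        # Hypothetical: source(topic, group) returns an iterable of messages.
        self.source = source
        self.buffers = {}  # (topic, group) -> queue.Queue fed by its worker
        self.lock = threading.Lock()

    def _buffer_for(self, topic, group):
        key = (topic, group)
        with self.lock:
            if key not in self.buffers:
                buf = queue.Queue()
                self.buffers[key] = buf
                worker = threading.Thread(
                    target=self._pump, args=(key, buf), daemon=True
                )
                worker.start()
            return self.buffers[key]

    def _pump(self, key, buf):
        # Runs on the worker thread: copy messages into the buffer.
        for message in self.source(*key):
            buf.put(message)

    def get(self, topic, group, timeout=1.0):
        # Called by a GET handler: wait up to `timeout` seconds for a message.
        try:
            return self._buffer_for(topic, group).get(timeout=timeout)
        except queue.Empty:
            return None
```

Because the handler only ever reads from the buffer, running more than one worker per topic+group would mostly mean starting several pump threads that feed the same queue, at the cost of losing ordering across those workers.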

Cheers
-David
