Posted to dev@pulsar.apache.org by Michael Marshall <mm...@apache.org> on 2022/10/18 14:50:32 UTC

Re: [DISCUSS] PIP-208: HTTP Sink

Great discussion. I have one minor comment that is tangentially related.

> On building the project `pulsar-io-http-{version}.nar` will be built and
> added to the `pulsar-all` distribution.

The pulsar-all docker image is pretty big. I assume we will continue
to build and package additional connectors. It would be great to
figure out how to make it smaller at some point.

Thanks,
Michael


On Tue, Sep 27, 2022 at 9:27 AM Christophe Bornet
<bo...@gmail.com> wrote:
>
> Sure you can test with the Sink of my PR branch.
> Otherwise I'll do the test after ApacheCon.
>
> On Tue, Sep 27, 2022 at 12:57, tison <wa...@gmail.com> wrote:
>
> > Yes. It's a potential use case for validating the implementation. If you
> > don't have time to try it out, I can schedule some time to demo it with a
> > prototype HTTP sink or after the patch gets merged :)
> >
> > Best,
> > tison.
> >
> >
> > On Tue, Sep 27, 2022 at 18:51, Christophe Bornet <bo...@gmail.com> wrote:
> >
> > > Hi Tison,
> > >
> > > Very interesting, and it shows the value of such an HTTP Sink.
> > > The Pulsar HTTP Sink should work OOTB with ClickHouse. I don't have
> > > time to do the test right now, so would someone want to do it?
> > >
> > > Best regards.
> > >
> > > Christophe Bornet
> > >
> > > On Tue, Sep 27, 2022 at 12:31, tison <wa...@gmail.com> wrote:
> > >
> > > > Hi Christophe,
> > > >
> > > > Thanks for starting this proposal. It looks cool.
> > > >
> > > > I'd suggest one real-world integration test you can make use of:
> > > > https://clickhouse.com/docs/en/integrations/kafka/kafka-connect-http
> > > > (replace source kafka with pulsar).
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > On Tue, Sep 27, 2022 at 18:04, Enrico Olivelli <eo...@gmail.com> wrote:
> > > >
> > > > > Thanks for your answers.
> > > > > I am fine with the current proposal.
> > > > > We can enhance it as follow-up work.
> > > > >
> > > > > Enrico
> > > > >
> > > > > On Fri, Sep 23, 2022 at 19:20, Christophe Bornet
> > > > > <bo...@gmail.com> wrote:
> > > > > >
> > > > > > Thanks for your feedback Enrico.
> > > > > > My answers to your comments below
> > > > > >
> > > > > > BR
> > > > > >
> > > > > > Christophe
> > > > > >
> > > > > > On Tue, Sep 20, 2022 at 14:16, Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > >
> > > > > > > Christophe,
> > > > > > > very good initiative!
> > > > > > >
> > > > > > > I support it
> > > > > > > Some comments inline below
> > > > > > >
> > > > > > >
> > > > > > > Enrico
> > > > > > >
> > > > > > > On Mon, Sep 19, 2022 at 19:10, Christophe Bornet
> > > > > > > <bo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I have drafted PIP-208: HTTP Sink
> > > > > > > >
> > > > > > > > PIP link:
> > > > > > > > https://github.com/apache/pulsar/issues/17719
> > > > > > > >
> > > > > > > > Here's a copy of the contents of the GH issue for your reference:
> > > > > > > >
> > > > > > > > ### Motivation
> > > > > > > >
> > > > > > > > Currently, when you want to consume from Pulsar topics in
> > > > > > > > applications written in languages that don't have a supported
> > > > > > > > Pulsar driver, you need to run some type of proxy like the
> > > > > > > > WebSocket Proxy or Pulsar Beam. In production this needs
> > > > > > > > additional effort to deploy, scale, load balance, monitor, and
> > > > > > > > so on.
> > > > > > > > Pulsar IO is a framework that deals with all these operational
> > > > > > > > subjects and can be leveraged to provide a way to push messages
> > > > > > > > to external systems using HTTP, a protocol supported by every
> > > > > > > > existing language and OS.
> > > > > > > >
> > > > > > > > ### Goal
> > > > > > > >
> > > > > > > > This proposal defines an HTTP Sink that sends the messages to a
> > > > > > > > configured URL.
> > > > > > > > It takes inspiration from
> > > > > > > > [Pulsar Beam](https://github.com/kafkaesque-io/pulsar-beam) and the
> > > > > > > > [Confluent HTTP Sink connector](https://docs.confluent.io/kafka-connectors/http/current/overview.html).
> > > > > > > >
> > > > > > > >
> > > > > > > > ### Implementation
> > > > > > > >
> > > > > > > > A `pulsar-io-http` module will be added to `pulsar-io`.
> > > > > > > > On building the project `pulsar-io-http-{version}.nar` will be
> > > > > > > > built and added to the `pulsar-all` distribution.
> > > > > > > > The name of the Sink will be `http`.
> > > > > > > >
> > > > > > > > The HTTP Sink pushes records to any HTTP server with the record
> > > > > > > > value in the body of a POST request.
> > > > > > > > The body of the HTTP request is the JSON representation of the
> > > > > > > > record value.
> > > > > > >
> > > > > > > What do you mean?
> > > > > > > I think that this should depend on the Schema.
> > > > > > >
> > > > > > > BYTES SCHEMA -> I would push the raw message payload
> > > > > > > PRIMITIVE VALUES (long, integer, string) -> I would push the JSON representation
> > > > > > > JSON SCHEMA -> push the raw message payload
> > > > > > > AVRO -> ? convert to JSON ?
> > > > > > > PROTOBUF -> ? convert to JSON ?
> > > > > > > KEY-VALUE ?
> > > > > > >
> > > > > > > Probably we need some flag to define the behaviour for the
> > > > > > > non-trivial cases.
> > > > > > >
> > > > > > The current impl chooses to serialize as JSON because it's a
> > > > > > well-supported content-type in server frameworks.
> > > > > > It's also consistent with existing HTTP sinks such as Pulsar Beam
> > > > > > and the Confluent HTTP Sink Connector.
> > > > > > Adapting the content-type to the schema is elegant and will
> > > > > > probably result in shorter (but less readable) payloads, and I
> > > > > > think it could be done as a follow-up option.
> > > > > > It does have the problem of being difficult to do for the KV
> > > > > > schema.
> > > > > > For the content-type mappings I would do:
> > > > > > BYTES SCHEMA -> application/octet-stream (raw bytes)
> > > > > > PRIMITIVE VALUES (long, integer, string) -> text/plain
> > > > > > JSON -> application/json
> > > > > > AVRO -> avro/binary
> > > > > > PROTOBUF -> probably application/octet-stream ?
> > > > > > KEY-VALUE ?
> > > > > >
> > > > > > We would also need to indicate the Schema-Type in the HTTP headers.
> > > > > >
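The content-type mapping suggested in the reply above can be sketched as a simple lookup. This is only an illustration in Python: the schema type names follow Pulsar's `SchemaType` naming, the `PulsarSchemaType` header name and the `headers_for_schema` function are hypothetical, and KEY_VALUE is left unmapped because the thread leaves it open.

```python
# Mapping taken from the reply above; PROTOBUF is marked "probably" in the thread.
CONTENT_TYPE_BY_SCHEMA = {
    "BYTES": "application/octet-stream",
    "INT32": "text/plain",
    "INT64": "text/plain",
    "STRING": "text/plain",
    "JSON": "application/json",
    "AVRO": "avro/binary",
    "PROTOBUF": "application/octet-stream",
}


def headers_for_schema(schema_type, overrides=None):
    """Return the Content-Type and schema-type headers for one schema type.

    `overrides` plays the role of a user-supplied map<SchemaType, content-type>.
    The "PulsarSchemaType" header name is an assumption, not part of the proposal.
    """
    mapping = {**CONTENT_TYPE_BY_SCHEMA, **(overrides or {})}
    if schema_type not in mapping:
        # KEY_VALUE (and anything else unmapped) has no agreed content type yet.
        raise ValueError("no content type defined for schema type " + schema_type)
    return {
        "Content-Type": mapping[schema_type],
        "PulsarSchemaType": schema_type,
    }
```

For example, `headers_for_schema("AVRO")` yields `avro/binary`, while passing `overrides={"AVRO": "application/json"}` would model a deployment that prefers JSON-encoded Avro records.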
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Some headers are added to the HTTP request:
> > > > > > > > * `PulsarTopic`: the topic of the record
> > > > > > > > * `PulsarKey`: the key of the record
> > > > > > > > * `PulsarEventTime`: the event time of the record
> > > > > > > > * `PulsarPublishTime`: the publish time of the record
> > > > > > > > * `PulsarMessageId`: the ID of the message contained in the record
> > > > > > > > * `PulsarProperties-*`: each record property is passed with the
> > > > > > > >   property name prefixed by `PulsarProperties-`
> > > > > > > >
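A minimal Python sketch of the request described above, for readers who want to see the header scheme concretely. The header names and the JSON body come from the proposal; the `record` dict layout, the `build_http_request` name, the string formatting of the timestamps, and the choice to omit `PulsarKey` and `PulsarEventTime` when they are unset are assumptions made for illustration, not the connector's actual code.

```python
import json


def build_http_request(record):
    """Return (method, headers, body) for one record, without sending anything."""
    headers = {
        "Content-Type": "application/json",  # assumed, since the body is JSON
        "PulsarTopic": record["topic"],
        "PulsarMessageId": record["message_id"],
        "PulsarPublishTime": str(record["publish_time"]),
    }
    # Key and event time are optional on a message, so add them only when set.
    if record.get("key") is not None:
        headers["PulsarKey"] = record["key"]
    if record.get("event_time") is not None:
        headers["PulsarEventTime"] = str(record["event_time"])
    # Each record property becomes its own header with the documented prefix.
    for name, value in record.get("properties", {}).items():
        headers["PulsarProperties-" + name] = value
    # The body is the JSON representation of the record value.
    body = json.dumps(record["value"]).encode("utf-8")
    return "POST", headers, body
```

The returned tuple can be handed to any HTTP client; keeping the construction separate from the network call makes the header logic easy to test on its own.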
> > > > > > >
> > > > > > > Can we make the "Content-Type" configurable ?
> > > > > > >
> > > > > > Yes we can. But do we do it for the first iteration?
> > > > > > If we do it, I would have an option to add some fixed headers, and
> > > > > > the user can override the content-type.
> > > > > > If we go for a variable content-type depending on the schema, then
> > > > > > we could have a map<SchemaType, content-type>.
> > > > > >
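The "fixed headers that can override the content-type" idea from the reply above could look roughly like the following Python sketch. The `merge_headers` name and the two-dict interface are hypothetical. The one detail worth showing is that HTTP header names are case-insensitive, so a user-supplied `content-type` has to replace the default `Content-Type` rather than be sent alongside it.

```python
def merge_headers(default_headers, fixed_headers):
    """Merge user-configured fixed headers over the sink's default headers.

    Header names are compared case-insensitively; the user's spelling wins.
    """
    merged = {}
    stored_name = {}  # lowercase header name -> spelling currently in `merged`
    for source in (default_headers, fixed_headers):
        for name, value in source.items():
            key = name.lower()
            if key in stored_name:
                # Drop the earlier entry so only one copy of the header remains.
                del merged[stored_name[key]]
            stored_name[key] = name
            merged[name] = value
    return merged
```

With defaults of `{"Content-Type": "application/json"}` and a configured `{"content-type": "text/plain"}`, the result carries a single content-type header with the user's value.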
> > > > > > > Can we make the HTTP METHOD configurable ?
> > > > > > >
> > > > > > Yes we can. But do we do it for the first iteration ?
> > > > > >
> > > > > > >
> > > > > > > > ### Alternatives
> > > > > > > >
> > > > > > > > Creating a separate project for this Sink is rejected since:
> > > > > > > > * this Sink is very useful for developers to test the Pulsar IO
> > > > > > > >   framework and transform functions, and to make demos.
> > > > > > > > * the code has a very small footprint with no external
> > > > > > > >   dependencies.
> > > > > > > > * it should be visible at the same level as other sinks
> > > > > > >
> > > > > > > 100% agreed !
> > > > > > >
> > > > > > > >
> > > > > > > > I'm looking forward to the discussion.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Christophe Bornet
> > > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] PIP-208: HTTP Sink

Posted by Christophe Bornet <bo...@gmail.com>.
>
> I think we could shrink the connectors a lot by removing from the NAR
> archives dependencies that are already present in the
> pulsar-functions-instance.
>
I mean java-instance.jar
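
One way to gauge how much could be saved is to check which jars bundled in a connector NAR contain only classes that java-instance.jar already ships. A rough Python sketch, assuming that the NAR is a zip archive with its dependencies under `META-INF/bundled-dependencies/` and that java-instance.jar holds those classes under their original (non-relocated) package names; the function names are hypothetical.

```python
import io
import zipfile

# Assumed location of bundled dependency jars inside a NAR archive.
BUNDLED_PREFIX = "META-INF/bundled-dependencies/"


def class_entries(jar_bytes):
    """Return the set of .class entry names found in a jar given as bytes."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def redundant_jars(nar_bytes, instance_jar_bytes):
    """Map each bundled jar whose classes are all in the instance jar to its class count."""
    provided = class_entries(instance_jar_bytes)
    report = {}
    with zipfile.ZipFile(io.BytesIO(nar_bytes)) as nar:
        for name in nar.namelist():
            if name.startswith(BUNDLED_PREFIX) and name.endswith(".jar"):
                classes = class_entries(nar.read(name))
                # A jar is a removal candidate only if every class is already provided.
                if classes and classes <= provided:
                    report[name[len(BUNDLED_PREFIX):]] = len(classes)
    return report
```

Jars reported here are only candidates: version differences between the bundled copy and the one in java-instance.jar would still need to be checked before dropping anything.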

On Wed, Oct 19, 2022 at 12:29, Christophe Bornet <bo...@gmail.com> wrote:

>> The pulsar-all docker image is pretty big. I assume we will continue
>> to build and package additional connectors. It would be great to
>> figure out how to make it smaller at some point.
>>
> I think we could shrink the connectors a lot by removing from the NAR
> archives dependencies that are already present in the
> pulsar-functions-instance.
>
> On Tue, Oct 18, 2022 at 16:51, Michael Marshall <mm...@apache.org> wrote:
>
>> Great discussion. I have one minor comment that is tangentially related.
>>
>> > On building the project `pulsar-io-http-{version}.nar` will be built and
>> > added to the `pulsar-all` distribution.
>>
>> The pulsar-all docker image is pretty big. I assume we will continue
>> to build and package additional connectors. It would be great to
>> figure out how to make it smaller at some point.
>>
>> Thanks,
>> Michael
>>

Re: [DISCUSS] PIP-208: HTTP Sink

Posted by Christophe Bornet <bo...@gmail.com>.
>
> The pulsar-all docker image is pretty big. I assume we will continue
> to build and package additional connectors. It would be great to
> figure out how to make it smaller at some point.
>
I think we could shrink the connectors a lot by removing from the NAR
archives dependencies that are already present in the
pulsar-functions-instance.

On Tue, Oct 18, 2022 at 16:51, Michael Marshall <mm...@apache.org> wrote:

> Great discussion. I have one minor comment that is tangentially related.
>
> > On building the project `pulsar-io-http-{version}.nar` will be built and
> > added to the `pulsar-all` distribution.
>
> The pulsar-all docker image is pretty big. I assume we will continue
> to build and package additional connectors. It would be great to
> figure out how to make it smaller at some point.
>
> Thanks,
> Michael
>
>
> On Tue, Sep 27, 2022 at 9:27 AM Christophe Bornet
> <bo...@gmail.com> wrote:
> >
> > Sure you can test with the Sink of my PR branch.
> > Otherwise I'll do the test after ApacheCon.
> >
> > Le mar. 27 sept. 2022 à 12:57, tison <wa...@gmail.com> a écrit :
> >
> > > Yes. It's a potential use case for validating the implementation. If
> you
> > > don't have time to try it out, I can schedule some time to demo it
> with a
> > > prototype HTTP sink or after the patch gets merged :)
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Christophe Bornet <bo...@gmail.com> 于2022年9月27日周二 18:51写道:
> > >
> > > > Hi Tison,
> > > >
> > > > Very interesting and shows the value of such a HTTP Sink.
> > > > The Pulsar HTTP Sink should work OOTB with ClickHouse. I don't have
> time
> > > to
> > > > do the test right now, so would someone want to do it ?
> > > >
> > > > Best regards.
> > > >
> > > > Christophe Bornet
> > > >
> > > > Le mar. 27 sept. 2022 à 12:31, tison <wa...@gmail.com> a écrit
> :
> > > >
> > > > > Hi Christophe,
> > > > >
> > > > > Thanks for starting this proposal. It looks cool.
> > > > >
> > > > > I'd suggest one real-world integration test you can make use of:
> > > > >
> https://clickhouse.com/docs/en/integrations/kafka/kafka-connect-http
> > > > > (replace source kafka with pulsar).
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > Enrico Olivelli <eo...@gmail.com> 于2022年9月27日周二 18:04写道:
> > > > >
> > > > > > Thanks for your answers.
> > > > > > I am fine with the current proposal.
> > > > > > We can enhance it as follow up work
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > > Il giorno ven 23 set 2022 alle ore 19:20 Christophe Bornet
> > > > > > <bo...@gmail.com> ha scritto:
> > > > > > >
> > > > > > > Thanks for your feedback Enrico.
> > > > > > > My answers to your comments below
> > > > > > >
> > > > > > > BR
> > > > > > >
> > > > > > > Christophe
> > > > > > >
> > > > > > > Le mar. 20 sept. 2022 à 14:16, Enrico Olivelli <
> > > eolivelli@gmail.com>
> > > > a
> > > > > > > écrit :
> > > > > > >
> > > > > > > > Christophe,
> > > > > > > > very good initiative!
> > > > > > > >
> > > > > > > > I support it
> > > > > > > > Some comments inline below
> > > > > > > >
> > > > > > > >
> > > > > > > > Enrico
> > > > > > > >
> > > > > > > > Il giorno lun 19 set 2022 alle ore 19:10 Christophe Bornet
> > > > > > > > <bo...@gmail.com> ha scritto:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I have drafted PIP-208: HTTP Sink
> > > > > > > > >
> > > > > > > > > PIP link:
> > > > > > > > > https://github.com/apache/pulsar/issues/17719
> > > > > > > > >
> > > > > > > > > Here's a copy of the contents of the GH issue for your
> > > > references:
> > > > > > > > >
> > > > > > > > > ### Motivation
> > > > > > > > >
> > > > > > > > > > Currently, when you want to consume from Pulsar topics in
> > > > > > > > > > applications written in languages that don't have a Pulsar
> > > > > > > > > > driver supported, you need to run some type of proxy like the
> > > > > > > > > > WebSocket Proxy or Pulsar Beam. In production this needs
> > > > > > > > > > additional effort to deploy, scale, load balance, monitor, and
> > > > > > > > > > so on...
> > > > > > > > > > Pulsar IO is a framework that deals with all these operational
> > > > > > > > > > subjects and can be leveraged to provide a way to push messages
> > > > > > > > > > to external systems using HTTP, a protocol supported by every
> > > > > > > > > > existing language and OS.
> > > > > > > > >
> > > > > > > > > ### Goal
> > > > > > > > >
> > > > > > > > > > This proposal defines an HTTP Sink that sends the messages to a
> > > > > > > > > > configured URL.
> > > > > > > > > > It takes inspiration from
> > > > > > > > > > [Pulsar Beam](https://github.com/kafkaesque-io/pulsar-beam) and the
> > > > > > > > > > [Confluent HTTP Sink connector](https://docs.confluent.io/kafka-connectors/http/current/overview.html).
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ### Implementation
> > > > > > > > >
> > > > > > > > > A `pulsar-io-http` module will be added to `pulsar-io`.
> > > > > > > > > > On building the project `pulsar-io-http-{version}.nar` will be
> > > > > > > > > > built and added to the `pulsar-all` distribution.
> > > > > > > > > > The name of the Sink will be `http`.
> > > > > > > > > >
> > > > > > > > > > The HTTP Sink pushes records to any HTTP server with the record
> > > > > > > > > > value in the body of a POST method.
> > > > > > > > > > The body of the HTTP request is the JSON representation of the
> > > > > > > > > > record value.
> > > > > > > >
> > > > > > > > What do you mean ?
> > > > > > > > I think that this should depend on the Schema.
> > > > > > > >
> > > > > > > > BYTES SCHEMA -> I would push the raw message payload
> > > > > > > > > PRIMITIVE VALUES (long, integer, string) -> I would push the JSON representation
> > > > > > > > > JSON SCHEMA -> push the raw message payload
> > > > > > > > AVRO -> ?  convert to JSON ?
> > > > > > > > PROTOBUF -> ? convert to JSON ?
> > > > > > > > KEY-VALUE ?
> > > > > > > >
> > > > > > > > > Probably we need some flag to define the behaviour for the
> > > > > > > > > non-trivial cases.
> > > > > > > >
> > > > > > > > The current impl chooses to serialize as JSON because it's a
> > > > > > > > well-supported content-type on the server frameworks.
> > > > > > > > It's also to be consistent with existing HTTP Sinks such as Pulsar
> > > > > > > > Beam and Confluent HTTP Sink Connector.
> > > > > > > > The possibility to adapt the content-type to the schema is elegant
> > > > > > > > and will probably result in shorter payloads (but less readable)
> > > > > > > > and I think it could be done as a follow-up option.
> > > > > > > > It does indeed have the problem of being difficult to do for the
> > > > > > > > KV schema.
> > > > > > > For the content-type mappings I would do:
> > > > > > > BYTES SCHEMA -> application/octet-stream (raw bytes)
> > > > > > > > PRIMITIVE VALUES (long, integer, string) -> text/plain
> > > > > > > > JSON -> application/json
> > > > > > > AVRO -> avro/binary
> > > > > > > PROTOBUF -> probably application/octet-stream ?
> > > > > > > KEY-VALUE ?
> > > > > > >
> > > > > > > > We would also need to indicate the Schema-Type in the HTTP headers.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Some headers are added to the HTTP request:
> > > > > > > > > * `PulsarTopic`: the topic of the record
> > > > > > > > > * `PulsarKey`: the key of the record
> > > > > > > > > * `PulsarEventTime`: the event time of the record
> > > > > > > > > * `PulsarPublishTime`: the publish time of the record
> > > > > > > > > > * `PulsarMessageId`: the ID of the message contained in the record
> > > > > > > > > > * `PulsarProperties-*`: each record property is passed with the
> > > > > > > > > >   property name prefixed by `PulsarProperties-`
> > > > > > > > >
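For illustration, here is a minimal Python sketch of how the body and headers described above could be assembled for one record. It is not taken from the PR: the `build_http_request` function and the dictionary field names (`topic`, `key`, `event_time`, `publish_time`, `message_id`, `properties`, `value`) are hypothetical, and rendering the timestamps as decimal epoch milliseconds and omitting absent optional fields are assumptions too.

```python
import json


def build_http_request(record):
    """Return (headers, body) for one record, following the PIP text."""
    headers = {
        "Content-Type": "application/json",
        "PulsarTopic": record["topic"],
        "PulsarMessageId": record["message_id"],
        # Assumption: timestamps are sent as decimal epoch milliseconds.
        "PulsarPublishTime": str(record["publish_time"]),
    }
    # Assumption: optional fields are simply left out when they are absent.
    if record.get("key") is not None:
        headers["PulsarKey"] = record["key"]
    if record.get("event_time") is not None:
        headers["PulsarEventTime"] = str(record["event_time"])
    # Each record property becomes its own header, prefixed by PulsarProperties-.
    for name, value in record.get("properties", {}).items():
        headers["PulsarProperties-" + name] = value
    # The request body is the JSON representation of the record value.
    body = json.dumps(record["value"]).encode("utf-8")
    return headers, body
```

The returned pair can be handed to any HTTP client as the headers and payload of a POST to the configured URL.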
> > > > > > > >
> > > > > > > > Can we make the "Content-Type" configurable ?
> > > > > > > >
> > > > > > > Yes we can. But do we do it for the first iteration ?
> > > > > > > > If we do it, I would have an option to add some fixed headers,
> > > > > > > > and the user can override the content-type.
> > > > > > > > If we go for a variable content-type depending on the schema,
> > > > > > > > then we could have a map<SchemaType, content-type>
> > > > > > >
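As a rough sketch of the `map<SchemaType, content-type>` idea, the defaults proposed in the quoted reply above could be combined with user overrides as below. This is Python for brevity, not the connector's code: `content_type_for` and the `overrides` parameter are hypothetical names, the keys mirror Pulsar `SchemaType` names, and KEY_VALUE is deliberately left unmapped because the thread leaves it open.

```python
# Defaults follow the mapping proposed in the thread.
DEFAULT_CONTENT_TYPES = {
    "BYTES": "application/octet-stream",
    "STRING": "text/plain",
    "INT32": "text/plain",
    "INT64": "text/plain",
    "JSON": "application/json",
    "AVRO": "avro/binary",
    "PROTOBUF": "application/octet-stream",  # marked "probably" in the thread
}


def content_type_for(schema_type, overrides=None):
    """Pick the Content-Type for a schema type, letting user overrides win."""
    mapping = {**DEFAULT_CONTENT_TYPES, **(overrides or {})}
    if schema_type not in mapping:
        # KEY_VALUE (and anything else unmapped) is an open question in the thread.
        raise ValueError(f"no content type configured for schema type {schema_type}")
    return mapping[schema_type]
```

With this shape, a user who wants Avro records delivered as JSON would pass `{"AVRO": "application/json"}` as `overrides`, and an unmapped schema type fails loudly instead of silently picking a content-type.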
> > > > > > > > Can we make the HTTP METHOD configurable ?
> > > > > > > >
> > > > > > > Yes we can. But do we do it for the first iteration ?
> > > > > > >
> > > > > > > >
> > > > > > > > > ### Alternatives
> > > > > > > > >
> > > > > > > > > > Creating a separate project for this Sink is rejected since:
> > > > > > > > > > * this Sink is very useful for developers to test the Pulsar IO
> > > > > > > > > >   framework, transform functions, and to make demos.
> > > > > > > > > > * the code has a very small footprint with no external
> > > > > > > > > >   dependencies.
> > > > > > > > > > * it should be visible at the same level as other sinks
> > > > > > > >
> > > > > > > > 100% agreed !
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > I'm looking forward to the discussion.
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > >
> > > > > > > > > Christophe Bornet
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>
