Posted to user@beam.apache.org by Juan Romero <js...@gmail.com> on 2023/06/23 23:47:50 UTC

Create IO connector for HTTP or ParDO

Hi guys. I have a question: does it make sense to create an HTTP
connector in Apache Beam, or can I simply create a ParDo function that makes
the HTTP request? I want to know what advantages I would get from creating an
HTTP IO connector.

Re: Create IO connector for HTTP or ParDO

Posted by Chamikara Jayalath via user <us...@beam.apache.org>.
Connectors are written using ParDos. A connector (source) may use a source
framework (Splittable DoFn is the recommended framework currently) or may
be written using regular ParDos. The main advantages of a source framework
are various features provided by such frameworks (progress reporting,
dynamic work rebalancing, checkpointing, backlog reporting etc.). If your
HTTP endpoint can be used to implement such features, it makes sense to use
the source framework. Otherwise, I would simply use a regular ParDo.
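A minimal sketch of the per-element work a plain ParDo could do, assuming a simple GET endpoint that signals transient errors with 429 or 5xx. The names fetch_with_retries, urllib_transport, and send are hypothetical; the transport is injectable so the retry logic can be exercised without a network.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Hypothetical transport interface for this sketch: url -> (status, body).
Transport = Callable[[str], Tuple[int, bytes]]


def urllib_transport(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Default transport: a plain GET using only the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and body of non-2xx responses.
        # Connection failures (urllib.error.URLError) propagate unchanged.
        return err.code, err.read()


def fetch_with_retries(
    url: str,
    send: Transport = urllib_transport,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """GET `url`, retrying 429/5xx responses with exponential backoff.

    Assumes 429 and 5xx mean "transient" for this endpoint. Raises
    RuntimeError on a non-retryable status or when attempts run out,
    which fails the bundle so the runner can retry it.
    """
    for attempt in range(max_attempts):
        status, body = send(url)
        if 200 <= status < 300:
            return body
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_attempts - 1:
            raise RuntimeError(f"GET {url} failed with HTTP {status}")
        # Backoff schedule: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

In a pipeline you might call this from beam.Map or from a DoFn's process method. Since a runner can retry a failed bundle, the request itself should be safe to repeat.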

When it comes to sinks, the most important feature would be to write data
in an idempotent way in the presence of worker failures without writing
duplicate data to the HTTP endpoint. I'm not sure this can be done
efficiently without knowing more details about the nature of the endpoint.
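A minimal sketch of one way to get such idempotent writes, assuming the endpoint can deduplicate on a client-supplied key (some HTTP APIs accept an Idempotency-Key header for this; whether yours does is exactly the "nature of the endpoint" question). The key is derived deterministically from the record, so a bundle retried after a worker failure resends the same keys instead of fresh ones. idempotency_key, FakeEndpoint, and write_bundle are hypothetical names, and FakeEndpoint only stands in for the real service.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def idempotency_key(record: Dict[str, Any]) -> str:
    """Deterministic key: identical records always map to the same key.

    sort_keys and compact separators give a canonical JSON encoding, so
    the order of fields in the dict does not change the key.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FakeEndpoint:
    """Hypothetical stand-in for an HTTP endpoint that dedupes on the key."""

    def __init__(self) -> None:
        self.stored: Dict[str, Dict[str, Any]] = {}

    def post(self, key: str, record: Dict[str, Any]) -> int:
        if key in self.stored:
            return 200  # Already written: acknowledge without storing again.
        self.stored[key] = record
        return 201  # First write for this key.


def write_bundle(endpoint: FakeEndpoint, records: Iterable[Dict[str, Any]]) -> None:
    # A runner may re-execute this whole bundle after a worker failure;
    # stable keys make the re-execution harmless.
    for record in records:
        endpoint.post(idempotency_key(record), record)
```

One design caveat: a content-derived key also collapses records that are legitimately identical, so if those must be written twice, include a unique source identifier in the record before hashing.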

Thanks,
Cham

On Fri, Jun 23, 2023 at 4:48 PM Juan Romero <js...@gmail.com> wrote:

> Hi guys. I have a question: does it make sense to create an HTTP
> connector in Apache Beam, or can I simply create a ParDo function that makes
> the HTTP request? I want to know what advantages I would get from creating an
> HTTP IO connector.
>

Re: Create IO connector for HTTP or ParDO

Posted by John Casey via user <us...@beam.apache.org>.
I have a doc (
https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit#heading=h.n02teqc95avo)
on writing an IO in Beam.

Some of it is specific to using SplittableDoFn, but most of it is
applicable to doing IO at scale in Beam in general.

I hope this helps
John

On Mon, Jun 26, 2023 at 10:57 AM Juan Romero <js...@gmail.com> wrote:

> From what you guys said, creating an HTTP IO connector makes sense if
> you have scenarios like these:
> - I want to monitor the number of successful or failed HTTP requests in a
> window of time.
> - I want to track transactions that require multiple HTTP requests, for
> which we need to save state.
> - I want to implement idempotency to avoid duplicate HTTP requests within
> a window of time.
>
> Regarding distributing incoming requests across different nodes, I think
> that nowadays this is more the responsibility of the API side, because it
> is the job of a load balancer that receives the requests and distributes
> them across multiple containers or VMs.
>
> I'd like your opinion on the examples I gave, and on any other scenarios
> that could help me clarify my ideas.
>
> Thanks guys!
>

Re: Create IO connector for HTTP or ParDO

Posted by Juan Romero <js...@gmail.com>.
From what you guys said, creating an HTTP IO connector makes sense if
you have scenarios like these:
- I want to monitor the number of successful or failed HTTP requests in a
window of time.
- I want to track transactions that require multiple HTTP requests, for
which we need to save state.
- I want to implement idempotency to avoid duplicate HTTP requests within
a window of time.
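For the first scenario, a minimal sketch of the counting itself, assuming each request outcome is available as a (timestamp in seconds, HTTP status) pair and that fixed, non-overlapping windows are wanted. In a Beam pipeline this grouping would normally come from windowing (for example beam.WindowInto(beam.window.FixedWindows(60))) followed by a combine, or from Beam metrics counters; the plain-Python function below, count_outcomes_per_window (a hypothetical name), only illustrates window assignment and counting.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_outcomes_per_window(
    events: Iterable[Tuple[float, int]], window_size: float
) -> Dict[float, Counter]:
    """Count successful (2xx) and failed requests per fixed window.

    Hypothetical helper in plain Python, not Beam's windowing API.
    Each event is (timestamp_seconds, http_status); a timestamp t falls
    in the window that starts at t - (t % window_size).
    """
    windows: Dict[float, Counter] = defaultdict(Counter)
    for ts, status in events:
        start = ts - (ts % window_size)
        outcome = "success" if 200 <= status < 300 else "failure"
        windows[start][outcome] += 1
    return dict(windows)
```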

Regarding distributing incoming requests across different nodes, I think
that nowadays this is more the responsibility of the API side, because it
is the job of a load balancer that receives the requests and distributes
them across multiple containers or VMs.

I'd like your opinion on the examples I gave, and on any other scenarios
that could help me clarify my ideas.

Thanks guys!

On Fri, Jun 23, 2023 at 10:46 PM Jean-Baptiste Onofré <jb@nanthrax.net> wrote:

> Hi,
>
> A while ago (at the very early stage of Beam :)), I proposed creating an
> HTTP/REST source/sink (we should still have the Jira :)).
> However, we didn't reach a consensus on the features (I proposed
> something very simple). Splittable DoFn didn't exist at that time.
>
> So, if we want to move forward on HTTP/REST, we have to list the
> features and expected behavior.
>
> Regards
> JB
>

Re: Create IO connector for HTTP or ParDO

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

A while ago (at the very early stage of Beam :)), I proposed creating an
HTTP/REST source/sink (we should still have the Jira :)).
However, we didn't reach a consensus on the features (I proposed
something very simple). Splittable DoFn didn't exist at that time.

So, if we want to move forward on HTTP/REST, we have to list the
features and expected behavior.

Regards
JB

On Sat, Jun 24, 2023 at 1:47 AM Juan Romero <js...@gmail.com> wrote:
>
> Hi guys. I have a question: does it make sense to create an HTTP connector in Apache Beam, or can I simply create a ParDo function that makes the HTTP request? I want to know what advantages I would get from creating an HTTP IO connector.