Posted to user@beam.apache.org by Anjana Pydi <an...@bahwancybertek.com> on 2019/06/02 00:49:47 UTC

How to build a beam python pipeline which does GET/POST request to API's

Hi,

I have a requirement to create an Apache Beam Python pipeline that reads JSON from an API endpoint, transforms it (adds/removes a few fields), and sends the transformed JSON to another API endpoint.

Can anyone please provide some suggestions on how to do it?

Thanks,
Anjana
----------------------------------------------------------------------------------------------------------------------- The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you are not the intended recipient, please notify us immediately by responding to this email and then delete it from your system. Bahwan Cybertek is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt.

RE: [Sender Auth Failure] Re: How to build a beam python pipeline which does GET/POST request to API's

Posted by Anjana Pydi <an...@bahwancybertek.com>.
Hi Soliman,

Thanks for providing the example! I tried it and it worked.

Regards,
Anjana
________________________________
From: Soliman ElSaber [soliman@mindvalley.com]
Sent: Wednesday, June 05, 2019 8:56 PM
To: user@beam.apache.org
Cc: Anjana Pydi
Subject: Re: [Sender Auth Failure] Re: How to build a beam python pipeline which does GET/POST request to API's

Hi Anjana,

I used this code before to get some data from an API call and store it in BigQuery using Apache Beam:



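import apache_beam as beam
import requests

def get_api_data(data):
    # The input element is only a trigger; each call fetches the current prices.
    data_every_sec = requests.get("https://min-api.cryptocompare.com/data/price?fsym=ETH&tsyms=BTC,USD,EUR").json()
    return [data_every_sec]
# expected: {u'USD': 210.76, u'BTC': 0.03273, u'EUR': 184.02}

def parse_btc(btc_item):
    usd, btc, eur = btc_item['USD'], btc_item['BTC'], btc_item['EUR']
    # Note: WriteToBigQuery expects dict rows; map this tuple to your table's columns before writing.
    return [(btc, usd, eur)]

# p is an existing beam.Pipeline. A ParDo needs an input PCollection,
# so a single seed element is created to trigger the API call.
dayData = (p
           | 'seed' >> beam.Create([None])
           | 'get data' >> beam.ParDo(get_api_data)
           | 'parse btc' >> beam.ParDo(parse_btc)
           | 'Write' >> beam.io.WriteToBigQuery(...)
           )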

Hope it will help you...



On Tue, Jun 4, 2019 at 8:01 PM Anjana Pydi <an...@bahwancybertek.com> wrote:
Hi Ankur,

Thanks for the suggestion.

Could you please share any examples you know of that are close to this use case?

Regards,
Anjana
________________________________
From: Ankur Goenka [goenka@google.com]
Sent: Monday, June 03, 2019 4:27 PM
To: user@beam.apache.org
Subject: [Sender Auth Failure] Re: How to build a beam python pipeline which does GET/POST request to API's

Looking at your use case, the whole processing logic seems to be very custom.
I would recommend using ParDos to express it. If processing an individual dictionary is expensive, you can use a Reshuffle operation to distribute the dictionary updates over multiple workers.

Note: Because you are going to make the write API calls yourself, your transform can be executed multiple times if a worker fails.
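
The following is a minimal sketch of the Reshuffle suggestion above, not code from this thread. Because a retried bundle can run the write step again, it also pairs each record with a key derived from the record's content, which a receiving API could use to discard repeats; whether the target API can do that is an assumption. The names idempotency_key, with_key, distribute_and_post and post_fn are hypothetical.

```python
import hashlib
import json


def idempotency_key(record):
    """Derive a stable key from the record's content (sorted keys, compact JSON)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_key(record):
    """Pair a record with its key so a retried write sends the same key again."""
    return idempotency_key(record), record


def distribute_and_post(records, post_fn):
    """`records` is a PCollection of dicts; `post_fn(key, record)` performs the write."""
    import apache_beam as beam  # deferred so the helpers above run without Beam

    return (
        records
        | "add key" >> beam.Map(with_key)
        # Reshuffle breaks fusion with the upstream step so the writes
        # can be spread over the available workers.
        | "redistribute" >> beam.Reshuffle()
        | "post" >> beam.Map(lambda kv: post_fn(*kv))
    )
```

The key depends only on the record's content, so two attempts for the same dictionary produce the same key regardless of key order in the dictionary.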

On Mon, Jun 3, 2019 at 11:41 AM Anjana Pydi <an...@bahwancybertek.com> wrote:
Hi Ankur,

Thanks for the reply. Please find my responses inline in the mail below.

Thanks,
Anjana
________________________________
From: Ankur Goenka [goenka@google.com]
Sent: Monday, June 03, 2019 11:01 AM
To: user@beam.apache.org
Subject: Re: How to build a beam python pipeline which does GET/POST request to API's

Thanks for providing more information.

Some follow-up questions and comments:
1. Call an API which would provide a dictionary as response.
Question: Do you need to make several of these API calls? If yes, what distinguishes call 1 from call 2? If it is the input to the API, can you provide the inputs in a file or similar? What I am trying to identify is an input source for the pipeline so that Beam can distribute the work.
Answer: A single API call can return a list of dictionaries. We have to go through every dictionary, apply the same transformations to each, and send it.
2. Transform dictionary to add / remove a few keys.
3. Send transformed dictionary as JSON to an API which prints this JSON as output.
Question: Are these write operations idempotent? Because you are making your own API calls, the calls may be repeated for the same input after a failure. If the write calls are not idempotent, there can be duplicate data.
Answer: Suppose I receive a list of 1000 dictionaries from the API call in point 1. I should then do exactly 1000 write operations, one per input. If posting fails for any input, only that input should be skipped, and the rest should be posted successfully.
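
One way to get the per-input behaviour described in the answers above is sketched below, as an illustration rather than anything posted in this thread: split the response into one element per dictionary, and catch errors around each write so a single bad input does not fail the others. The names explode, try_post, post_all and send are hypothetical, and send(record) stands in for whatever function performs the POST.

```python
def explode(response):
    """Turn one API response (a dict or a list of dicts) into individual records."""
    return response if isinstance(response, list) else [response]


def try_post(record, send):
    """Call `send(record)` and report the outcome instead of raising."""
    try:
        send(record)
        return ("posted", record)
    except Exception as exc:  # broad on purpose: one bad record must not fail the rest
        return ("failed", {"record": record, "error": str(exc)})


def post_all(responses, send):
    """`responses` is a PCollection of API responses; returns (posted, failed)."""
    import apache_beam as beam  # deferred so the helpers above run without Beam

    results = (
        responses
        | "one element per dict" >> beam.FlatMap(explode)
        | "post" >> beam.Map(try_post, send)
    )
    posted = results | "keep posted" >> beam.Filter(lambda r: r[0] == "posted")
    failed = results | "keep failed" >> beam.Filter(lambda r: r[0] == "failed")
    return posted, failed
```

Catching the exception isolates failures between inputs, but it does not prevent a runner from re-executing a bundle after a worker failure, so the duplicate-write concern raised in the question still applies unless the target API is idempotent.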

On Sat, Jun 1, 2019 at 8:13 PM Anjana Pydi <an...@bahwancybertek.com> wrote:
Hi Ankur,

Thanks for the reply! Below are more details of the use case:

1. Call an API which would provide a dictionary as response.
2. Transform dictionary to add / remove a few keys.
3. Send transformed dictionary as JSON to an API which prints this JSON as output.

Please let me know if anything needs clarification.

Thanks,
Anjana
________________________________
From: Ankur Goenka [goenka@google.com]
Sent: Saturday, June 01, 2019 6:47 PM
To: user@beam.apache.org
Subject: Re: How to build a beam python pipeline which does GET/POST request to API's

Hi Anjana,

You can write your API logic in a ParDo and then pass the elements to other ParDos to transform them and eventually make an API call to another endpoint.

However, this might not be a good fit for Beam: because there is no well-defined input, scaling and "once processing" of elements will not be possible.

It would help to elaborate a bit more on the use case so that we can give better suggestions.
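
The fetch, transform and send steps described above could be sketched as follows. This is an illustration under stated assumptions, not the approach anyone in the thread confirmed: it uses the standard library's urllib for HTTP, gives the pipeline a well-defined input by starting from a list of source URLs, and the names fetch_records, transform, post_record and build_pipeline, as well as the field names internal_id and source, are hypothetical.

```python
import json
import urllib.request


def fetch_records(url):
    """GET `url` and return the decoded JSON as a list of dicts."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    # Normalise a single dictionary to a one-element list.
    return payload if isinstance(payload, list) else [payload]


def transform(record, drop=(), add=None):
    """Return a copy of `record` without the `drop` keys and with `add` merged in."""
    out = {k: v for k, v in record.items() if k not in drop}
    out.update(add or {})
    return out


def post_record(record, url):
    """POST `record` as JSON to `url` and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def build_pipeline(pipeline, source_urls, sink_url):
    """Wire the three steps together; `pipeline` is a beam.Pipeline."""
    import apache_beam as beam  # deferred so the helpers above run without Beam

    return (
        pipeline
        | "source urls" >> beam.Create(source_urls)
        | "fetch" >> beam.FlatMap(fetch_records)
        | "transform" >> beam.Map(transform, drop=("internal_id",), add={"source": "beam"})
        | "post" >> beam.Map(post_record, sink_url)
    )
```

Starting from beam.Create(source_urls) gives Beam a defined set of input elements to distribute; with a single URL there is only one fetch, so any parallelism comes from the records that fetch returns.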

Thanks,
Ankur



--
Soliman ElSaber
Data Engineer
www.mindvalley.com

Re: [Sender Auth Failure] Re: How to build a beam python pipeline which does GET/POST request to API's

Posted by Soliman ElSaber <so...@mindvalley.com>.
Hi Anjana,

I used this code before to get some data from an API call and store it in
BigQuery using Apache Beam:



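import apache_beam as beam
import requests

def get_api_data(data):
    # The input element is only a trigger; each call fetches the current prices.
    data_every_sec = requests.get("https://min-api.cryptocompare.com/data/price?fsym=ETH&tsyms=BTC,USD,EUR").json()
    return [data_every_sec]
# expected: {u'USD': 210.76, u'BTC': 0.03273, u'EUR': 184.02}

def parse_btc(btc_item):
    usd, btc, eur = btc_item['USD'], btc_item['BTC'], btc_item['EUR']
    # Note: WriteToBigQuery expects dict rows; map this tuple to your table's columns before writing.
    return [(btc, usd, eur)]

# p is an existing beam.Pipeline. A ParDo needs an input PCollection,
# so a single seed element is created to trigger the API call.
dayData = (p
           | 'seed' >> beam.Create([None])
           | 'get data' >> beam.ParDo(get_api_data)
           | 'parse btc' >> beam.ParDo(parse_btc)
           | 'Write' >> beam.io.WriteToBigQuery(...)
           )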

Hope it will help you...





-- 
Soliman ElSaber
Data Engineer
www.mindvalley.com

RE: [Sender Auth Failure] Re: How to build a beam python pipeline which does GET/POST request to API's

Posted by Anjana Pydi <an...@bahwancybertek.com>.
Hi Ankur,

Thanks for the suggestion.

Could you please share any examples you know of that are close to this use case?

Regards,
Anjana

Re: How to build a beam python pipeline which does GET/POST request to API's

Posted by Ankur Goenka <go...@google.com>.
Looking at your use case, the whole processing logic seems to be very
custom.
I would recommend using ParDos to express it. If processing an individual
dictionary is expensive, you can use a Reshuffle operation to distribute
the dictionary updates over multiple workers.

Note: Because you are going to make the write API calls yourself, your
transform can be executed multiple times if a worker fails.

On Mon, Jun 3, 2019 at 11:41 AM Anjana Pydi <an...@bahwancybertek.com>
wrote:

> Hi Ankur,
>
> Thanks for reply. Please find responses updated in below mail.
>
> Thanks,
> Anjana
> ------------------------------
> *From:* Ankur Goenka [goenka@google.com]
> *Sent:* Monday, June 03, 2019 11:01 AM
> *To:* user@beam.apache.org
> *Subject:* Re: How to build a beam python pipeline which does GET/POST
> request to API's
>
> Thanks for providing more information.
>
> Some follow up questions/comments
> 1. Call an API which would provide a dictionary as response.
> Question: Do you need to make multiple of these API calls? If yes, what
> distinguishes API call1 from call2? If its the input to the API, then can
> you provide the inputs to in a file etc? What I am trying to identify is an
> input source to the pipeline so that beam can distribute the work.
> Answer : When an API call is made, it can provide a list of dictionaries
> as response, we have to go through every dictionary, do the same
> transformations for each and send it.
> 2. Transform dictionary to add / remove few keys.
> 3. Send transformed dictionary as JSON to an API which prints this JSON as
> output.
> Question: Are these write operation idempotent? As you are doing your own
> api calls, its possible that after a failure, the calls are done again for
> the same input. If write calls are not idempotent then their can be
> duplicate data.
> Answer : Suppose, if I receive a list of 1000 dictionaries as response
> when I called API in point1, I should do only 1000 write operations
> respectively to each input. If there is a failure for any input, only that
> should not be posted and remaining should be posted successfully.
>
> On Sat, Jun 1, 2019 at 8:13 PM Anjana Pydi <an...@bahwancybertek.com>
> wrote:
>
>> Hi Ankur,
>>
>> Thanks for the reply! Below is more details of the usecase:
>>
>> 1. Call an API which would provide a dictionary as response.
>> 2. Transform dictionary to add / remove few keys.
>> 3. Send transformed dictionary as JSON to an API which prints this JSON
>> as output.
>>
>> Please let me know in case of any clarifications.
>>
>> Thanks,
>> Anjana
>> ------------------------------
>> *From:* Ankur Goenka [goenka@google.com]
>> *Sent:* Saturday, June 01, 2019 6:47 PM
>> *To:* user@beam.apache.org
>> *Subject:* Re: How to build a beam python pipeline which does GET/POST
>> request to API's
>>
>> Hi Anjana,
>>
>> You can write your API logic in a ParDo and subsequently pass the
>> elements to other ParDos to transform and eventually make an API call to to
>> another endpoint.
>>
>> However, this might not be a good fit for Beam as the input is not well
>> defined and hence scaling and "once processing" of elements will not be
>> possible as their is no well defined input.
>>
>> It will be better to elaborate a bit more on the usecase for better
>> suggestions.
>>
>> Thanks,
>> Ankur
>>
>> On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi <an...@bahwancybertek.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a requirement to create an apache beam python pipeline to read a
>>> JSON from an API endpoint, transform it (add/remove few fields)and send the
>>> transformed JSON to another API endpoint.
>>>
>>> Can anyone please provide some suggestions on how to do it.
>>>
>>> Thanks,
>>> Anjana

RE: How to build a beam python pipeline which does GET/POST request to API's

Posted by Anjana Pydi <an...@bahwancybertek.com>.
Hi Ankur,

Thanks for the reply. Please find my responses inline in the mail below.

Thanks,
Anjana
________________________________
From: Ankur Goenka [goenka@google.com]
Sent: Monday, June 03, 2019 11:01 AM
To: user@beam.apache.org
Subject: Re: How to build a beam python pipeline which does GET/POST request to API's

Thanks for providing more information.

Some follow-up questions/comments:
1. Call an API that returns a dictionary as its response.
Question: Do you need to make multiple of these API calls? If yes, what distinguishes call 1 from call 2? If it's the input to the API, can you provide the inputs in a file, etc.? What I am trying to identify is an input source for the pipeline so that Beam can distribute the work.
Answer: A single API call can return a list of dictionaries as its response. We have to go through every dictionary, apply the same transformations to each, and send it.
2. Transform the dictionary to add or remove a few keys.
3. Send the transformed dictionary as JSON to an API that prints this JSON as output.
Question: Are these write operations idempotent? As you are making your own API calls, it's possible that after a failure the calls are made again for the same input. If the write calls are not idempotent, there can be duplicate data.
Answer: Suppose I receive a list of 1000 dictionaries as the response when I call the API in point 1. I should then do exactly 1000 write operations, one per input. If there is a failure for any input, only that input should not be posted; the remaining ones should be posted successfully.
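
The last answer (a failing record is skipped, the rest are still posted) can be sketched in plain Python as below. This is only an illustration under assumptions: `post_each` and the `post` callable are hypothetical names that do not come from this thread, and `post` stands in for whatever function performs the real HTTP POST. Inside a Beam DoFn, the same try/except could send failed records to a separate tagged output instead of a list.

```python
def post_each(records, post):
    """Post every record independently; one failure does not stop the others.

    `post` is a hypothetical callable that raises an exception on failure.
    Returns (posted, failed), where failed holds (record, error message) pairs.
    """
    posted, failed = [], []
    for record in records:
        try:
            post(record)
        except Exception as exc:  # keep going; remember what went wrong
            failed.append((record, str(exc)))
        else:
            posted.append(record)
    return posted, failed
```

Catching the exception per record matters here because an exception that escapes a Beam transform is typically treated by the runner as a failure of the whole bundle of elements, not of a single record.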

On Sat, Jun 1, 2019 at 8:13 PM Anjana Pydi <an...@bahwancybertek.com> wrote:
Hi Ankur,

Thanks for the reply! Below are more details of the use case:

1. Call an API that returns a dictionary as its response.
2. Transform the dictionary to add or remove a few keys.
3. Send the transformed dictionary as JSON to an API that prints this JSON as output.

Please let me know if you need any clarification.

Thanks,
Anjana
________________________________
From: Ankur Goenka [goenka@google.com]
Sent: Saturday, June 01, 2019 6:47 PM
To: user@beam.apache.org
Subject: Re: How to build a beam python pipeline which does GET/POST request to API's

Hi Anjana,

You can write your API logic in a ParDo and subsequently pass the elements to other ParDos to transform them and eventually make an API call to another endpoint.

However, this might not be a good fit for Beam, as there is no well-defined input and hence scaling and "once processing" of elements will not be possible.

It would help to elaborate a bit more on the use case so that we can offer better suggestions.

Thanks,
Ankur

On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi <an...@bahwancybertek.com> wrote:
Hi,

I need to create an Apache Beam Python pipeline that reads JSON from an API endpoint, transforms it (adds or removes a few fields), and sends the transformed JSON to another API endpoint.

Can anyone please provide some suggestions on how to do it?

Thanks,
Anjana

Re: How to build a beam python pipeline which does GET/POST request to API's

Posted by Ankur Goenka <go...@google.com>.
Thanks for providing more information.

Some follow-up questions/comments:
1. Call an API that returns a dictionary as its response.
Question: Do you need to make multiple of these API calls? If yes, what
distinguishes call 1 from call 2? If it's the input to the API, can you
provide the inputs in a file, etc.? What I am trying to identify is an
input source for the pipeline so that Beam can distribute the work.
2. Transform the dictionary to add or remove a few keys.
3. Send the transformed dictionary as JSON to an API that prints this JSON
as output.
Question: Are these write operations idempotent? As you are making your own
API calls, it's possible that after a failure the calls are made again for
the same input. If the write calls are not idempotent, there can be
duplicate data.
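
To make the idempotency concern concrete, here is a minimal sketch of one common approach: derive a deterministic key from each record so that a repeated write can be recognised and ignored. Everything named here is an assumption for illustration only. `idempotency_key` and `InMemorySink` are hypothetical, and `InMemorySink` merely stands in for a receiving API that is able to deduplicate on such a key; nothing in this thread says the real endpoint can do that.

```python
import hashlib
import json


def idempotency_key(record):
    """Return a stable hash of the record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemorySink:
    """Hypothetical stand-in for a receiving API that deduplicates by key."""

    def __init__(self):
        self.stored = {}

    def post(self, record):
        """Store the record once; return False if it was already written."""
        key = idempotency_key(record)
        if key in self.stored:
            return False  # a retry of an earlier write, safely ignored
        self.stored[key] = record
        return True
```

With a sink like this, posting the same record twice after a retry leaves exactly one stored copy, which is the property the question above is asking about.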

On Sat, Jun 1, 2019 at 8:13 PM Anjana Pydi <an...@bahwancybertek.com>
wrote:

> Hi Ankur,
>
> Thanks for the reply! Below are more details of the use case:
>
> 1. Call an API that returns a dictionary as its response.
> 2. Transform the dictionary to add or remove a few keys.
> 3. Send the transformed dictionary as JSON to an API that prints this JSON
> as output.
>
> Please let me know if you need any clarification.
>
> Thanks,
> Anjana
> ------------------------------
> *From:* Ankur Goenka [goenka@google.com]
> *Sent:* Saturday, June 01, 2019 6:47 PM
> *To:* user@beam.apache.org
> *Subject:* Re: How to build a beam python pipeline which does GET/POST
> request to API's
>
> Hi Anjana,
>
> You can write your API logic in a ParDo and subsequently pass the elements
> to other ParDos to transform them and eventually make an API call to another
> endpoint.
>
> However, this might not be a good fit for Beam, as there is no well-defined
> input and hence scaling and "once processing" of elements will not be
> possible.
>
> It would help to elaborate a bit more on the use case so that we can offer
> better suggestions.
>
> Thanks,
> Ankur
>
> On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi <an...@bahwancybertek.com>
> wrote:
>
>> Hi,
>>
>> I need to create an Apache Beam Python pipeline that reads JSON from an
>> API endpoint, transforms it (adds or removes a few fields), and sends the
>> transformed JSON to another API endpoint.
>>
>> Can anyone please provide some suggestions on how to do it?
>>
>> Thanks,
>> Anjana

RE: How to build a beam python pipeline which does GET/POST request to API's

Posted by Anjana Pydi <an...@bahwancybertek.com>.
Hi Ankur,

Thanks for the reply! Below are more details of the use case:

1. Call an API that returns a dictionary as its response.
2. Transform the dictionary to add or remove a few keys.
3. Send the transformed dictionary as JSON to an API that prints this JSON as output.

Please let me know if you need any clarification.

Thanks,
Anjana
________________________________
From: Ankur Goenka [goenka@google.com]
Sent: Saturday, June 01, 2019 6:47 PM
To: user@beam.apache.org
Subject: Re: How to build a beam python pipeline which does GET/POST request to API's

Hi Anjana,

You can write your API logic in a ParDo and subsequently pass the elements to other ParDos to transform them and eventually make an API call to another endpoint.

However, this might not be a good fit for Beam, as there is no well-defined input and hence scaling and "once processing" of elements will not be possible.

It would help to elaborate a bit more on the use case so that we can offer better suggestions.

Thanks,
Ankur

On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi <an...@bahwancybertek.com> wrote:
Hi,

I need to create an Apache Beam Python pipeline that reads JSON from an API endpoint, transforms it (adds or removes a few fields), and sends the transformed JSON to another API endpoint.

Can anyone please provide some suggestions on how to do it?

Thanks,
Anjana

Re: How to build a beam python pipeline which does GET/POST request to API's

Posted by Ankur Goenka <go...@google.com>.
Hi Anjana,

You can write your API logic in a ParDo and subsequently pass the elements
to other ParDos to transform them and eventually make an API call to another
endpoint.
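
A minimal sketch of that shape, under stated assumptions: the two URLs, the helper names, and the `internal_id` / `processed` fields are all hypothetical and only serve the illustration. It uses `beam.FlatMap` and `beam.Map`, which are thin wrappers around ParDo, together with the standard-library `urllib` for the HTTP calls. Apache Beam is imported inside `run()` so the helpers can be tried out without Beam installed.

```python
import json
import urllib.request

# Hypothetical endpoints, used only for illustration.
SOURCE_URL = "https://example.com/api/source"
SINK_URL = "https://example.com/api/sink"


def fetch_records(url):
    """GET the URL and yield each dictionary found in the JSON response."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # The API may return a single dictionary or a list of dictionaries.
    if isinstance(payload, dict):
        payload = [payload]
    for record in payload:
        yield record


def transform_record(record, drop=("internal_id",), add=None):
    """Return a copy of the record with some keys removed and others added."""
    if add is None:
        add = {"processed": True}  # hypothetical field, for illustration
    out = {k: v for k, v in record.items() if k not in drop}
    out.update(add)
    return out


def post_record(record, url=SINK_URL):
    """POST one record as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def run():
    # Imported here so the helpers above work without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as p:
        (
            p
            | "Seed" >> beam.Create([SOURCE_URL])
            | "Fetch" >> beam.FlatMap(fetch_records)
            | "Transform" >> beam.Map(transform_record)
            | "Post" >> beam.Map(post_record)
        )
```

Calling `run()` would execute the pipeline on the default local runner. Seeding the pipeline with a list of URLs through `beam.Create` is one simple way to give Beam an explicit input to start from.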

However, this might not be a good fit for Beam, as there is no well-defined
input and hence scaling and "once processing" of elements will not be
possible.

It would help to elaborate a bit more on the use case so that we can offer
better suggestions.

Thanks,
Ankur

On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi <an...@bahwancybertek.com>
wrote:

> Hi,
>
> I need to create an Apache Beam Python pipeline that reads JSON from an
> API endpoint, transforms it (adds or removes a few fields), and sends the
> transformed JSON to another API endpoint.
>
> Can anyone please provide some suggestions on how to do it?
>
> Thanks,
> Anjana