Posted to dev@beam.apache.org by Chaim Turkel <ch...@behalf.com> on 2017/10/30 08:33:47 UTC

read source - MongoDbIO.read()

Hi,
   Is there a way to have some code run before the read?
I would like to check beforehand how many records exist and, based on
that, run one of two different pipelines.
Currently this code is in the runner, but since I have 20 tables this
takes a long time.
I would like to move the check into the pipeline.

Any ideas?


chaim

Re: read source - MongoDbIO.read()

Posted by Chaim Turkel <ch...@behalf.com>.
Thanks. And for BigQuery?

On Mon, Oct 30, 2017 at 2:21 PM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> That's the evolution I'm proposing, and I have already implemented it in some IOs: the readAll pattern. Let me check for Mongo.

Re: read source - MongoDbIO.read()

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
ReadAll is implemented in JDBC and Redis IOs.

I will add it to the other IOs.

Regards
JB

On Oct 30, 2017, 13:39, at 13:39, Chaim Turkel <ch...@behalf.com> wrote:
>Can you send me a link to the code?

Re: read source - MongoDbIO.read()

Posted by Chaim Turkel <ch...@behalf.com>.
Can you send me a link to the code?

On Mon, Oct 30, 2017 at 2:21 PM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> That's the evolution I'm proposing, and I have already implemented it in some IOs: the readAll pattern. Let me check for Mongo.

Re: read source - MongoDbIO.read()

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
That's the evolution I'm proposing, and I have already implemented it in some IOs: the readAll pattern. Let me check for Mongo.

On Oct 30, 2017, 12:00, at 12:00, Chaim Turkel <ch...@behalf.com> wrote:
>I am syncing multiple tables from Mongo to BigQuery.
>So I first check how many records there are, and then if there are
>records I need to sync them; otherwise I need to update the status table
>to say that there was nothing to sync. Also, in the case that I do sync, I
>need to update the status table with information about the sync.
>
>Why can't the read start from a collection also?
>chaim
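
The readAll idea mentioned in this message can be pictured as a read that is driven by its input instead of being the fixed first step. The sketch below is plain Java, not Beam code: `ReadAllSketch`, `readAll`, and the in-memory `fakeDb` are hypothetical names used only for illustration, and the actual transforms in the JDBC and Redis IOs have their own signatures.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Plain-Java model of a read that takes its sources as input; no Beam types are used. */
final class ReadAllSketch {

    /**
     * Applies readOne to every parameter and concatenates the results in input order.
     * An upstream step can therefore decide what gets read, which a read placed
     * at the very start of a pipeline cannot offer.
     */
    static <P, T> List<T> readAll(List<P> params, Function<P, List<T>> readOne) {
        return params.stream()
                .flatMap(p -> readOne.apply(p).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a database: collection name -> records.
        Map<String, List<String>> fakeDb = new HashMap<>();
        fakeDb.put("orders", Arrays.asList("o1", "o2"));
        fakeDb.put("customers", Arrays.asList("c1"));

        // The list of names plays the role of the upstream collection.
        List<String> names = Arrays.asList("orders", "customers");
        List<String> rows = ReadAllSketch.<String, String>readAll(
                names, name -> fakeDb.getOrDefault(name, Collections.<String>emptyList()));
        System.out.println(rows);
    }
}
```

In a real pipeline the list of names would be a collection produced by an earlier step, which is what allows code to run before the read.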

Re: read source - MongoDbIO.read()

Posted by Chaim Turkel <ch...@behalf.com>.
I am syncing multiple tables from Mongo to BigQuery.
So I first check how many records there are, and then if there are
records I need to sync them; otherwise I need to update the status table
to say that there was nothing to sync. Also, in the case that I do sync, I
need to update the status table with information about the sync.

Why can't the read start from a collection also?
chaim

On Mon, Oct 30, 2017 at 12:13 PM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Can you describe your use case? We could imagine being able to define a custom FN in the read, but I'm afraid it would be too specific.
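
The per-table decision described in this message can be sketched in plain Java. This is a minimal model under stated assumptions, not Beam or MongoDB code: `SyncPlanner`, the `Action` names, and the table names are hypothetical, and the record counts are passed in instead of being queried.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Models the count-then-branch step from the message; makes no database calls. */
final class SyncPlanner {

    /** The two outcomes named in the thread; both end with a status-table update. */
    enum Action { SYNC_AND_RECORD_STATUS, RECORD_NOTHING_TO_SYNC }

    /** Chooses the branch for one table from its pending record count. */
    static Action decide(long pendingRecords) {
        if (pendingRecords < 0) {
            throw new IllegalArgumentException("count must be non-negative: " + pendingRecords);
        }
        return pendingRecords > 0
                ? Action.SYNC_AND_RECORD_STATUS
                : Action.RECORD_NOTHING_TO_SYNC;
    }

    /** Applies decide() to every table, preserving the input order. */
    static Map<String, Action> plan(Map<String, Long> countsByTable) {
        Map<String, Action> result = new LinkedHashMap<>();
        countsByTable.forEach((table, count) -> result.put(table, decide(count)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical table names and counts, standing in for the 20 real tables.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("orders", 120L);
        counts.put("customers", 0L);
        plan(counts).forEach((table, action) -> System.out.println(table + " -> " + action));
    }
}
```

Keeping the decision a pure function of the count means it could run per element inside a pipeline step, so the 20 tables would not have to be checked one after another before the pipeline starts.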

Re: read source - MongoDbIO.read()

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Can you describe your use case? We could imagine being able to define a custom FN in the read, but I'm afraid it would be too specific.

On Oct 30, 2017, 10:46, at 10:46, Chaim Turkel <ch...@behalf.com> wrote:
>Is there any reason for this? There should be a way to run it from any point.

Re: read source - MongoDbIO.read()

Posted by Chaim Turkel <ch...@behalf.com>.
Is there any reason for this? There should be a way to run it from any point.

On Mon, Oct 30, 2017 at 11:24 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Hi
>
> No, the pipeline starts with the read. You can always create your own custom read.
>
> Regards
> JB

Re: read source - MongoDbIO.read()

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi

No, the pipeline starts with the read. You can always create your own custom read.

Regards
JB

On Oct 30, 2017, 09:33, at 09:33, Chaim Turkel <ch...@behalf.com> wrote:
>Hi,
>   Is there a way to have some code run before the read?
>I would like to check beforehand how many records exist and, based on
>that, run one of two different pipelines.
>Currently this code is in the runner, but since I have 20 tables this
>takes a long time.
>I would like to move the check into the pipeline.
>
>Any ideas?
>
>
>chaim