Posted to dev@beam.apache.org by Adalbert Makarovych <am...@singlestore.com> on 2022/08/25 14:17:34 UTC

SingleStore IO

Hello,

I'm working on the SingleStore IO connector and would like to discuss it
with Beam developers.
It would be great if the connector could use SingleStore parallel read
<https://docs.singlestore.com/managed-service/en/query-data/query-procedures/read-query-results-in-parallel.html>.
Ideally, the connector should use Single-read mode, as it is
faster than Multiple-read mode and consumes much less memory.

One problem is that in Single-read mode, each reader must initiate
its read query before any reader receives data. Is it possible to
somehow configure Beam to start all DoFns at the same time? Or to get the
number of started DoFns at runtime?
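
To make the start-up constraint concrete, here is a toy simulation using plain Python threads. It is not Beam or SingleStore code: `run_readers`, `num_partitions`, `num_workers`, and `timeout_s` are hypothetical names, and a `threading.Barrier` stands in for the rule that no reader receives data until every reader has started its query.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_readers(num_partitions, num_workers, timeout_s=1.0):
    """Return True if every simulated reader got past the start barrier.

    The barrier models the Single-read rule described above: data flows only
    once all `num_partitions` readers have issued their read query.
    """
    barrier = threading.Barrier(num_partitions)

    def reader(partition_id):
        try:
            # "Initiate the read query", then wait for all other readers.
            barrier.wait(timeout=timeout_s)
            return True
        except threading.BrokenBarrierError:
            # Not all readers started in time, so nobody receives data.
            return False

    # The pool size models how many readers the runner executes at once.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(reader, range(num_partitions)))
    return all(results)
```

In this sketch, `run_readers(4, 4)` succeeds because all four readers run concurrently, while `run_readers(4, 2)` times out: the two running readers wait for two more that never get a worker slot. That is the situation the question about starting all DoFns at once is trying to avoid.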

The other problem is that Single-read mode allows data to be read from a
partition only once, so if one reading thread fails, all the others must be
restarted to retry. Is it possible to achieve this behavior? Or at least to
fail gracefully without additional retries?
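
The all-or-nothing retry described here can be sketched as follows. This is only an illustration under stated assumptions: `PartitionReadError`, `read_all_partitions`, `stable_reader`, and `flaky_reader` are hypothetical names, the readers are plain callables rather than DoFns, and they run sequentially for simplicity.

```python
class PartitionReadError(Exception):
    """Raised by a simulated reader when its partition read fails."""


def read_all_partitions(readers, max_attempts=3):
    """Read every partition; on any failure, discard everything and start over.

    `readers` is a list of callables that take the attempt number and return
    a list of rows. Returns a tuple (rows, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        rows = []
        try:
            for read_partition in readers:
                rows.extend(read_partition(attempt))
        except PartitionReadError:
            # A partition can be read only once per attempt, so the partial
            # result is dropped and all readers are restarted together.
            continue
        return rows, attempt
    raise RuntimeError(f"read failed after {max_attempts} attempts")


def stable_reader(attempt):
    return ["a", "b"]


def flaky_reader(attempt):
    # Fails on the first attempt only, to exercise the restart path.
    if attempt == 1:
        raise PartitionReadError("connection lost")
    return ["c"]
```

For example, `read_all_partitions([stable_reader, flaky_reader])` discards the rows from the first attempt and returns the full result on the second one. The alternative asked about above, failing without retries, corresponds to `max_attempts=1`.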

Here are the first drafts of the design documentation:
<https://docs.google.com/document/d/1WU-hkoZ93SaGXyOz_UtX0jXzIRl194hCId_IdmEV9jw/edit?usp=sharing>
I would appreciate any help with this :)

-- 
Adalbert Makarovych
Software Engineer at SingleStore


Re: SingleStore IO

Posted by Adalbert Makarovych <am...@singlestore.com>.
Thanks for the answer!

It would be great to have a call with you.
Here is a meeting invitation.

SingleStore Beam connector discussion
Wednesday, August 31 · 7:00 – 8:00pm
Google Meet joining info
Video call link: https://meet.google.com/uiq-btvt-tpw
Or dial: (GB) +44 20 3956 6918 PIN: 226 926 316#
More phone numbers: https://tel.meet/uiq-btvt-tpw?pin=2285528438746

I don't know which time zone you are in, so please email me if this time
doesn't suit you.

On Thu, Aug 25, 2022 at 7:33 AM John Casey via dev <de...@beam.apache.org>
wrote:

> Hi Adalbert,
>
> The nature of scheduling work with splittable DoFns is such that trying to
> start all splits at the same time isn't really supported. In addition, the
> general assumption of splitting work in Beam is that a split can be retried
> in isolation from other splits, which doesn't look supported by SingleStore
> parallel read.
>
> That said, this looks really promising, so I'd be happy to get on a call
> to help better understand your design, and see if we can find a solution.
>
> John

-- 
Adalbert Makarovych
Software Engineer at SingleStore


Re: SingleStore IO

Posted by John Casey via dev <de...@beam.apache.org>.
Hi Adalbert,

The nature of scheduling work with splittable DoFns is such that trying to
start all splits at the same time isn't really supported. In addition, the
general assumption of splitting work in Beam is that a split can be retried
in isolation from other splits, which doesn't look supported by SingleStore
parallel read.

That said, this looks really promising, so I'd be happy to get on a call to
help better understand your design, and see if we can find a solution.

John
