Posted to user@beam.apache.org by Yushu Yao <ya...@gmail.com> on 2022/04/01 06:22:11 UTC

Changing SQL within same Beam Job?

Hi Experts,

We have a dynamically configurable list of transformations to be performed
on an input stream. Each input element will be transformed by one of those
transformations. Each transformation can be expressed as a SQL statement.

I'm wondering whether this can be achieved with Beam SQL. Say, for each
input element, could it look up which SQL statement to use and apply that
statement only to this element?

With my limited knowledge of Beam SQL, I guess it takes the SQL and
converts it into a pipeline, which is in turn optimized for the underlying
runner. So this question may come down to whether we can dynamically
reconfigure a running pipeline in Beam. (Like MergeHub+ in Akka?)

Thanks

Re: Changing SQL within same Beam Job?

Posted by Brian Hulette <bh...@google.com>.
Even if the set of SQL statements is known ahead of time, it might be
problematic if it's very, very large. How many unique statements are there?


Re: Changing SQL within same Beam Job?

Posted by Robert Bradshaw <ro...@google.com>.
Yes, what you can do in this case is use a DoFn with multiple outputs
[1] to partition your input into multiple distinct PCollections
according to the type of transformation you want to run, and then
apply the corresponding SqlTransform to each of these PCollections.
You could then Flatten [2] all the processed outputs if you want to
put them all in the same place.
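
For example, here is a minimal sketch of that shape in Java. The schema,
the "kind" routing field, the two example queries, and all class/step
names are made up for illustration; only the DoFn-with-multiple-outputs /
SqlTransform / Flatten structure is the point.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class PerStatementSqlSketch {

  // One tag per known transformation type.
  private static final TupleTag<Row> TYPE_A = new TupleTag<Row>() {};
  private static final TupleTag<Row> TYPE_B = new TupleTag<Row>() {};

  public static void main(String[] args) {
    Schema schema =
        Schema.builder().addStringField("kind").addInt64Field("amount").build();

    Pipeline p = Pipeline.create();

    // Toy input; in practice this is the real input stream.
    PCollection<Row> input =
        p.apply(
            Create.of(
                    Row.withSchema(schema).addValues("A", 1L).build(),
                    Row.withSchema(schema).addValues("B", 2L).build())
                .withRowSchema(schema));

    // 1. DoFn with multiple outputs: route each element to the branch
    //    for the SQL statement that should handle it.
    PCollectionTuple routed =
        input.apply(
            "RouteByKind",
            ParDo.of(
                    new DoFn<Row, Row>() {
                      @ProcessElement
                      public void process(@Element Row row, MultiOutputReceiver out) {
                        if ("A".equals(row.getString("kind"))) {
                          out.get(TYPE_A).output(row);
                        } else {
                          out.get(TYPE_B).output(row);
                        }
                      }
                    })
                .withOutputTags(TYPE_A, TupleTagList.of(TYPE_B)));

    // 2. Apply the SQL statement that corresponds to each branch.
    PCollection<Row> outA =
        routed
            .get(TYPE_A)
            .setRowSchema(schema)
            .apply("SqlA",
                SqlTransform.query("SELECT kind, amount FROM PCOLLECTION"));
    PCollection<Row> outB =
        routed
            .get(TYPE_B)
            .setRowSchema(schema)
            .apply("SqlB",
                SqlTransform.query(
                    "SELECT kind, amount * 2 AS amount FROM PCOLLECTION"));

    // 3. Flatten the per-statement results back into one PCollection
    //    (the branches need compatible output schemas for this to work);
    //    `merged` would then be written to whatever sink you need.
    PCollection<Row> merged =
        PCollectionList.of(outA).and(outB).apply(Flatten.pCollections());

    p.run().waitUntilFinish();
  }
}

Note that the set of statements still has to be known when the pipeline is
constructed; the routing DoFn only decides, per element, which of those
pre-built SqlTransforms processes it.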

If the set of SQL statements is not known ahead of time, that would be
more difficult. You could possibly leverage something like the CalcRel
DoFn used in the SQL implementation [3] (but this is limited to
projections; no grouping or aggregations allowed).

[1] https://beam.apache.org/documentation/programming-guide/#additional-outputs
[2] https://beam.apache.org/documentation/programming-guide/#flatten
[3] https://github.com/apache/beam/blob/release-2.37.0/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L242
