Posted to user@flink.apache.org by Yuval Itzchakov <yu...@gmail.com> on 2021/08/18 10:51:17 UTC

Validating Flink SQL without registering with StreamTableEnvironment

Hi,

I have a use-case where I need to validate hundreds of Flink SQL queries.
Ideally, I'd like to run these validations in parallel. But given an issue
with Calcite's use of thread-local storage, I can only interact with the
table runtime from a single thread.

I don't really care about the overall registration process of sources,
transformations, and sinks; I just want to make sure the syntax is correct
from Flink's perspective.

Is there any straightforward way of doing this?

-- 
Best Regards,
Yuval Itzchakov.

Re: Validating Flink SQL without registering with StreamTableEnvironment

Posted by Ingo Bürk <in...@ververica.com>.
Hi Yuval,

I can expand a bit more on the technical side of validation, though as a
heads-up, I don't have a solution.

When validating entire pipelines on a logical level, you run into the
(maybe obvious) issue that statements depend on previous statements. In
the simple case of a CREATE TABLE DDL followed by some query, ("full")
validation of the query depends on the table actually existing. On the
other hand, validating a CREATE TABLE DDL shouldn't actually execute that
DDL, which creates a conflict.

Of course this is only a concern if we care about the table existing during
validation; from a purely syntactic perspective it wouldn't matter.
However, Flink's parser (ParserImpl) calls SqlToOperationConverter under
the hood, which in some places does table lookups etc., so it depends on
the catalog manager. This prevents purely syntactic, catalog-free
validation. Ideally, SqlToOperationConverter would not have such a
dependency, but changing that takes some work, as operations would have to
be redesigned and "evaluated" later on.

I think, as of now, you'd have to use the CalciteParser directly to bypass
this call, but of course it isn't accessible without reflection. I've also
never tried this, so I don't know whether it would actually work. It would
definitely miss anything handled by Flink's "extended parser" right now,
but that mostly concerns SQL-client-specific syntax.
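
For reference, a rough and untested sketch of what that reflective route
might look like. The field name "calciteParserSupplier" is an assumption
about ParserImpl's internals and may well differ between Flink versions:

    import java.lang.reflect.Field;
    import java.lang.reflect.Method;
    import java.util.function.Supplier;

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.internal.TableEnvironmentImpl;
    import org.apache.flink.table.delegation.Parser;

    public class CalciteParserProbe {

        public static void main(String[] args) throws Exception {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());
            Parser parser = ((TableEnvironmentImpl) env).getParser();

            // Assumption: the planner's ParserImpl holds a Supplier<CalciteParser>
            // in a private field named "calciteParserSupplier".
            Field field = parser.getClass().getDeclaredField("calciteParserSupplier");
            field.setAccessible(true);
            Object calciteParser = ((Supplier<?>) field.get(parser)).get();

            // CalciteParser#parse(String) returns a Calcite SqlNode; invoking it
            // reflectively avoids compiling against the internal class directly.
            Method parse = calciteParser.getClass().getMethod("parse", String.class);
            parse.setAccessible(true);
            Object sqlNode = parse.invoke(calciteParser, "SELECT a, b FROM t WHERE a > 1");
            System.out.println(sqlNode.getClass().getName());
        }
    }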


Best
Ingo

On Wed, Aug 18, 2021 at 2:41 PM Yuval Itzchakov <yu...@gmail.com> wrote:

> Thanks Ingo!
> I just discovered this a short while before you posted :)
>
> Ideally, I'd like to validate that the entire pipeline is set up
> correctly. The problem is that I can't use methods like `tableEnv.sqlQuery`
> from multiple threads, and this is really limiting my ability to speed up
> the process (today it takes over an hour to complete, which isn't
> reasonable).
>
> If anyone has any suggestions on how I can still leverage the
> TableEnvironment in the processor to validate my SQL queries I'd be happy
> to know.
>
> On Wed, Aug 18, 2021 at 2:37 PM Ingo Bürk <in...@ververica.com> wrote:
>
>> Hi Yuval,
>>
>> if syntactical correctness is all you care about, parsing the SQL should
>> suffice. You can get a hold of the parser from
>> TableEnvironmentImpl#getParser and then run #parse. This will require you
>> to cast your table environment to the (internal) implementation, but maybe
>> this works for you?
>>
>>
>> Best
>> Ingo
>>
>> On Wed, Aug 18, 2021 at 12:51 PM Yuval Itzchakov <yu...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a use-case where I need to validate hundreds of Flink SQL
>>> queries. Ideally, I'd like to run these validations in parallel. But given
>>> an issue with Calcite's use of thread-local storage, I can only interact
>>> with the table runtime from a single thread.
>>>
>>> I don't really care about the overall registration process of sources,
>>> transformations, and sinks; I just want to make sure the syntax is correct
>>> from Flink's perspective.
>>>
>>> Is there any straightforward way of doing this?
>>>
>>> --
>>> Best Regards,
>>> Yuval Itzchakov.
>>>
>>
>
> --
> Best Regards,
> Yuval Itzchakov.
>

Re: Validating Flink SQL without registering with StreamTableEnvironment

Posted by Yuval Itzchakov <yu...@gmail.com>.
Thanks Ingo!
I just discovered this a short while before you posted :)

Ideally, I'd like to validate that the entire pipeline is set up correctly.
The problem is that I can't use methods like `tableEnv.sqlQuery` from
multiple threads, and this is really limiting my ability to speed up the
process (today it takes over an hour to complete, which isn't reasonable).

If anyone has any suggestions on how I can still leverage the
TableEnvironment in the processor to validate my SQL queries I'd be happy
to know.
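
One possible workaround sketch under that constraint: create the
TableEnvironment on a single dedicated thread and funnel every call to it
through that thread, so Calcite's thread-local state is only ever touched
there. The wrapper below is hypothetical and untested against the actual
Calcite issue:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class SingleThreadedSqlValidator implements AutoCloseable {

        private final ExecutorService tableThread = Executors.newSingleThreadExecutor();
        private final TableEnvironment env;

        public SingleThreadedSqlValidator() throws Exception {
            // Create the environment on the dedicated thread so every interaction
            // with the table runtime (and Calcite) happens on that thread only.
            this.env = CompletableFuture.supplyAsync(
                    () -> TableEnvironment.create(
                            EnvironmentSettings.newInstance().inStreamingMode().build()),
                    tableThread).get();
        }

        // Submits one statement; the future completes exceptionally if
        // parsing/validation fails.
        public CompletableFuture<Void> validate(String sql) {
            return CompletableFuture.runAsync(() -> env.sqlQuery(sql), tableThread);
        }

        @Override
        public void close() {
            tableThread.shutdown();
        }

        public static void main(String[] args) throws Exception {
            try (SingleThreadedSqlValidator validator = new SingleThreadedSqlValidator()) {
                for (String sql : List.of("SELECT 1", "SELECT 1 +")) {
                    validator.validate(sql).whenComplete((ok, err) ->
                            System.out.println(sql + " -> " + (err == null ? "OK" : err.getCause())));
                }
            }
        }
    }

Note that the validations themselves still run serially on that one thread;
the only gain is that the surrounding work can happen on other threads.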

On Wed, Aug 18, 2021 at 2:37 PM Ingo Bürk <in...@ververica.com> wrote:

> Hi Yuval,
>
> if syntactical correctness is all you care about, parsing the SQL should
> suffice. You can get a hold of the parser from
> TableEnvironmentImpl#getParser and then run #parse. This will require you
> to cast your table environment to the (internal) implementation, but maybe
> this works for you?
>
>
> Best
> Ingo
>
> On Wed, Aug 18, 2021 at 12:51 PM Yuval Itzchakov <yu...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a use-case where I need to validate hundreds of Flink SQL queries.
>> Ideally, I'd like to run these validations in parallel. But given an issue
>> with Calcite's use of thread-local storage, I can only interact with the
>> table runtime from a single thread.
>>
>> I don't really care about the overall registration process of sources,
>> transformations, and sinks; I just want to make sure the syntax is correct
>> from Flink's perspective.
>>
>> Is there any straightforward way of doing this?
>>
>> --
>> Best Regards,
>> Yuval Itzchakov.
>>
>

-- 
Best Regards,
Yuval Itzchakov.

Re: Validating Flink SQL without registering with StreamTableEnvironment

Posted by Ingo Bürk <in...@ververica.com>.
Hi Yuval,

if syntactical correctness is all you care about, parsing the SQL should
suffice. You can get a hold of the parser from
TableEnvironmentImpl#getParser and then run #parse. This will require you
to cast your table environment to the (internal) implementation, but maybe
this works for you?
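
A minimal sketch of that approach, assuming a recent Flink version. Keep in
mind that Parser#parse can still fail for queries referencing tables that
don't exist, since converting to operations consults the catalog:

    import java.util.List;

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.internal.TableEnvironmentImpl;
    import org.apache.flink.table.delegation.Parser;
    import org.apache.flink.table.operations.Operation;

    public class SyntaxCheck {

        public static void main(String[] args) {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            // getParser() is internal API, hence the cast to TableEnvironmentImpl.
            Parser parser = ((TableEnvironmentImpl) env).getParser();

            // Parses and converts the statement; throws (e.g. SqlParserException
            // or ValidationException) if the statement is invalid.
            List<Operation> operations = parser.parse(
                    "CREATE TABLE t (a INT, b STRING) WITH ('connector' = 'datagen')");
            System.out.println(operations);
        }
    }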


Best
Ingo

On Wed, Aug 18, 2021 at 12:51 PM Yuval Itzchakov <yu...@gmail.com> wrote:

> Hi,
>
> I have a use-case where I need to validate hundreds of Flink SQL queries.
> Ideally, I'd like to run these validations in parallel. But given an issue
> with Calcite's use of thread-local storage, I can only interact with the
> table runtime from a single thread.
>
> I don't really care about the overall registration process of sources,
> transformations, and sinks; I just want to make sure the syntax is correct
> from Flink's perspective.
>
> Is there any straightforward way of doing this?
>
> --
> Best Regards,
> Yuval Itzchakov.
>