Posted to user@flink.apache.org by faaron zheng <fa...@gmail.com> on 2020/03/06 09:38:40 UTC

The parallelism of sink is always 1 in sqlUpdate

Hi all,

I am trying to use Flink SQL to run a Hive task. I use tEnv.sqlUpdate to
execute my SQL, which looks like "insert overwrite ... select ...". But I
find the parallelism of the sink is always 1, which is intolerable for
large data. Why does this happen? Also, is there any guide for deciding
the taskmanager memory when I have two huge tables to hash join, for
example, each with several TB of data?

Thanks,
Faaron

Re: The parallelism of sink is always 1 in sqlUpdate

Posted by Jingsong Li <ji...@gmail.com>.
Which sink do you use?
It depends on the sink implementation; see [1] for an example.

[1]
https://github.com/apache/flink/blob/2b13a4155fd4284f6092decba867e71eea058043/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/sinks/CsvTableSink.java#L147
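
The point behind [1] is that the sink implementation itself decides the
operator's parallelism, so the upstream parallelism does not automatically
carry over. A minimal stand-alone model of that idea (plain Java, not the
actual Flink source; the "configured" value plays the role of a setting
like CsvTableSink's numFiles constructor parameter):

```java
import java.util.OptionalInt;

// Minimal model of a sink that chooses its own parallelism (not the
// actual Flink source): if the sink was configured with an explicit
// value, that wins; otherwise it keeps the input's parallelism.
public class SinkParallelismModel {
    static int sinkParallelism(int inputParallelism, OptionalInt configured) {
        return configured.orElse(inputParallelism);
    }

    public static void main(String[] args) {
        // Input runs with 500 parallel tasks, but the sink was built
        // with a fixed setting of 1 -> the sink runs single-threaded.
        System.out.println(sinkParallelism(500, OptionalInt.of(1)));
        // No explicit setting -> the sink inherits the input's 500.
        System.out.println(sinkParallelism(500, OptionalInt.empty()));
    }
}
```

So with an input parallelism of 500, a sink that hard-codes or defaults
its own value will still run with that value, which matches the behavior
reported above.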

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 6:37 PM faaron zheng <fa...@gmail.com> wrote:

> Thanks for your attention. The input parallelism of the sink is 500, and
> there is no order by or limit.
>
> Jingsong Li <ji...@gmail.com> wrote on Fri, Mar 6, 2020 at 6:15 PM:
>
>> Hi faaron,
>>
>> For sink parallelism:
>> - What is the parallelism of the input of the sink? The sink
>> parallelism should be the same.
>> - Does your SQL have an order by or limit?
>> Flink batch SQL does not support range partitioning yet, so it runs
>> order by with single parallelism.
>>
>> For the memory of the taskmanager,
>> there is a managed memory option to configure.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup.html#managed-memory
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Mar 6, 2020 at 5:38 PM faaron zheng <fa...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to use Flink SQL to run a Hive task. I use tEnv.sqlUpdate to
>>> execute my SQL, which looks like "insert overwrite ... select ...". But I
>>> find the parallelism of the sink is always 1, which is intolerable for
>>> large data. Why does this happen? Also, is there any guide for deciding
>>> the taskmanager memory when I have two huge tables to hash join, for
>>> example, each with several TB of data?
>>>
>>> Thanks,
>>> Faaron
>>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>

-- 
Best, Jingsong Lee

Re: The parallelism of sink is always 1 in sqlUpdate

Posted by faaron zheng <fa...@gmail.com>.
Thanks for your attention. The input parallelism of the sink is 500, and
there is no order by or limit.

Jingsong Li <ji...@gmail.com> wrote on Fri, Mar 6, 2020 at 6:15 PM:

> Hi faaron,
>
> For sink parallelism:
> - What is the parallelism of the input of the sink? The sink parallelism
> should be the same.
> - Does your SQL have an order by or limit?
> Flink batch SQL does not support range partitioning yet, so it runs
> order by with single parallelism.
>
> For the memory of the taskmanager,
> there is a managed memory option to configure.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup.html#managed-memory
>
> Best,
> Jingsong Lee
>
> On Fri, Mar 6, 2020 at 5:38 PM faaron zheng <fa...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to use Flink SQL to run a Hive task. I use tEnv.sqlUpdate to
>> execute my SQL, which looks like "insert overwrite ... select ...". But I
>> find the parallelism of the sink is always 1, which is intolerable for
>> large data. Why does this happen? Also, is there any guide for deciding
>> the taskmanager memory when I have two huge tables to hash join, for
>> example, each with several TB of data?
>>
>> Thanks,
>> Faaron
>>
>
>
> --
> Best, Jingsong Lee
>

Re: The parallelism of sink is always 1 in sqlUpdate

Posted by Jingsong Li <ji...@gmail.com>.
Hi faaron,

For sink parallelism:
- What is the parallelism of the input of the sink? The sink parallelism
should be the same.
- Does your SQL have an order by or limit?
Flink batch SQL does not support range partitioning yet, so it runs order
by with single parallelism.
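
One way to see where a parallelism-1 operator comes from is to print the
plan before executing. A hedged sketch against the Flink 1.10-era Table
API (`tEnv`, the table names, and the elided query are placeholders; the
physical plan section should show per-operator details):

```java
// Sketch only, assuming a Flink 1.10-era TableEnvironment `tEnv`:
// buffer the statement, then print the plan before execute().
tEnv.sqlUpdate("INSERT OVERWRITE my_sink SELECT ... FROM my_source");
System.out.println(tEnv.explain(false)); // false = no extended details
```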

For the memory of the taskmanager,
there is a managed memory option to configure.

[1]
https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup.html#managed-memory
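
For reference, the managed memory knobs from [1] are set in
flink-conf.yaml. A sketch with placeholder values, assuming the Flink
1.10 memory model (the right numbers depend on your data volume and slot
count):

```yaml
# Total memory of the TaskManager process (placeholder value).
taskmanager.memory.process.size: 32g
# Share of Flink memory reserved as managed memory, which batch
# operators such as the hash join use (the default is 0.4).
taskmanager.memory.managed.fraction: 0.6
# Alternatively, pin an absolute size instead of a fraction:
# taskmanager.memory.managed.size: 16g
```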

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 5:38 PM faaron zheng <fa...@gmail.com> wrote:

> Hi all,
>
> I am trying to use Flink SQL to run a Hive task. I use tEnv.sqlUpdate to
> execute my SQL, which looks like "insert overwrite ... select ...". But I
> find the parallelism of the sink is always 1, which is intolerable for
> large data. Why does this happen? Also, is there any guide for deciding
> the taskmanager memory when I have two huge tables to hash join, for
> example, each with several TB of data?
>
> Thanks,
> Faaron
>


-- 
Best, Jingsong Lee