You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user-zh@flink.apache.org by 111 <xi...@163.com> on 2020/03/21 03:01:47 UTC

回复： Flink SQL1.10 大表join如何优化？

Hi, wu:
好的，我这边观察下gc情况。
另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
| select 


  wte.external_user_id,

  wte.union_id,

  mr.fk_member_id as member_id

from a wte

left join b mr

 on wte.union_id = mr.relation_code

where wte.ods_date = '${today}'

limit 10;

|
我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。


目前不太清楚性能的瓶颈点和优化的方向：
1 网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin in和out缓慢变化，其他的都没有什么变化。
2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
|
2020-03-21 09:23:14,732 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,738 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,744 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,750 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,756 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,762 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,772 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,779 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:16,357 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 14 ms for 65536 segments
2020-03-21 09:23:16,453 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:16,478 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,494 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,509 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:16,522 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,539 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,554 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:16,574 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,598 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,611 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:20,157 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 213 ms for 131072 segments
2020-03-21 09:23:21,579 INFO  org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish build phase.
|




在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
Hi,

看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc 导致运行缓慢。
关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
<ji...@gmail.com>

Best,
Jark

On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:



图片好像挂了：



https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750


在2020年03月20日 17:52，111<xi...@163.com> 写道：
您好：
我有两张表数据量都是1000多万条，需要针对两张表做join。
提交任务后，发现join十分缓慢，请问有什么调优的思路？
需要调整managed memory吗？

目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
| {
"id":"container_e40_1555496777286_675191_01_000107",
"path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
"dataPort":39423,
"timeSinceLastHeartbeat":1584697728127,
"slotsNumber":4,
"freeSlots":3,
"hardware":{
"cpuCores":32,
"physicalMemory":135355260928,
"freeMemory":749731840,
"managedMemory":732828804
},
"metrics":{
"heapUsed":261623760,
"heapCommitted":781189120,
"heapMax":781189120,
"nonHeapUsed":100441328,
"nonHeapCommitted":102957056,
"nonHeapMax":1426063360,
"directCount":5662,
"directUsed":191911352,
"directMax":191911350,
"mappedCount":0,
"mappedUsed":0,
"mappedMax":0,
"memorySegmentsAvailable":5582,
"memorySegmentsTotal":5591,
"garbageCollectors":[
{
"name":"PS_Scavenge",
"count":5734,
"time":19767
},
{
"name":"PS_MarkSweep",
"count":7,
"time":893
}
]
}
} |

回复： Flink SQL1.10 大表join如何优化？

Posted by 111 <xi...@163.com>.

Hi ，
非常感谢，问题解决了。调整并行度后，任务执行就很快了
（最主要的问题是我的数据内部有倾斜问题，在一个并行度的时候没有发现….增加并行度的时候问题就暴露出来了）


| |
xinghalo
|
|
xinghalo@163.com
|
签名由网易邮箱大师定制


在2020年03月23日 11:16，Jingsong Li<ji...@gmail.com> 写道：
只有source(包括和source chain起来的算子)的并行度是推断的，后续shuffle过后的节点都是依赖这个参数。

Best,
Jingsong Lee

On Mon, Mar 23, 2020 at 11:01 AM 111 <xi...@163.com> wrote:

Hi jingsong，
非常感谢，我以为这里的并行度是自动推断的， 没注意这个参数。我试试哈


| |
xinghalo
|
|
xinghalo@163.com
|
签名由网易邮箱大师定制


在2020年03月23日 09:59，Jingsong Li<ji...@gmail.com> 写道：
在[1]里的“configuration:”配table.exec.resource.default-parallelism

[1]

https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html#environment-files

On Mon, Mar 23, 2020 at 9:48 AM 111 <xi...@163.com> wrote:

Hi jingsong:
这里的并发是系统自动生产的，前面两张表都是通过sql-gateway，在一个session中创建出来的。所以到这里并行度都是1了...


| |
xinghalo
|
|
xinghalo@163.com
|
签名由网易邮箱大师定制


在2020年03月23日 09:33，Jingsong Li<ji...@gmail.com> 写道：
Hi,

看起来你的Join SQL是有Key等值条件的，所以它可以做分布式的Join。
但是你的并发为1，一般来说我们分布式的计算都不会设成1，不然就是单机运算了。

就像Kurt所说， 修改你的并发：
table.exec.resource.default-parallelism，比如设为50或100试试。

Best,
Jingsong Lee

On Sun, Mar 22, 2020 at 10:08 AM Kurt Young <yk...@gmail.com> wrote:

你的plan里除了source之外，其他所有节点都是在单并发运行，这对两张1000多万的表join来说是不够的，你可以尝试加大并发。

Best,
Kurt


On Sat, Mar 21, 2020 at 1:30 PM 111 <xi...@163.com> wrote:

Hi:
看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
Hybrid hash
join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
目前看磁盘上的那部分join应该是整个任务的瓶颈。
具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
在2020年03月21日 11:01，111<xi...@163.com> 写道：
Hi, wu:
好的，我这边观察下gc情况。
另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
| select


wte.external_user_id,

wte.union_id,

mr.fk_member_id as member_id

from a wte

left join b mr

on wte.union_id = mr.relation_code

where wte.ods_date = '${today}'

limit 10;

|
我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。


目前不太清楚性能的瓶颈点和优化的方向：
1
网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin
in和out缓慢变化，其他的都没有什么变化。
2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
|
2020-03-21 09:23:14,732 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,738 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,744 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,750 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,756 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,762 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,772 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,779 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:16,357 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 14 ms for 65536 segments
2020-03-21 09:23:16,453 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,478 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,494 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,509 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,522 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,539 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,554 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,574 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,598 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,611 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:20,157 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 213 ms for 131072 segments
2020-03-21 09:23:21,579 INFO
org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish
build phase.
|




在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
Hi,

看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc
导致运行缓慢。
关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
<ji...@gmail.com>

Best,
Jark

On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:



图片好像挂了：







https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750


在2020年03月20日 17:52，111<xi...@163.com> 写道：
您好：
我有两张表数据量都是1000多万条，需要针对两张表做join。
提交任务后，发现join十分缓慢，请问有什么调优的思路？
需要调整managed memory吗？

目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
| {
"id":"container_e40_1555496777286_675191_01_000107",
"path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
"dataPort":39423,
"timeSinceLastHeartbeat":1584697728127,
"slotsNumber":4,
"freeSlots":3,
"hardware":{
"cpuCores":32,
"physicalMemory":135355260928,
"freeMemory":749731840,
"managedMemory":732828804
},
"metrics":{
"heapUsed":261623760,
"heapCommitted":781189120,
"heapMax":781189120,
"nonHeapUsed":100441328,
"nonHeapCommitted":102957056,
"nonHeapMax":1426063360,
"directCount":5662,
"directUsed":191911352,
"directMax":191911350,
"mappedCount":0,
"mappedUsed":0,
"mappedMax":0,
"memorySegmentsAvailable":5582,
"memorySegmentsTotal":5591,
"garbageCollectors":[
{
"name":"PS_Scavenge",
"count":5734,
"time":19767
},
{
"name":"PS_MarkSweep",
"count":7,
"time":893
}
]
}
} |









--
Best, Jingsong Lee



--
Best, Jingsong Lee



--
Best, Jingsong Lee

Re: Flink SQL1.10 大表join如何优化？

Posted by Jingsong Li <ji...@gmail.com>.

只有source(包括和source chain起来的算子)的并行度是推断的，后续shuffle过后的节点都是依赖这个参数。

Best,
Jingsong Lee

On Mon, Mar 23, 2020 at 11:01 AM 111 <xi...@163.com> wrote:

> Hi jingsong，
> 非常感谢，我以为这里的并行度是自动推断的， 没注意这个参数。我试试哈
>
>
> | |
> xinghalo
> |
> |
> xinghalo@163.com
> |
> 签名由网易邮箱大师定制
>
>
> 在2020年03月23日 09:59，Jingsong Li<ji...@gmail.com> 写道：
> 在[1]里的“configuration:”配table.exec.resource.default-parallelism
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html#environment-files
>
> On Mon, Mar 23, 2020 at 9:48 AM 111 <xi...@163.com> wrote:
>
> Hi jingsong:
> 这里的并发是系统自动生产的，前面两张表都是通过sql-gateway，在一个session中创建出来的。所以到这里并行度都是1了...
>
>
> | |
> xinghalo
> |
> |
> xinghalo@163.com
> |
> 签名由网易邮箱大师定制
>
>
> 在2020年03月23日 09:33，Jingsong Li<ji...@gmail.com> 写道：
> Hi,
>
> 看起来你的Join SQL是有Key等值条件的，所以它可以做分布式的Join。
> 但是你的并发为1，一般来说我们分布式的计算都不会设成1，不然就是单机运算了。
>
> 就像Kurt所说， 修改你的并发：
> table.exec.resource.default-parallelism，比如设为50或100试试。
>
> Best,
> Jingsong Lee
>
> On Sun, Mar 22, 2020 at 10:08 AM Kurt Young <yk...@gmail.com> wrote:
>
> 你的plan里除了source之外，其他所有节点都是在单并发运行，这对两张1000多万的表join来说是不够的，你可以尝试加大并发。
>
> Best,
> Kurt
>
>
> On Sat, Mar 21, 2020 at 1:30 PM 111 <xi...@163.com> wrote:
>
> Hi:
> 看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
> Hybrid hash
> join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
> 目前看磁盘上的那部分join应该是整个任务的瓶颈。
> 具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
> 在2020年03月21日 11:01，111<xi...@163.com> 写道：
> Hi, wu:
> 好的，我这边观察下gc情况。
> 另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
> | select
>
>
> wte.external_user_id,
>
> wte.union_id,
>
> mr.fk_member_id as member_id
>
> from a wte
>
> left join b mr
>
> on wte.union_id = mr.relation_code
>
> where wte.ods_date = '${today}'
>
> limit 10;
>
> |
> 我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。
>
>
> 目前不太清楚性能的瓶颈点和优化的方向：
> 1
> 网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin
> in和out缓慢变化，其他的都没有什么变化。
> 2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
> 3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
> 4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
> |
> 2020-03-21 09:23:14,732 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,738 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,744 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,750 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,756 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,762 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,772 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,779 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:16,357 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 14 ms for 65536 segments
> 2020-03-21 09:23:16,453 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,478 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,494 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,509 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,522 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,539 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,554 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,574 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,598 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,611 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:20,157 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 213 ms for 131072 segments
> 2020-03-21 09:23:21,579 INFO
> org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish
> build phase.
> |
>
>
>
>
> 在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
> Hi,
>
> 看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc
> 导致运行缓慢。
> 关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
> <ji...@gmail.com>
>
> Best,
> Jark
>
> On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:
>
>
>
> 图片好像挂了：
>
>
>
>
>
>
>
> https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750
>
>
> 在2020年03月20日 17:52，111<xi...@163.com> 写道：
> 您好：
> 我有两张表数据量都是1000多万条，需要针对两张表做join。
> 提交任务后，发现join十分缓慢，请问有什么调优的思路？
> 需要调整managed memory吗？
>
> 目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
> | {
> "id":"container_e40_1555496777286_675191_01_000107",
> "path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
> "dataPort":39423,
> "timeSinceLastHeartbeat":1584697728127,
> "slotsNumber":4,
> "freeSlots":3,
> "hardware":{
> "cpuCores":32,
> "physicalMemory":135355260928,
> "freeMemory":749731840,
> "managedMemory":732828804
> },
> "metrics":{
> "heapUsed":261623760,
> "heapCommitted":781189120,
> "heapMax":781189120,
> "nonHeapUsed":100441328,
> "nonHeapCommitted":102957056,
> "nonHeapMax":1426063360,
> "directCount":5662,
> "directUsed":191911352,
> "directMax":191911350,
> "mappedCount":0,
> "mappedUsed":0,
> "mappedMax":0,
> "memorySegmentsAvailable":5582,
> "memorySegmentsTotal":5591,
> "garbageCollectors":[
> {
> "name":"PS_Scavenge",
> "count":5734,
> "time":19767
> },
> {
> "name":"PS_MarkSweep",
> "count":7,
> "time":893
> }
> ]
> }
> } |
>
>
>
>
>
>
>
>
>
> --
> Best, Jingsong Lee
>
>
>
> --
> Best, Jingsong Lee
>


-- 
Best, Jingsong Lee

回复： Flink SQL1.10 大表join如何优化？

Posted by 111 <xi...@163.com>.

Hi jingsong，
非常感谢，我以为这里的并行度是自动推断的， 没注意这个参数。我试试哈


| |
xinghalo
|
|
xinghalo@163.com
|
签名由网易邮箱大师定制


在2020年03月23日 09:59，Jingsong Li<ji...@gmail.com> 写道：
在[1]里的“configuration:”配table.exec.resource.default-parallelism

[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html#environment-files

On Mon, Mar 23, 2020 at 9:48 AM 111 <xi...@163.com> wrote:

Hi jingsong:
这里的并发是系统自动生产的，前面两张表都是通过sql-gateway，在一个session中创建出来的。所以到这里并行度都是1了...


| |
xinghalo
|
|
xinghalo@163.com
|
签名由网易邮箱大师定制


在2020年03月23日 09:33，Jingsong Li<ji...@gmail.com> 写道：
Hi,

看起来你的Join SQL是有Key等值条件的，所以它可以做分布式的Join。
但是你的并发为1，一般来说我们分布式的计算都不会设成1，不然就是单机运算了。

就像Kurt所说， 修改你的并发：
table.exec.resource.default-parallelism，比如设为50或100试试。

Best,
Jingsong Lee

On Sun, Mar 22, 2020 at 10:08 AM Kurt Young <yk...@gmail.com> wrote:

你的plan里除了source之外，其他所有节点都是在单并发运行，这对两张1000多万的表join来说是不够的，你可以尝试加大并发。

Best,
Kurt


On Sat, Mar 21, 2020 at 1:30 PM 111 <xi...@163.com> wrote:

Hi:
看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
Hybrid hash
join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
目前看磁盘上的那部分join应该是整个任务的瓶颈。
具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
在2020年03月21日 11:01，111<xi...@163.com> 写道：
Hi, wu:
好的，我这边观察下gc情况。
另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
| select


wte.external_user_id,

wte.union_id,

mr.fk_member_id as member_id

from a wte

left join b mr

on wte.union_id = mr.relation_code

where wte.ods_date = '${today}'

limit 10;

|
我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。


目前不太清楚性能的瓶颈点和优化的方向：
1
网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin
in和out缓慢变化，其他的都没有什么变化。
2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
|
2020-03-21 09:23:14,732 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,738 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,744 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,750 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,756 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,762 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,772 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,779 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:16,357 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 14 ms for 65536 segments
2020-03-21 09:23:16,453 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,478 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,494 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,509 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,522 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,539 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,554 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,574 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,598 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,611 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:20,157 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 213 ms for 131072 segments
2020-03-21 09:23:21,579 INFO
org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish
build phase.
|




在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
Hi,

看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc
导致运行缓慢。
关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
<ji...@gmail.com>

Best,
Jark

On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:



图片好像挂了：






https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750


在2020年03月20日 17:52，111<xi...@163.com> 写道：
您好：
我有两张表数据量都是1000多万条，需要针对两张表做join。
提交任务后，发现join十分缓慢，请问有什么调优的思路？
需要调整managed memory吗？

目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
| {
"id":"container_e40_1555496777286_675191_01_000107",
"path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
"dataPort":39423,
"timeSinceLastHeartbeat":1584697728127,
"slotsNumber":4,
"freeSlots":3,
"hardware":{
"cpuCores":32,
"physicalMemory":135355260928,
"freeMemory":749731840,
"managedMemory":732828804
},
"metrics":{
"heapUsed":261623760,
"heapCommitted":781189120,
"heapMax":781189120,
"nonHeapUsed":100441328,
"nonHeapCommitted":102957056,
"nonHeapMax":1426063360,
"directCount":5662,
"directUsed":191911352,
"directMax":191911350,
"mappedCount":0,
"mappedUsed":0,
"mappedMax":0,
"memorySegmentsAvailable":5582,
"memorySegmentsTotal":5591,
"garbageCollectors":[
{
"name":"PS_Scavenge",
"count":5734,
"time":19767
},
{
"name":"PS_MarkSweep",
"count":7,
"time":893
}
]
}
} |









--
Best, Jingsong Lee



--
Best, Jingsong Lee

Re: Flink SQL1.10 大表join如何优化？

Posted by Jingsong Li <ji...@gmail.com>.

在[1]里的“configuration:”配table.exec.resource.default-parallelism

[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html#environment-files

On Mon, Mar 23, 2020 at 9:48 AM 111 <xi...@163.com> wrote:

> Hi jingsong:
> 这里的并发是系统自动生产的，前面两张表都是通过sql-gateway，在一个session中创建出来的。所以到这里并行度都是1了...
>
>
> | |
> xinghalo
> |
> |
> xinghalo@163.com
> |
> 签名由网易邮箱大师定制
>
>
> 在2020年03月23日 09:33，Jingsong Li<ji...@gmail.com> 写道：
> Hi,
>
> 看起来你的Join SQL是有Key等值条件的，所以它可以做分布式的Join。
> 但是你的并发为1，一般来说我们分布式的计算都不会设成1，不然就是单机运算了。
>
> 就像Kurt所说， 修改你的并发：
> table.exec.resource.default-parallelism，比如设为50或100试试。
>
> Best,
> Jingsong Lee
>
> On Sun, Mar 22, 2020 at 10:08 AM Kurt Young <yk...@gmail.com> wrote:
>
> 你的plan里除了source之外，其他所有节点都是在单并发运行，这对两张1000多万的表join来说是不够的，你可以尝试加大并发。
>
> Best,
> Kurt
>
>
> On Sat, Mar 21, 2020 at 1:30 PM 111 <xi...@163.com> wrote:
>
> Hi:
> 看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
> Hybrid hash
> join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
> 目前看磁盘上的那部分join应该是整个任务的瓶颈。
> 具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
> 在2020年03月21日 11:01，111<xi...@163.com> 写道：
> Hi, wu:
> 好的，我这边观察下gc情况。
> 另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
> | select
>
>
> wte.external_user_id,
>
> wte.union_id,
>
> mr.fk_member_id as member_id
>
> from a wte
>
> left join b mr
>
> on wte.union_id = mr.relation_code
>
> where wte.ods_date = '${today}'
>
> limit 10;
>
> |
> 我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。
>
>
> 目前不太清楚性能的瓶颈点和优化的方向：
> 1
> 网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin
> in和out缓慢变化，其他的都没有什么变化。
> 2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
> 3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
> 4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
> |
> 2020-03-21 09:23:14,732 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,738 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,744 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,750 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,756 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,762 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,772 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,779 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:16,357 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 14 ms for 65536 segments
> 2020-03-21 09:23:16,453 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,478 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,494 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,509 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,522 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,539 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,554 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,574 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,598 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,611 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:20,157 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> take 213 ms for 131072 segments
> 2020-03-21 09:23:21,579 INFO
> org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish
> build phase.
> |
>
>
>
>
> 在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
> Hi,
>
> 看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc
> 导致运行缓慢。
> 关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
> <ji...@gmail.com>
>
> Best,
> Jark
>
> On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:
>
>
>
> 图片好像挂了：
>
>
>
>
>
>
> https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750
>
>
> 在2020年03月20日 17:52，111<xi...@163.com> 写道：
> 您好：
> 我有两张表数据量都是1000多万条，需要针对两张表做join。
> 提交任务后，发现join十分缓慢，请问有什么调优的思路？
> 需要调整managed memory吗？
>
> 目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
> | {
> "id":"container_e40_1555496777286_675191_01_000107",
> "path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
> "dataPort":39423,
> "timeSinceLastHeartbeat":1584697728127,
> "slotsNumber":4,
> "freeSlots":3,
> "hardware":{
> "cpuCores":32,
> "physicalMemory":135355260928,
> "freeMemory":749731840,
> "managedMemory":732828804
> },
> "metrics":{
> "heapUsed":261623760,
> "heapCommitted":781189120,
> "heapMax":781189120,
> "nonHeapUsed":100441328,
> "nonHeapCommitted":102957056,
> "nonHeapMax":1426063360,
> "directCount":5662,
> "directUsed":191911352,
> "directMax":191911350,
> "mappedCount":0,
> "mappedUsed":0,
> "mappedMax":0,
> "memorySegmentsAvailable":5582,
> "memorySegmentsTotal":5591,
> "garbageCollectors":[
> {
> "name":"PS_Scavenge",
> "count":5734,
> "time":19767
> },
> {
> "name":"PS_MarkSweep",
> "count":7,
> "time":893
> }
> ]
> }
> } |
>
>
>
>
>
>
>
>
>
> --
> Best, Jingsong Lee
>


-- 
Best, Jingsong Lee

回复： Flink SQL1.10 大表join如何优化？

Posted by 111 <xi...@163.com>.

Hi jingsong:
这里的并发是系统自动生产的，前面两张表都是通过sql-gateway，在一个session中创建出来的。所以到这里并行度都是1了...


| |
xinghalo
|
|
xinghalo@163.com
|
签名由网易邮箱大师定制


在2020年03月23日 09:33，Jingsong Li<ji...@gmail.com> 写道：
Hi,

看起来你的Join SQL是有Key等值条件的，所以它可以做分布式的Join。
但是你的并发为1，一般来说我们分布式的计算都不会设成1，不然就是单机运算了。

就像Kurt所说， 修改你的并发：
table.exec.resource.default-parallelism，比如设为50或100试试。

Best,
Jingsong Lee

On Sun, Mar 22, 2020 at 10:08 AM Kurt Young <yk...@gmail.com> wrote:

你的plan里除了source之外，其他所有节点都是在单并发运行，这对两张1000多万的表join来说是不够的，你可以尝试加大并发。

Best,
Kurt


On Sat, Mar 21, 2020 at 1:30 PM 111 <xi...@163.com> wrote:

Hi:
看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
Hybrid hash
join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
目前看磁盘上的那部分join应该是整个任务的瓶颈。
具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
在2020年03月21日 11:01，111<xi...@163.com> 写道：
Hi, wu:
好的，我这边观察下gc情况。
另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
| select


wte.external_user_id,

wte.union_id,

mr.fk_member_id as member_id

from a wte

left join b mr

on wte.union_id = mr.relation_code

where wte.ods_date = '${today}'

limit 10;

|
我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。


目前不太清楚性能的瓶颈点和优化的方向：
1
网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin
in和out缓慢变化，其他的都没有什么变化。
2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
|
2020-03-21 09:23:14,732 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,738 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,744 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,750 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,756 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,762 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,772 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:14,779 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 4 ms for 32768 segments
2020-03-21 09:23:16,357 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 14 ms for 65536 segments
2020-03-21 09:23:16,453 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,478 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,494 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,509 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,522 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,539 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,554 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:16,574 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,598 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 9 ms for 65536 segments
2020-03-21 09:23:16,611 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 10 ms for 65536 segments
2020-03-21 09:23:20,157 INFO
org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
rehash
take 213 ms for 131072 segments
2020-03-21 09:23:21,579 INFO
org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish
build phase.
|




在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
Hi,

看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc
导致运行缓慢。
关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
<ji...@gmail.com>

Best,
Jark

On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:



图片好像挂了：





https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750


在2020年03月20日 17:52，111<xi...@163.com> 写道：
您好：
我有两张表数据量都是1000多万条，需要针对两张表做join。
提交任务后，发现join十分缓慢，请问有什么调优的思路？
需要调整managed memory吗？

目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
| {
"id":"container_e40_1555496777286_675191_01_000107",
"path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
"dataPort":39423,
"timeSinceLastHeartbeat":1584697728127,
"slotsNumber":4,
"freeSlots":3,
"hardware":{
"cpuCores":32,
"physicalMemory":135355260928,
"freeMemory":749731840,
"managedMemory":732828804
},
"metrics":{
"heapUsed":261623760,
"heapCommitted":781189120,
"heapMax":781189120,
"nonHeapUsed":100441328,
"nonHeapCommitted":102957056,
"nonHeapMax":1426063360,
"directCount":5662,
"directUsed":191911352,
"directMax":191911350,
"mappedCount":0,
"mappedUsed":0,
"mappedMax":0,
"memorySegmentsAvailable":5582,
"memorySegmentsTotal":5591,
"garbageCollectors":[
{
"name":"PS_Scavenge",
"count":5734,
"time":19767
},
{
"name":"PS_MarkSweep",
"count":7,
"time":893
}
]
}
} |









--
Best, Jingsong Lee

Re: Flink SQL1.10 大表join如何优化？

Posted by Jingsong Li <ji...@gmail.com>.

Hi,

看起来你的Join SQL是有Key等值条件的，所以它可以做分布式的Join。
但是你的并发为1，一般来说我们分布式的计算都不会设成1，不然就是单机运算了。

就像Kurt所说， 修改你的并发：
table.exec.resource.default-parallelism，比如设为50或100试试。

Best,
Jingsong Lee

On Sun, Mar 22, 2020 at 10:08 AM Kurt Young <yk...@gmail.com> wrote:

> 你的plan里除了source之外，其他所有节点都是在单并发运行，这对两张1000多万的表join来说是不够的，你可以尝试加大并发。
>
> Best,
> Kurt
>
>
> On Sat, Mar 21, 2020 at 1:30 PM 111 <xi...@163.com> wrote:
>
> > Hi:
> > 看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
> > Hybrid hash
> > join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
> > 目前看磁盘上的那部分join应该是整个任务的瓶颈。
> > 具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
> > 在2020年03月21日 11:01，111<xi...@163.com> 写道：
> > Hi, wu:
> > 好的，我这边观察下gc情况。
> > 另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
> > | select
> >
> >
> >   wte.external_user_id,
> >
> >   wte.union_id,
> >
> >   mr.fk_member_id as member_id
> >
> > from a wte
> >
> > left join b mr
> >
> >  on wte.union_id = mr.relation_code
> >
> > where wte.ods_date = '${today}'
> >
> > limit 10;
> >
> > |
> > 我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。
> >
> >
> > 目前不太清楚性能的瓶颈点和优化的方向：
> > 1
> > 网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin
> > in和out缓慢变化，其他的都没有什么变化。
> > 2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
> > 3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
> > 4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
> > |
> > 2020-03-21 09:23:14,732 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:14,738 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:14,744 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:14,750 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:14,756 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:14,762 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:14,772 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:14,779 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 4 ms for 32768 segments
> > 2020-03-21 09:23:16,357 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 14 ms for 65536 segments
> > 2020-03-21 09:23:16,453 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 10 ms for 65536 segments
> > 2020-03-21 09:23:16,478 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 9 ms for 65536 segments
> > 2020-03-21 09:23:16,494 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 9 ms for 65536 segments
> > 2020-03-21 09:23:16,509 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 10 ms for 65536 segments
> > 2020-03-21 09:23:16,522 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 9 ms for 65536 segments
> > 2020-03-21 09:23:16,539 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 9 ms for 65536 segments
> > 2020-03-21 09:23:16,554 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 10 ms for 65536 segments
> > 2020-03-21 09:23:16,574 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 9 ms for 65536 segments
> > 2020-03-21 09:23:16,598 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 9 ms for 65536 segments
> > 2020-03-21 09:23:16,611 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 10 ms for 65536 segments
> > 2020-03-21 09:23:20,157 INFO
> > org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The
> rehash
> > take 213 ms for 131072 segments
> > 2020-03-21 09:23:21,579 INFO
> > org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish
> > build phase.
> > |
> >
> >
> >
> >
> > 在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
> > Hi,
> >
> > 看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc
> 导致运行缓慢。
> > 关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
> > <ji...@gmail.com>
> >
> > Best,
> > Jark
> >
> > On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:
> >
> >
> >
> > 图片好像挂了：
> >
> >
> >
> >
> >
> https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750
> >
> >
> > 在2020年03月20日 17:52，111<xi...@163.com> 写道：
> > 您好：
> > 我有两张表数据量都是1000多万条，需要针对两张表做join。
> > 提交任务后，发现join十分缓慢，请问有什么调优的思路？
> > 需要调整managed memory吗？
> >
> > 目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
> > | {
> > "id":"container_e40_1555496777286_675191_01_000107",
> > "path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
> > "dataPort":39423,
> > "timeSinceLastHeartbeat":1584697728127,
> > "slotsNumber":4,
> > "freeSlots":3,
> > "hardware":{
> > "cpuCores":32,
> > "physicalMemory":135355260928,
> > "freeMemory":749731840,
> > "managedMemory":732828804
> > },
> > "metrics":{
> > "heapUsed":261623760,
> > "heapCommitted":781189120,
> > "heapMax":781189120,
> > "nonHeapUsed":100441328,
> > "nonHeapCommitted":102957056,
> > "nonHeapMax":1426063360,
> > "directCount":5662,
> > "directUsed":191911352,
> > "directMax":191911350,
> > "mappedCount":0,
> > "mappedUsed":0,
> > "mappedMax":0,
> > "memorySegmentsAvailable":5582,
> > "memorySegmentsTotal":5591,
> > "garbageCollectors":[
> > {
> > "name":"PS_Scavenge",
> > "count":5734,
> > "time":19767
> > },
> > {
> > "name":"PS_MarkSweep",
> > "count":7,
> > "time":893
> > }
> > ]
> > }
> > } |
> >
> >
> >
> >
> >
> >
>


-- 
Best, Jingsong Lee

Re: Flink SQL1.10 大表join如何优化？

Posted by Kurt Young <yk...@gmail.com>.

你的plan里除了source之外，其他所有节点都是在单并发运行，这对两张1000多万的表join来说是不够的，你可以尝试加大并发。

Best,
Kurt


On Sat, Mar 21, 2020 at 1:30 PM 111 <xi...@163.com> wrote:

> Hi:
> 看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
> Hybrid hash
> join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
> 目前看磁盘上的那部分join应该是整个任务的瓶颈。
> 具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
> 在2020年03月21日 11:01，111<xi...@163.com> 写道：
> Hi, wu:
> 好的，我这边观察下gc情况。
> 另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
> | select
>
>
>   wte.external_user_id,
>
>   wte.union_id,
>
>   mr.fk_member_id as member_id
>
> from a wte
>
> left join b mr
>
>  on wte.union_id = mr.relation_code
>
> where wte.ods_date = '${today}'
>
> limit 10;
>
> |
> 我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。
>
>
> 目前不太清楚性能的瓶颈点和优化的方向：
> 1
> 网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin
> in和out缓慢变化，其他的都没有什么变化。
> 2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
> 3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
> 4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
> |
> 2020-03-21 09:23:14,732 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,738 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,744 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,750 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,756 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,762 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,772 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:14,779 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 4 ms for 32768 segments
> 2020-03-21 09:23:16,357 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 14 ms for 65536 segments
> 2020-03-21 09:23:16,453 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,478 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,494 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,509 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,522 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,539 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,554 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:16,574 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,598 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 9 ms for 65536 segments
> 2020-03-21 09:23:16,611 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 10 ms for 65536 segments
> 2020-03-21 09:23:20,157 INFO
> org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash
> take 213 ms for 131072 segments
> 2020-03-21 09:23:21,579 INFO
> org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish
> build phase.
> |
>
>
>
>
> 在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
> Hi,
>
> 看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc 导致运行缓慢。
> 关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
> <ji...@gmail.com>
>
> Best,
> Jark
>
> On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:
>
>
>
> 图片好像挂了：
>
>
>
>
> https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750
>
>
> 在2020年03月20日 17:52，111<xi...@163.com> 写道：
> 您好：
> 我有两张表数据量都是1000多万条，需要针对两张表做join。
> 提交任务后，发现join十分缓慢，请问有什么调优的思路？
> 需要调整managed memory吗？
>
> 目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
> | {
> "id":"container_e40_1555496777286_675191_01_000107",
> "path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
> "dataPort":39423,
> "timeSinceLastHeartbeat":1584697728127,
> "slotsNumber":4,
> "freeSlots":3,
> "hardware":{
> "cpuCores":32,
> "physicalMemory":135355260928,
> "freeMemory":749731840,
> "managedMemory":732828804
> },
> "metrics":{
> "heapUsed":261623760,
> "heapCommitted":781189120,
> "heapMax":781189120,
> "nonHeapUsed":100441328,
> "nonHeapCommitted":102957056,
> "nonHeapMax":1426063360,
> "directCount":5662,
> "directUsed":191911352,
> "directMax":191911350,
> "mappedCount":0,
> "mappedUsed":0,
> "mappedMax":0,
> "memorySegmentsAvailable":5582,
> "memorySegmentsTotal":5591,
> "garbageCollectors":[
> {
> "name":"PS_Scavenge",
> "count":5734,
> "time":19767
> },
> {
> "name":"PS_MarkSweep",
> "count":7,
> "time":893
> }
> ]
> }
> } |
>
>
>
>
>
>

回复： Flink SQL1.10 大表join如何优化？

Posted by 111 <xi...@163.com>.

Hi:
看了下源代码，了解了下Hybrid hash join。大致了解了瓶颈点：
Hybrid hash join，会把build表（也就是我的右表）通过hash映射成map，并按照某种规则进行分区存储（有的在内存，超过的放入磁盘）。
目前看磁盘上的那部分join应该是整个任务的瓶颈。
具体调优方法，还在探索中...也许有什么配置可以控制build表内存存储的大小.
在2020年03月21日 11:01，111<xi...@163.com> 写道：
Hi, wu:
好的，我这边观察下gc情况。
另外，我的sql里面有关联条件的，只是第一个表1400多万条，第二张表1000多万条。
| select 


  wte.external_user_id,

  wte.union_id,

  mr.fk_member_id as member_id

from a wte

left join b mr

 on wte.union_id = mr.relation_code

where wte.ods_date = '${today}'

limit 10;

|
我在ui里面可以看到任务也在正常运行，只是每秒输入700条左右，每秒输出1700，所以对比总量来说十分缓慢。


目前不太清楚性能的瓶颈点和优化的方向：
1 网络传输太慢，导致两表不能及时join？这里不知道如何排查，Metrics里面有个netty的相关指标，看不出什么；其他的指标除了hashjoin in和out缓慢变化，其他的都没有什么变化。
2 并行度过低，导致单点slot需要执行两个千万级表的关联？可否动态修改或者配置probe表的并行度？
3 JVM内存问题？详情见附件，观察内存还是很充足的，貌似垃圾回收有点频繁，是否有必要修改jvm配置？
4 taskmanager的日志不太理解….到build phase就停住了，是日志卡主了 还是 此时正在进行build的网络传输？
|
2020-03-21 09:23:14,732 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,738 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,744 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,750 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,756 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,762 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,772 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:14,779 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 4 ms for 32768 segments
2020-03-21 09:23:16,357 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 14 ms for 65536 segments
2020-03-21 09:23:16,453 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:16,478 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,494 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,509 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:16,522 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,539 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,554 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:16,574 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,598 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 9 ms for 65536 segments
2020-03-21 09:23:16,611 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 10 ms for 65536 segments
2020-03-21 09:23:20,157 INFO  org.apache.flink.table.runtime.hashtable.BinaryHashBucketArea  - The rehash take 213 ms for 131072 segments
2020-03-21 09:23:21,579 INFO  org.apache.flink.table.runtime.operators.join.HashJoinOperator  - Finish build phase.
|




在2020年03月21日 10:31，Jark Wu<im...@gmail.com> 写道：
Hi,

看起来你的 join 没有等值关联条件，导致只能单并发运行。你可以观察下这个 join 节点的 gc 情况，看看是不是 full gc 导致运行缓慢。
关于 batch join，Jingsong 比我更熟悉一些调优手段，也许他能提供一些思路，cc @Jingsong Li
<ji...@gmail.com>

Best,
Jark

On Fri, 20 Mar 2020 at 17:56, 111 <xi...@163.com> wrote:



图片好像挂了：



https://picabstract-preview-ftn.weiyun.com/ftn_pic_abs_v3/93a8ac1299f8edd31aa93d69bd591dcc5b768e2c6f2d7a32ff3ac244040b1cac3e8afffd0daf92c4703c276fa1202361?pictype=scale&from=30113&version=3.3.3.3&uin=23603357&fname=F74D73D5-810B-4AE7-888C-E65BF787E490.png&size=750


在2020年03月20日 17:52，111<xi...@163.com> 写道：
您好：
我有两张表数据量都是1000多万条，需要针对两张表做join。
提交任务后，发现join十分缓慢，请问有什么调优的思路？
需要调整managed memory吗？

目前每个TaskManager申请的总内存是2g，每个taskManager上面有4个slot。taskmanager的metrics如下：
| {
"id":"container_e40_1555496777286_675191_01_000107",
"path":"akka.tcp://flink@hnode9:33156/user/taskmanager_0",
"dataPort":39423,
"timeSinceLastHeartbeat":1584697728127,
"slotsNumber":4,
"freeSlots":3,
"hardware":{
"cpuCores":32,
"physicalMemory":135355260928,
"freeMemory":749731840,
"managedMemory":732828804
},
"metrics":{
"heapUsed":261623760,
"heapCommitted":781189120,
"heapMax":781189120,
"nonHeapUsed":100441328,
"nonHeapCommitted":102957056,
"nonHeapMax":1426063360,
"directCount":5662,
"directUsed":191911352,
"directMax":191911350,
"mappedCount":0,
"mappedUsed":0,
"mappedMax":0,
"memorySegmentsAvailable":5582,
"memorySegmentsTotal":5591,
"garbageCollectors":[
{
"name":"PS_Scavenge",
"count":5734,
"time":19767
},
{
"name":"PS_MarkSweep",
"count":7,
"time":893
}
]
}
} |