
Posted to user-zh@flink.apache.org by Husky Zeng <56...@qq.com> on 2020/10/30 02:22:54 UTC

Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

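Hi all,

While using Flink's shuffle, I found that the I/O speed differs dramatically depending on where in the operator chain the shuffle is placed.

Take my example:

source-->cal--->sort--->SinkConversionToRow--->sink

It reads data from Hive, computes, sorts, converts to external-type rows, and writes to Hive.

When I put the shuffle between cal and sort,

source-->cal--(rebalance)-->sort--->SinkConversionToRow--->sink

the shuffle's data transfer I/O speed is 3G/s.

When I put the shuffle between SinkConversionToRow and sink,

source-->cal--->sort--->SinkConversionToRow--(rebalance)-->sink

the shuffle's data transfer I/O speed is 0.1G/s.

That is a full 30x difference!

My guess is that SinkConversionToRow converts the data to the external format, and that the external format transfers slowly while the internal format transfers quickly.

But why is the gap so large? How does the internal format manage to transfer so fast, and why is the external format so slow?

Code location of SinkConversionToRow:
org.apache.flink.table.planner.plan.nodes.physical.batch.BatchExecSink#translateToTransformation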



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by 赵一旦 <hi...@gmail.com>.

All I can see is your bytes sent = xxx MB. At that order of magnitude, a parallelism of 1 feels like it would be enough...

On Fri, Oct 30, 2020 at 4:16 PM, Husky Zeng <56...@qq.com> wrote:

> When the shuffle is placed earlier:
>
> <http://apache-flink.147419.n8.nabble.com/file/t930/shuffle%E5%89%8D.png>
>
> When the shuffle is placed later:
>
> <http://apache-flink.147419.n8.nabble.com/file/t930/shuffle%E5%90%8E2.png>
>
> <http://apache-flink.147419.n8.nabble.com/file/t930/shuaffle%E5%90%8E1.png>

Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by Husky Zeng <56...@qq.com>.

When the shuffle is placed earlier:

<http://apache-flink.147419.n8.nabble.com/file/t930/shuffle%E5%89%8D.png>

When the shuffle is placed later:

<http://apache-flink.147419.n8.nabble.com/file/t930/shuffle%E5%90%8E2.png>

<http://apache-flink.147419.n8.nabble.com/file/t930/shuaffle%E5%90%8E1.png>




Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by Husky Zeng <56...@qq.com>.

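Are you reading and writing HDFS with the Filesystem connector?

>>>>>>> Yes.

Since the parallelism of the source and sink is already fixed, it shouldn't matter much for the processing speed at either end which intermediate stage does the shuffle.

>>>>>>> In practice the processing speed at both ends is indeed barely affected, but the data transfer speed of the shuffle itself differs greatly depending on where the shuffle is placed.

When I put the shuffle between cal and sort,

source (parallelism 640)-->cal (parallelism 640)--(rebalance)-->sort (parallelism 64)--->SinkConversionToRow (parallelism 64)--->sink (parallelism 64)

the shuffle's data transfer I/O speed is 3G/s, and transferring the 370G of files takes 2 minutes.

When I put the shuffle between SinkConversionToRow and sink,

source (parallelism 640)-->cal (parallelism 640)--->sort (parallelism 640)--->SinkConversionToRow (parallelism 640)--(rebalance)-->sink (parallelism 64)

the shuffle's data transfer I/O speed is 0.1G/s, and transferring the 250G of files takes 40 minutes.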
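
As a quick sanity check, the sizes and durations reported above can be turned back into average rates. This is only a minimal sketch: the function name is illustrative, and it assumes each transfer ran at a steady rate for the whole duration.

```python
def throughput_g_per_s(size_g: float, minutes: float) -> float:
    """Average transfer rate in G/s for size_g transferred over the given minutes."""
    return size_g / (minutes * 60)

# Figures taken from the two layouts described in the message.
early = throughput_g_per_s(370, 2)   # shuffle between cal and sort
late = throughput_g_per_s(250, 40)   # shuffle between SinkConversionToRow and sink

print(f"early: {early:.2f} G/s, late: {late:.2f} G/s, ratio: {early / late:.1f}x")
```

The two rates come out at about 3.08 G/s and 0.10 G/s, a ratio of about 29.6, which matches the quoted 3G/s, 0.1G/s and 30x.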







Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by admin <17...@163.com>.

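Are you reading and writing HDFS with the Filesystem connector? The time spent on serialization and deserialization also differs, and the source and sink parallelism differ a lot as well. Since the sink parallelism was reduced to limit the number of small files, the write speed is bound to be limited too.
Since the parallelism of the source and sink is already fixed, it shouldn't matter much for the processing speed at either end which intermediate stage does the shuffle.
That's just my humble opinion; corrections are welcome.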

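> On Oct 30, 2020, at 2:30 PM, Husky Zeng <56...@qq.com> wrote:
>
> Our scenario is this:
>
> Read data from Hive, compute, and write the result back to Hive.
>
> To speed up reading from Hive, we use 650 parallel subtasks.
>
> When writing to Hive, we need to limit the number of parallel subtasks in order to reduce small files.
>
> So we need to pick a stage at which to shuffle.
>
> Hence the question above.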

Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by Husky Zeng <56...@qq.com>.

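Our scenario is this:

Read data from Hive, compute, and write the result back to Hive.

To speed up reading from Hive, we use 650 parallel subtasks.

When writing to Hive, we need to limit the number of parallel subtasks in order to reduce small files.

So we need to pick a stage at which to shuffle.

Hence the question above.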





Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by Husky Zeng <56...@qq.com>.

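Sorry, I mixed up the concepts of operator chain and streaming dataflow. What I meant to ask about is how the choice of shuffle position within the overall job pipeline affects performance.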




Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by admin <17...@163.com>.

Hi,
Isn't the whole point of operator chaining to avoid shuffles and reduce network transfer? Why are you shuffling manually?

> On Oct 30, 2020, at 10:24 AM, Husky Zeng <56...@qq.com> wrote:
>
> One more detail:
>
> When I put the shuffle between cal and sort,
>
> source-->cal--(rebalance)-->sort--->SinkConversionToRow--->sink
>
> the shuffle's data transfer I/O speed is 3G/s, and the size of the files to transfer is 370G.
>
> When I put the shuffle between SinkConversionToRow and sink,
>
> source-->cal--->sort--->SinkConversionToRow--(rebalance)-->sink
>
> the shuffle's data transfer I/O speed is 0.1G/s, and the size of the files to transfer is 250G.
>
> The file sizes differ as well.

Re: Discussion: how data type affects shuffle data-transfer I/O speed (a gap of tens of times)

Posted by Husky Zeng <56...@qq.com>.

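One more detail:

When I put the shuffle between cal and sort,

source-->cal--(rebalance)-->sort--->SinkConversionToRow--->sink

the shuffle's data transfer I/O speed is 3G/s, and the size of the files to transfer is 370G.

When I put the shuffle between SinkConversionToRow and sink,

source-->cal--->sort--->SinkConversionToRow--(rebalance)-->sink

the shuffle's data transfer I/O speed is 0.1G/s, and the size of the files to transfer is 250G.

The file sizes differ as well.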




Re: Does data type affect shuffle data-transfer I/O speed?

Posted by Husky Zeng <56...@qq.com>.

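Mystery solved: it was caused by backpressure when writing to Hive.

Our SQL scenario uses dynamic partitioning.

The Hive files are written with GroupedPartitionWriter, so the cost of "sorting by Hive partition before writing" versus "writing records in arbitrary order" differs enormously. Each time GroupedPartitionWriter finishes a partition, it has to close the OutputFormat and open the OutputFormat for the new partition. With unordered input, the OutputFormat it holds is opened and closed constantly, because almost every record switches to a different partition.

So in the second case, writing to Hive produces backpressure. That is what slowed the I/O down, not the data format or any other factor.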
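
The effect described above can be illustrated with a toy model. This is a rough sketch, not Flink's code: `GroupedWriterSketch` is a hypothetical stand-in that models only the behaviour stated in this message (one output held at a time, closed and reopened whenever the partition changes), and the record count and number of partitions are made-up values.

```python
import random

class GroupedWriterSketch:
    """Toy model of a writer that holds one open output at a time."""

    def __init__(self):
        self.current_partition = None
        self.opens = 0

    def write(self, partition, record):
        # A partition change means: close the current output, open a new one.
        if partition != self.current_partition:
            self.current_partition = partition
            self.opens += 1
        # The record itself would be written to the open output here.

def count_opens(partitions):
    """Feed one record per partition key and return how many outputs were opened."""
    writer = GroupedWriterSketch()
    for i, partition in enumerate(partitions):
        writer.write(partition, i)
    return writer.opens

rng = random.Random(0)
records = [rng.randrange(100) for _ in range(100_000)]  # partition key per record

sorted_opens = count_opens(sorted(records))   # input grouped by partition
unordered_opens = count_opens(records)        # input in arbitrary order
print(sorted_opens, unordered_opens)
```

With grouped input the writer opens one output per distinct partition. With unordered input it reopens an output for nearly every record, which is the kind of overhead that would make the sink slow enough to backpressure the upstream shuffle.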

Sorry, the subject line was misleading.





