You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user-zh@flink.apache.org by Kurt Young <yk...@gmail.com> on 2019/05/28 06:10:07 UTC

Re: Blink在Hive表没有统计信息的情况下如何优化

没有统计信息确实很难生成比较靠谱的执行计划,这也是之前很多DBA的工作 ;-)

你可以试试以下以下操作:
1. 如果是join顺序不合理,可以手动调整sql中的join顺序,并且关闭join reorder
2.
看看fail的具体原因,如果是个别比较激进的算子表现不好,比如HashAggregate、HashJoin,你可以手动禁止掉这些算子,选择性能稍差但可能执行起来更稳健的算子,比如SortMergeJoin

这是我拍脑袋想的,具体的建议你先分析一下SQL为什么会fail,然后贴出具体的问题来。

另外,我们正在开发SQL hint功能,可以有效缓解类似问题。

Best,
Kurt


On Tue, May 28, 2019 at 12:32 PM bigdatayunzhongyan <
bigdatayunzhongyan@aliyun.com> wrote:

>  @Kurt Young  @Jark Wu  @Bowen Li
>     先描述下我这边的情况,一些简单的SQL无论是否有统计的信息的情况下都能执行成功,
>     但是一些复杂的SQL在没有统计信息的情况下一直执行失败,尝试开启各种参数,都失败了,很痛苦,
>     不过在设置统计信息及开启相关参数后很轻松就能执行成功(点赞)。
>     我们线上的情况是很多表信息都没有统计信息,请问有哪些优化的办法。
>
> 启动命令:./bin/yarn-session.sh -n 50 -s 20 -jm 3072 -tm 6144 -d
> 配置见附件
> 谢谢!
>
>

Re: Blink在Hive表没有统计信息的情况下如何优化

Posted by Kurt Young <yk...@gmail.com>.
你先试试把HashJoin这个算子禁用看看,TableConfig里添加这个配置

sql.exec.disabled-operators: HashJoin

Best,
Kurt


On Tue, May 28, 2019 at 3:23 PM bigdatayunzhongyan <
bigdatayunzhongyan@aliyun.com> wrote:

> 感谢 @Kurt Young 大神的回复,报错信息在附件。谢谢!
>
>
> 在2019年05月28日 14:10,Kurt Young<yk...@gmail.com> <yk...@gmail.com> 写道:
>
> 没有统计信息确实很难生成比较靠谱的执行计划,这也是之前很多DBA的工作 ;-)
>
> 你可以试试以下以下操作:
> 1. 如果是join顺序不合理,可以手动调整sql中的join顺序,并且关闭join reorder
> 2.
> 看看fail的具体原因,如果是个别比较激进的算子表现不好,比如HashAggregate、HashJoin,你可以手动禁止掉这些算子,选择性能稍差但可能执行起来更稳健的算子,比如SortMergeJoin
>
> 这是我拍脑袋想的,具体的建议你先分析一下SQL为什么会fail,然后贴出具体的问题来。
>
> 另外,我们正在开发SQL hint功能,可以有效缓解类似问题。
>
> Best,
> Kurt
>
>
> On Tue, May 28, 2019 at 12:32 PM bigdatayunzhongyan <
> bigdatayunzhongyan@aliyun.com> wrote:
>
>>  @Kurt Young  @Jark Wu  @Bowen Li
>>     先描述下我这边的情况,一些简单的SQL无论是否有统计的信息的情况下都能执行成功,
>>     但是一些复杂的SQL在没有统计信息的情况下一直执行失败,尝试开启各种参数,都失败了,很痛苦,
>>     不过在设置统计信息及开启相关参数后很轻松就能执行成功(点赞)。
>>     我们线上的情况是很多表信息都没有统计信息,请问有哪些优化的办法。
>>
>> 启动命令:./bin/yarn-session.sh -n 50 -s 20 -jm 3072 -tm 6144 -d
>> 配置见附件
>> 谢谢!
>>
>>

回复: Blink在Hive表没有统计信息的情况下如何优化

Posted by bigdatayunzhongyan <bi...@aliyun.com.INVALID>.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Fatal error at remote task manager '/xxxxxx:14941'.
        at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.decodeMsg(CreditBasedPartitionRequestClientHandler.java:276)
        at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelRead(CreditBasedPartitionRequestClientHandler.java:182)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
        at org.apache.flink.runtime.io.network.netty.ZeroCopyNettyMessageDecoder.channelRead(ZeroCopyNettyMessageDecoder.java:128)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
        at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.nio.channels.ClosedChannelException
        at org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:297)
        at org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.enqueueAvailableReader(PartitionRequestQueue.java:119)
        at org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.addCredit(PartitionRequestQueue.java:181)
        at org.apache.flink.runtime.io.network.netty.PartitionRequestServerHandler.channelRead0(PartitionRequestServerHandler.java:140)
        at org.apache.flink.runtime.io.network.netty.PartitionRequestServerHandler.channelRead0(PartitionRequestServerHandler.java:43)
        at org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        ... 13 more
Caused by: org.apache.flink.runtime.io.network.partition.DataConsumptionException: java.nio.channels.ClosedChannelException
        at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionView.getNextBuffer(SpilledSubpartitionView.java:150)
        at org.apache.flink.runtime.io.network.netty.CreditBasedSequenceNumberingViewReader.getNextBuffer(CreditBasedSequenceNumberingViewReader.java:162)





 ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue  - Encountered error while consuming partitions
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.flink.shaded.netty4.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
        at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
.......

org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.pushToOperator(OperatorChain.java:638)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.collect(OperatorChain.java:612)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.collect(OperatorChain.java:575)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:763)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:741)
        at org.apache.flink.table.runtime.util.StreamRecordCollector.collect(StreamRecordCollector.java:44)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.collect(HashJoinOperator.java:202)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.innerJoin(HashJoinOperator.java:187)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator$InnerHashJoinOperator.join(HashJoinOperator.java:310)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.joinWithNextKey(HashJoinOperator.java:181)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.processElement2(HashJoinOperator.java:159)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$TwoInputStreamOperatorProxy.processElement2(OperatorChain.java:1389)
        at org.apache.flink.streaming.runtime.io.SecondOfTwoInputProcessor.processRecord(SecondOfTwoInputProcessor.java:91)
        at org.apache.flink.streaming.runtime.io.InputGateFetcher.fetchAndProcess(InputGateFetcher.java:159)
        at org.apache.flink.streaming.runtime.io.StreamArbitraryInputProcessor.process(StreamArbitraryInputProcessor.java:134)
        at org.apache.flink.streaming.runtime.tasks.ArbitraryInputStreamTask.run(ArbitraryInputStreamTask.java:183)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:324)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:727)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithSecondInputOfTwoInputStreamOperatorOutput.pushToOperator(OperatorChain.java:850)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithSecondInputOfTwoInputStreamOperatorOutput.collect(OperatorChain.java:824)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithSecondInputOfTwoInputStreamOperatorOutput.collect(OperatorChain.java:787)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:763)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:741)
        at BatchExecCalcRule$64.processElement(Unknown Source)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.pushToOperator(OperatorChain.java:635)
        ... 18 more
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.pushToOperator(OperatorChain.java:638)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.collect(OperatorChain.java:612)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.collect(OperatorChain.java:575)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:763)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:741)
        at org.apache.flink.table.runtime.util.StreamRecordCollector.collect(StreamRecordCollector.java:44)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.collect(HashJoinOperator.java:202)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.innerJoin(HashJoinOperator.java:187)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator$InnerHashJoinOperator.join(HashJoinOperator.java:310)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.joinWithNextKey(HashJoinOperator.java:181)
        at org.apache.flink.table.runtime.join.batch.HashJoinOperator.processElement2(HashJoinOperator.java:159)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$TwoInputStreamOperatorProxy.processElement2(OperatorChain.java:1389)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithSecondInputOfTwoInputStreamOperatorOutput.pushToOperator(OperatorChain.java:847)
        ... 24 more

Caused by: java.lang.RuntimeException
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:96)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:76)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:42)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:763)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:741)
        at BatchExecCalcRule$82.processElement(Unknown Source)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingWithOneInputStreamOperatorOutput.pushToOperator(OperatorChain.java:635)
        ... 36 more
Caused by: java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:250)
        at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:207)
        at org.apache.flink.runtime.io.network.partition.InternalResultPartition.requestNewBufferBuilder(InternalResultPartition.java:448)
        at org.apache.flink.runtime.io.network.partition.InternalResultPartition.copyFromSerializerToTargetChannel(InternalResultPartition.java:526)
        at org.apache.flink.runtime.io.network.partition.InternalResultPartition.emitRecord(InternalResultPartition.java:231)
        at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:89)
        at org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:81)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:93)
        ... 42 more