Posted to user@tez.apache.org by 娄东风 <25...@qq.com> on 2020/06/01 03:36:33 UTC
tez shuffle fetch phase has long pause
Hi
I'm using Tez 0.9.1 and Hive 2.3.3. Running TPC-DS query15 with 1TB.
In Reduce5, I occasionally see a long pause during fetch, so it's hard to capture a jstack.
Reduce5 depends on Map4 and Reduce2; these two vertices finished before 16:33:00.
So the fetch task should not be waiting on upstream vertices.
How do I find out what causes this long pause?
Thanks.
2020-05-30 16:36:29,531 [INFO] [Fetcher_B {Map_4} #12] |ShuffleManager.fetch|: Completed fetch for attempt: {5, 0, attempt_1590728138875_0202_1_01_000005_0_10003} to MEMORY, csize=24509, dsize=66456, EndTime=1590827789531, TimeTaken=1, Rate=23.37 MB/s
2020-05-30 16:37:41,368 [INFO] [Fetcher_B {Map_4} #2] |HttpConnection.url|: for url=http://node-ana-coreLKpD0001:13562/mapOutput?job=job_1590728138875_0202&dag=1&reduce=245&map=attempt_1590728138875_0202_1_01_000027_0_10002,attempt_1590728138875_0202_1_01_000023_0_10003,attempt_1590728138875_0202_1_01_000030_0_10002,attempt_1590728138875_0202_1_01_000025_0_10003,attempt_1590728138875_0202_1_01_000029_0_10003 sent hash and receievd reply 0 ms
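For scale, the gap between those two consecutive fetcher log lines works out to about 72 seconds (a quick check using only the timestamps above):

```python
from datetime import datetime

# Timestamps copied from the two log lines above (format: HH:MM:SS,mmm).
fmt = "%Y-%m-%d %H:%M:%S,%f"
t1 = datetime.strptime("2020-05-30 16:36:29,531", fmt)
t2 = datetime.strptime("2020-05-30 16:37:41,368", fmt)
gap = (t2 - t1).total_seconds()
print(f"pause between fetch log lines: {gap:.3f} s")  # → 71.837 s
```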
Re: tez shuffle fetch phase has long pause
Posted by Kuhu Shukla <ks...@ncsu.edu>.
Thank you for reporting this.
You might want to check the logs and other metrics of the upstream task's
node to see whether it is healthy. The download rate seems quite low to me;
depending on the fetch size, we can verify whether this is due to poor
network bandwidth and/or bad node health.
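For what it's worth, the logged Rate can be reproduced from csize and TimeTaken (a sanity check assuming csize is compressed bytes, TimeTaken is in milliseconds, and Rate is MiB/s despite the "MB/s" label):

```python
# Values copied from the ShuffleManager.fetch log line above.
csize_bytes = 24509   # compressed payload size
time_taken_ms = 1     # TimeTaken
rate = csize_bytes / (time_taken_ms / 1000) / (1024 * 1024)
print(f"{rate:.2f} MiB/s")  # → 23.37, matching Rate=23.37 MB/s in the log
```

At 1 ms timer resolution on a ~24 KB transfer, the computed rate says very little about sustained bandwidth, so a low number here is not by itself conclusive.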
Let us know what you think and find out.
Regards,
Kuhu
Re: tez shuffle fetch phase has long pause
Posted by 娄东风 <25...@qq.com>.
Correct! It is the TCP listen queue overflow issue!
Thanks for your help, Gopal V and Kuhu!
Using the command below, I can see many overflows:
[root@node-ana-coreLKpD0001 ~]# netstat -s | grep -i overflow
    96282 times the listen queue of a socket overflowed
    TCPTimeWaitOverflow: 2499680
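A small helper for sampling that counter before and after a run, to confirm overflows grow while queries are in flight (a sketch; the counter's exact wording can vary across kernel versions):

```python
import re
import subprocess

def listen_overflows(text=None):
    """Return the listen-queue overflow count parsed from `netstat -s` output."""
    if text is None:
        text = subprocess.run(["netstat", "-s"], capture_output=True,
                              text=True).stdout
    m = re.search(r"(\d+) times the listen queue of a socket overflowed", text)
    return int(m.group(1)) if m else 0

# Parsed against the line captured above:
print(listen_overflows("96282 times the listen queue of a socket overflowed"))  # → 96282
```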
The default net.core.somaxconn is 128 on my cluster, which is too small.
I ran about 10 queries concurrently with the following setting:
set hive.tez.auto.reducer.parallelism=true;
which made the situation worse.
After following Gopal V's instructions below, the fetch-phase long pause disappeared.
------------------ Original ------------------
From: "Gopal V";<gopalv@apache.org>;
Send time: Monday, Jun 1, 2020 1:49 PM
To: "user"<user@tez.apache.org>;
Subject: Re: tez shuffle fetch phase has long pause
> In Reduce5, i see long pause during fetch occasionally
This is likely the TCP listen overflow issue, but just doesn't get
reported as a packet loss issue because the retry works okay.
https://issues.apache.org/jira/browse/MAPREDUCE-6763
That's the fix to be applied on the YARN Shuffle handler.
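MAPREDUCE-6763 makes the shuffle handler's listen backlog configurable. If I'm reading the patch correctly, the knob is mapreduce.shuffle.listen.queue.size (an assumption from memory — verify the property name against your Hadoop version), e.g. in mapred-site.xml:

```xml
<!-- Assumed property from MAPREDUCE-6763; the default backlog is 128. -->
<property>
  <name>mapreduce.shuffle.listen.queue.size</name>
  <value>16384</value>
</property>
```

Note that the effective backlog is still capped by the kernel's net.core.somaxconn, so the sysctl limit has to be raised as well.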
You can confirm the change by running
# ss -tln
I usually diagnose it by checking for TCP SYN cookie messages in dmesg or
looking at the SNMP data.
# netstat -s | grep -i overflow
This issue also affects HDFS namenode, which is also usually unreported
by users.
https://issues.apache.org/jira/browse/HADOOP-16504
The delay is usually 2 * tcp max-segment-length & is usually reduced by
increasing the OS half-open connection count.
I end up doing
# sysctl -w net.core.somaxconn=16384
# sysctl -w net.ipv4.tcp_fin_timeout=2
to speed up the retries & restarting daemons.
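Those sysctl changes apply only until reboot; to persist them, the same values can go in /etc/sysctl.conf (or a drop-in under /etc/sysctl.d/, path depending on the distro) and be applied with `sysctl -p`:

```
# /etc/sysctl.conf (or /etc/sysctl.d/99-shuffle.conf)
net.core.somaxconn = 16384
net.ipv4.tcp_fin_timeout = 2
```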
This affects Tez a little worse than MRv2, because the same JVM runs
multiple instances of the same vertex sequentially, instead of a new JVM
for every task (which runs way slower, reducing the concurrency of
connections).
Cheers,
Gopal