Posted to user@tez.apache.org by Zac Blanco <za...@alluxio.com> on 2018/11/07 00:21:35 UTC

Tez & High Concurrency

Hi,

I’m working on a setup where we would like to test how Tez behaves with a large number of concurrent tasks, without spinning up a huge cluster or generating a real workload.

We’ve tried to simulate this by creating an external table in Hive backed by 10k-50k files of about 5 MB each, and setting tez.grouping.min-size and tez.grouping.max-size equal to the file size. YARN container sizes are also set appropriately. Tez correctly computes one task per file, which we can see in the PENDING column in the Hive shell, but we are unable to get a large number of them to run concurrently. Only 10-20 tasks ever seem to run at a time, yet our YARN RM reports < 10% utilization, so we know cluster resources are not the issue. Is there a way to “trick” Tez into scheduling more tasks concurrently?

We are running simple queries, so it may be that tasks are simply finishing too fast, but for the number of tasks we have, we would expect more than 10-20 to be running at the same time. Any help would be appreciated.

Thank you,
Zac

Re: Tez & High Concurrency

Posted by Zac Blanco <za...@alluxio.com>.
Hi Gopal,

Thanks for the very informative and detailed response. I appreciate all of the information and links. I haven’t gotten back around to our testing yet but hopefully this information points us in the right direction.

Thanks again,
Zac

> On Nov 6, 2018, at 8:41 PM, Gopal Vijayaraghavan <go...@apache.org> wrote:
> ...


Re: Tez & High Concurrency

Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi,

>  It seems that there are only ever 10-20 tasks running at a time however our YARN RM reports < 10% utilization so we know cluster resources are not the issue. Is there a way to “trick” Tez into scheduling more tasks concurrently?
...
> We are running simple queries so it may be that tasks are simply finishing too fast but, for the scale of tasks we have, we expect more than 10-20 running at the same time. Any help would be appreciated.

I have seen something like this in the past, but it needs a few "special" circumstances which are not common in a standard Hadoop cluster.

The Tez + YARN interaction is heavily driven by locality: Tez always asks YARN for mapper containers with locality hints, and YARN tries hard to satisfy them. That absolutely made sense when HDFS was always co-located with YARN.

However, that assumption breaks down as architectures evolve.

https://issues.apache.org/jira/browse/TEZ-3291

is a simpler instance of the problem, though it may be a very Azure-specific example (i.e. "ignore localhost, it is bogus").

For another filesystem, the problem was harder to work around: it reported IP addresses as locality hints, but those addresses belonged to a BSD service appliance which will never run YARN.

In that scenario, the following YARN ticket comes into play

https://issues.apache.org/jira/browse/YARN-4189

Wangda has a better deep-dive of that problem on his blog

https://wangda.live/2017/08/23/deep-understand-locality-in-capacityscheduler-and-how-to-control-it/

The short version is that if you provide locality information in your split but don't run a NodeManager on that IP, YARN effectively throttles container allocation: it swipes left for 40 heartbeats before accepting a rack-local container.
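
As a rough illustration of why that throttling caps concurrency, here is a toy Python model. It is not the CapacityScheduler code: the function name, the one-opportunity-per-heartbeat loop, and the counter reset after each allocation are simplifying assumptions made for this sketch. It only counts how many node heartbeats go by when the requested host never matches any NodeManager.

```python
def heartbeats_to_allocate(pending, node_locality_delay):
    """Toy model: count node heartbeats needed to place `pending` containers
    when no heartbeating node ever matches the requested host.

    Assumptions (not taken from YARN source): one scheduling opportunity per
    heartbeat, and the missed-opportunity counter resets after each allocation.
    """
    heartbeats = 0
    missed = 0
    allocated = 0
    while allocated < pending:
        heartbeats += 1
        if missed >= node_locality_delay:
            # Enough opportunities were skipped: settle for a rack-local container.
            allocated += 1
            missed = 0
        else:
            # The node is not the requested host: skip it and keep waiting.
            missed += 1
    return heartbeats


if __name__ == "__main__":
    for delay in (40, 0):
        print(delay, heartbeats_to_allocate(pending=10, node_locality_delay=delay))
```

In this model a delay of 40 costs 41 heartbeats per container, while a delay of 0 places one container per heartbeat, which is the kind of gap that could leave a cluster mostly idle while tasks sit in PENDING.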

The basic config to start tweaking is "yarn.scheduler.capacity.node-locality-delay"; after that, turn off the additional rack-local delay (or pretend to be 1 rack).
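
A minimal sketch of what that tweak might look like, written as a small Python helper that renders Hadoop-style property XML for capacity-scheduler.xml. Only "yarn.scheduler.capacity.node-locality-delay" comes from this thread. The second property name, "yarn.scheduler.capacity.rack-locality-additional-delay", is an assumption based on the CapacityScheduler documentation and should be checked against your Hadoop version, and the values of 0 are purely illustrative starting points, not recommendations.

```python
import xml.etree.ElementTree as ET

# Illustrative values only; the second property name is an assumption
# that should be verified against the Hadoop version in use.
ILLUSTRATIVE_PROPS = {
    "yarn.scheduler.capacity.node-locality-delay": 0,
    "yarn.scheduler.capacity.rack-locality-additional-delay": 0,
}


def render_properties(props):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(ILLUSTRATIVE_PROPS))
```

The printed properties would be merged into the existing capacity-scheduler.xml rather than replacing it, and the scheduler configuration reloaded afterwards.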

If you want to go spelunking into the Hadoop core, here's the place to start 

https://github.com/apache/hadoop/blob/8598b498bcaf4deffa822f871a26635bdf3d9d5c/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L324

Cheers,
Gopal