Posted to user@flink.apache.org by Ken Krugler <kk...@transpac.com> on 2016/04/29 06:00:07 UTC

Checking actual config values used by TaskManager

Hi all,

I’m running jobs on EMR via YARN, and wondering how to check exactly what configuration settings are actually being used.

This is mostly for the TaskManager.

I know I can modify the conf/flink-conf.yaml file, and (via the CLI) I can use -yD param=value.

But my experience with Hadoop makes me want to see the exact values being used, versus assuming I know what’s been set :)

Thanks,

— Ken


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




Re: Checking actual config values used by TaskManager

Posted by Ken Krugler <kk...@transpac.com>.
Hi Max,

> On May 2, 2016, at 4:43am, Maximilian Michels <mx...@apache.org> wrote:
> 
> Hi Ken,
> 
> When you're running Yarn, the Flink configuration is created once and
> shared among all nodes (JobManager and TaskManagers). Please have a
> look at the JobManager tab on the web interface. It shows you the
> configuration.

I’ve seen that, but the values displayed don’t match what I’m setting, or what I see in the logs.

I’m running a job using ./bin/flink run, with parameters:

-ytm 20000 \
-yjm 2048 \
-ys 4 \
-p 10 \
-yD taskmanager.network.numberOfBuffers=3000 \
-yD taskmanager.memory.off-heap=true

Here’s what the JobManager tab is showing:

jobmanager.heap.mb                     256
taskmanager.heap.mb                    512
taskmanager.memory.off-heap            true
taskmanager.network.numberOfBuffers    3000
taskmanager.numberOfTaskSlots          1

So numberOfBuffers seems right, same with memory.off-heap.

But taskmanager.heap.mb looks like a default value, same for numberOfTaskSlots and jobmanager.heap.mb
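
A comparison like this can be scripted rather than done by eye. The following is a minimal Python sketch, not a Flink tool: it diffs the values requested on the command line above against the values the JobManager tab reports. The mapping of -yjm, -ytm, and -ys to the jobmanager.heap.mb, taskmanager.heap.mb, and taskmanager.numberOfTaskSlots keys is an assumption made for illustration, and the helper name find_mismatches is hypothetical.

```python
# Values requested on the command line (from the flink run invocation above).
# ASSUMPTION: the flag-to-key mapping for -yjm, -ytm, and -ys is illustrative.
requested = {
    "jobmanager.heap.mb": "2048",                   # -yjm 2048
    "taskmanager.heap.mb": "20000",                 # -ytm 20000
    "taskmanager.numberOfTaskSlots": "4",           # -ys 4
    "taskmanager.network.numberOfBuffers": "3000",  # -yD
    "taskmanager.memory.off-heap": "true",          # -yD
}

# Values shown in the JobManager tab of the web interface.
displayed = {
    "jobmanager.heap.mb": "256",
    "taskmanager.heap.mb": "512",
    "taskmanager.memory.off-heap": "true",
    "taskmanager.network.numberOfBuffers": "3000",
    "taskmanager.numberOfTaskSlots": "1",
}


def find_mismatches(requested, displayed):
    """Return {key: (requested, displayed)} for keys whose values differ.

    A key missing from `displayed` is reported with None as its displayed value.
    """
    return {
        key: (want, displayed.get(key))
        for key, want in requested.items()
        if displayed.get(key) != want
    }


if __name__ == "__main__":
    for key, (want, got) in sorted(find_mismatches(requested, displayed).items()):
        print(f"{key}: requested {want}, displayed {got}")
```

For the values in this message, it flags the three keys that look like defaults (jobmanager.heap.mb, taskmanager.heap.mb, and taskmanager.numberOfTaskSlots) and leaves the two -yD settings alone.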

When I look at my actual job, the settings I’m seeing for number of slots (as an example) match what I’m specifying from the command line.

When I look at the JobManager logs, I see -Xmx1448M, which I guess is an approximation of the 2048 I specified.

And when I look at the TaskManager logs, the JVM settings match what I’d expect for -ytm 20000 (15GB direct, and about 4.5GB of JVM heap):

2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -  JVM Options:
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -Xms4500m
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -Xmx4500m
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -XX:MaxDirectMemorySize=15000m
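
Pulling these numbers out of the log by hand gets tedious across many TaskManagers. As a rough Python sketch (not part of Flink), the snippet below extracts the JVM memory flags from log lines like the ones above and adds heap and direct memory together. The function name jvm_sizes_mb is hypothetical, and the pattern assumes sizes are given in megabytes with an m or M suffix, as they are in this log.

```python
import re

# The TaskManager log lines quoted above.
LOG_LINES = """\
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -  JVM Options:
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -Xms4500m
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -Xmx4500m
2016-05-05 01:07:16,161 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   -     -XX:MaxDirectMemorySize=15000m
"""

# ASSUMPTION: sizes carry an m/M (megabyte) suffix; other units are not handled.
JVM_SIZE = re.compile(r"(-Xms|-Xmx|-XX:MaxDirectMemorySize=)(\d+)[mM]\b")


def jvm_sizes_mb(text):
    """Map each JVM memory flag found in `text` to its size in megabytes."""
    return {flag.rstrip("="): int(size) for flag, size in JVM_SIZE.findall(text)}


if __name__ == "__main__":
    sizes = jvm_sizes_mb(LOG_LINES)
    for flag, mb in sizes.items():
        print(f"{flag}: {mb} MB")
    total = sizes["-Xmx"] + sizes["-XX:MaxDirectMemorySize"]
    print(f"max heap + max direct: {total} MB")
```

For the lines above, maximum heap plus maximum direct memory comes to 19500 MB, slightly under the 20000 MB passed via -ytm. The sketch only reports the numbers; it does not explain the difference.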

So I guess I’ve got two questions:

1. What is the meaning of the values I’m seeing in the JobManager UI?

2. How do I figure out what the TaskManager is actually getting for a value set via -yD, for example taskmanager.tmp.dirs?

Thanks,

— Ken





Re: Checking actual config values used by TaskManager

Posted by Maximilian Michels <mx...@apache.org>.
Hi Ken,

When you're running Yarn, the Flink configuration is created once and
shared among all nodes (JobManager and TaskManagers). Please have a
look at the JobManager tab on the web interface. It shows you the
configuration.

Cheers,
Max

On Fri, Apr 29, 2016 at 3:18 PM, Ken Krugler
<kk...@transpac.com> wrote:
> Hi Timur,
>
> On Apr 28, 2016, at 10:40pm, Timur Fayruzov <ti...@gmail.com>
> wrote:
>
> If you're talking about parameters that were set on JVM startup then `ps
> aux|grep flink` on an EMR slave node should do the trick, that'll give you
> the full command line.
>
>
> No, I’m talking about values that come from flink-conf.yaml.
>
> Maybe there’s no good reason to worry, but in Hadoop land you can have
> parameters set via the conf on the client, which in turn get overridden by
> values from conf files on the nodes, which you can then override via command
> line parameters, which in turn can be changed by the user code.
>
> Plus parameters that can be flagged as final/unmodifiable, and thus some of
> the above actually don’t change anything.
>
> So it’s a common issue where what you think you set as a value isn’t
> actually being used, and that’s why examining the job conf that was actually
> deployed with tasks is critical.
>
> — Ken
>
>
>
> On Thu, Apr 28, 2016 at 9:00 PM, Ken Krugler <kk...@transpac.com>
> wrote:
>>
>> Hi all,
>>
>> I’m running jobs on EMR via YARN, and wondering how to check exactly what
>> configuration settings are actually being used.
>>
>> This is mostly for the TaskManager.
>>
>> I know I can modify the conf/flink-conf.yaml file, and (via the CLI) I can
>> use -yD param=value.
>>
>> But my experience with Hadoop makes me want to see the exact values being
>> used, versus assuming I know what’s been set :)
>>
>> Thanks,
>>
>> — Ken

Re: Checking actual config values used by TaskManager

Posted by Ken Krugler <kk...@transpac.com>.
Hi Timur,

> On Apr 28, 2016, at 10:40pm, Timur Fayruzov <ti...@gmail.com> wrote:
> 
> If you're talking about parameters that were set on JVM startup then `ps aux|grep flink` on an EMR slave node should do the trick, that'll give you the full command line.

No, I’m talking about values that come from flink-conf.yaml.

Maybe there’s no good reason to worry, but in Hadoop land you can have parameters set via the conf on the client, which in turn get overridden by values from conf files on the nodes, which you can then override via command line parameters, which in turn can be changed by the user code.

Plus parameters that can be flagged as final/unmodifiable, and thus some of the above actually don’t change anything.

So it’s a common issue where what you think you set as a value isn’t actually being used, and that’s why examining the job conf that was actually deployed with tasks is critical.
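
To make the layering concrete, here is a toy Python model of the precedence chain described above: each layer can override the ones before it, unless an earlier layer marked the key as final. This is only an illustration of the idea, not Hadoop's or Flink's actual resolution logic; the layer names, keys, values, and the resolve function are all hypothetical.

```python
def resolve(layers):
    """Merge configuration layers, lowest precedence first.

    Each layer is (name, settings, final_keys). Once a layer marks a key as
    final, later layers can no longer override it. Returns a dict mapping
    key -> (effective value, name of the layer that supplied it).
    """
    effective = {}
    locked = set()  # keys marked final by a layer already processed
    for name, settings, final_keys in layers:
        for key, value in settings.items():
            if key in locked:
                continue  # override silently ignored: key was marked final
            effective[key] = (value, name)
        locked.update(final_keys)
    return effective


# Hypothetical layers and keys, in the order described above.
layers = [
    ("client conf", {"example.tmp.dir": "/tmp/client", "example.buffers": "1000"}, set()),
    ("node conf", {"example.tmp.dir": "/mnt/tmp"}, {"example.tmp.dir"}),
    ("command line", {"example.tmp.dir": "/tmp/cli", "example.buffers": "3000"}, set()),
    ("user code", {"example.buffers": "4000"}, set()),
]

if __name__ == "__main__":
    for key, (value, source) in sorted(resolve(layers).items()):
        print(f"{key} = {value} (from {source})")
```

In this example the command-line value for example.tmp.dir is silently dropped because the node conf marked it final, while example.buffers ends up with the value set in user code. Neither result is obvious from looking at the command line alone.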

— Ken



Re: Checking actual config values used by TaskManager

Posted by Timur Fayruzov <ti...@gmail.com>.
If you're talking about parameters that were set on JVM startup then `ps
aux|grep flink` on an EMR slave node should do the trick, that'll give you
the full command line.
