Posted to user@tez.apache.org by Johannes Zillmann <jz...@googlemail.com> on 2015/09/14 16:57:32 UTC

Enable runtime compression programmatically

Hey guys,

Question: how do I enable tez.runtime.compress programmatically?
When I set this property in tez-site.xml, it is picked up correctly.
But all the other options I tried:
- dag.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
- mapVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
- reduceVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");

have no effect! (I checked the log output of the Shuffle class.)

Johannes

Re: Enable runtime compression programmatically

Posted by Johannes Zillmann <jz...@googlemail.com>.
Ok, thanks, that helps.

> On 14 Sep 2015, at 20:42, Siddharth Seth <ss...@apache.org> wrote:
> 
> For Edges, the approach that you took with edgeBuilder.setAdditionalConfiguration will work to set relevant Tez properties for an edge. You should be able to iterate through properties and set the config on the edge - and the relevant ones will be set. (Compression has a specific API which you could use, but using setAdditionalConfiguration will also work).
> Typically, additional Hadoop properties are also required for Edges - things like the list of compression codecs. edgeConfigs.setAdditionalConfiguration does take care of allowing these properties through.
> 
> The TezClient needs to be provided a config - which is then made available to the AM. There's not much filtering involved here, and you could set tez.* for this configuration instance. An attempt will be made to pick up YarnConfiguration to connect to the cluster.
> 
> The same applies for InputInitializers and OutputCommitters. Typically (and unfortunately), you'll end up setting all configs.
> 
> dag.setConf and vertex.setConf should not be used - I've opened a JIRA to add docs for these.
> 
> How do you get the Hadoop configs in this case? Is that part of the Configuration-like object?
Yes, all the default and explicitly set Hadoop configs are part of the Configuration!

Johannes


Re: Enable runtime compression programmatically

Posted by Johannes Zillmann <jz...@googlemail.com>.
Hey Sid,

thanks for the response!
So I think I've got it now! :)

Johannes


On 15 Sep 2015, at 21:44, Siddharth Seth <ss...@apache.org> wrote:
> 
> I'd skip the second step. "For DAG, VERTEX I use the #setConf() method to forward all properties with the corresponding scope from my main conf object". This won't help anything at the moment.
> Other than that, this should work.
> 
> InputInitializers and OutputCommitters (as well as Processors, Inputs, Outputs) have a user payload field. If using FileInputFormat / FileOutputFormat based Inputs and Outputs, a payload is set up for the initializer / committer. That will contain a Configuration instance (and some more information) serialized to bytes. This Configuration instance would require some of the properties as well.
> Regarding the TezRuntimeConfiguration values - these are used when configuring the standard Edges, and setAdditionalConfiguration will take care of propagating the appropriate config parameters for a specific edge.


Re: Enable runtime compression programmatically

Posted by Siddharth Seth <ss...@apache.org>.
I'd skip the second step. "For DAG, VERTEX I use the #setConf() method to
forward all properties with the corresponding scope from my main conf
object". This won't help anything at the moment.
Other than that, this should work.

InputInitializers and OutputCommitters (as well as Processors, Inputs,
Outputs) have a user payload field. If using FileInputFormat /
FileOutputFormat based Inputs and Outputs, a payload is set up for the
initializer / committer. That will contain a Configuration instance (and
some more information) serialized to bytes. This Configuration instance
would require some of the properties as well.
Regarding the TezRuntimeConfiguration values - these are used when
configuring the standard Edges, and setAdditionalConfiguration will take
care of propagating the appropriate config parameters for a specific edge.

On Tue, Sep 15, 2015 at 3:52 AM, Johannes Zillmann <jzillmann@googlemail.com> wrote:

> Alright… once again…
>
> So I saw that all the TezConfiguration fields are annotated with a Scope
> like AM, DAG, VERTEX, etc.
> So here is what I intend to do:
> - The TezConfiguration for TezClient.create() will simply contain all
> properties from my main conf object
> - For DAG, VERTEX I use the #setConf() method to forward all properties
> with the corresponding scope from my main conf object
> - For the edgeBuilder I use the #setAdditionalConfiguration() method to
> forward all properties from my main conf object
>
> So does this strategy make sense to you, or am I missing something or
> getting it wrong?
>
> A couple more questions:
> - Regarding your comment on InputInitializers and OutputCommitters: I
> don’t see any possibility to set properties on those. I’m using the user
> payload to transfer the conf values that are needed. Am I missing something here?
> - What about the TezRuntimeConfiguration values, do I need to do anything
> special with those?
>
>
> best
> Johannes
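Sid's point about user payloads (a Configuration serialized to bytes and shipped with the initializer / committer) can be illustrated with a small round trip. This is only an analogy: java.util.Properties stands in for Hadoop's Configuration (which has its own Writable serialization), and the class and method names are hypothetical. Real Tez code would use the payload helpers Tez ships (for example TezUtils in org.apache.tez.common; check the version you use) rather than this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: java.util.Properties stands in for a Hadoop Configuration
// that would be serialized into an Input/Output user payload.
class PayloadRoundTrip {

    // Flatten a key/value configuration into bytes (what a payload would carry).
    static byte[] toPayload(Map<String, String> conf) throws IOException {
        Properties props = new Properties();
        props.putAll(conf);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.store(out, null);
        return out.toByteArray();
    }

    // Rebuild the configuration from the payload bytes on the receiving side.
    static Map<String, String> fromPayload(byte[] payload) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(payload));
        Map<String, String> conf = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            conf.put(key, props.getProperty(key));
        }
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new TreeMap<>();
        conf.put("tez.runtime.compress", "true");
        // A Hadoop-level property that an Input/Output may also need.
        conf.put("io.compression.codecs", "org.apache.hadoop.io.compress.DefaultCodec");
        byte[] payload = toPayload(conf);
        System.out.println(fromPayload(payload).equals(conf)); // prints true
    }
}
```

Whatever the receiving component needs has to be in the map before serialization, which is why Sid says this Configuration instance "would require some of the properties as well".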

Re: Enable runtime compression programmatically

Posted by Johannes Zillmann <jz...@googlemail.com>.
Alright… once again…

So I saw that all the TezConfiguration fields are annotated with a Scope like AM, DAG, VERTEX, etc.
So here is what I intend to do:
- The TezConfiguration for TezClient.create() will simply contain all properties from my main conf object
- For DAG, VERTEX I use the #setConf() method to forward all properties with the corresponding scope from my main conf object
- For the edgeBuilder I use the #setAdditionalConfiguration() method to forward all properties from my main conf object

So does this strategy make sense to you, or am I missing something or getting it wrong?

A couple more questions:
- Regarding your comment on InputInitializers and OutputCommitters: I don’t see any possibility to set properties on those. I’m using the user payload to transfer the conf values that are needed. Am I missing something here?
- What about the TezRuntimeConfiguration values, do I need to do anything special with those?


best
Johannes
 


> On 14 Sep 2015, at 20:42, Siddharth Seth <ss...@apache.org> wrote:
> 
> For Edges, the approach that you took with edgeBuilder.setAdditionalConfiguration will work to set relevant Tez properties for an edge. You should be able to iterate through properties and set the config on the edge - and the relevant ones will be set. (Compression has a specific API which you could use, but using setAdditionalConfiguration will also work).
> Typically, additional Hadoop properties are also required for Edges - things like the list of compression codecs. edgeConfigs.setAdditionalConfiguration does take care of allowing these properties through.
> 
> The TezClient needs to be provided a config - which is then made available to the AM. There's not much filtering involved here, and you could set tez.* for this configuration instance. An attempt will be made to pick up YarnConfiguration to connect to the cluster.
> 
> The same applies for InputInitializers and OutputCommitters. Typically (and unfortunately), you'll end up setting all configs.
> 
> dag.setConf and vertex.setConf should not be used - I've opened a JIRA to add docs for these.
> 
> How do you get the Hadoop configs in this case? Is that part of the Configuration-like object?
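The scope-based part of this plan amounts to grouping the main configuration into one bucket per scope. Everything below is hypothetical: the Scope enum and the SCOPES table only stand in for the scope annotations on TezConfiguration fields (real code would derive the table from those annotations instead of hard-coding it), and the two scope assignments are examples to verify against your Tez version. Note that Sid's reply in this thread says forwarding the DAG and VERTEX buckets via setConf won't help at the moment; the sketch only shows the grouping step.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group a flat configuration by scope so each bucket
// could be handed to the matching component.
class ScopeRouter {

    // Hypothetical stand-in for the scope annotation on TezConfiguration fields.
    enum Scope { AM, DAG, VERTEX, UNKNOWN }

    // Illustrative key-to-scope table; verify real scopes against your Tez version.
    static final Map<String, Scope> SCOPES = new HashMap<>();
    static {
        SCOPES.put("tez.am.resource.memory.mb", Scope.AM);
        SCOPES.put("tez.task.resource.memory.mb", Scope.VERTEX);
    }

    // Put every property into the bucket of its looked-up scope (UNKNOWN if absent).
    static Map<Scope, Map<String, String>> route(Map<String, String> conf) {
        Map<Scope, Map<String, String>> buckets = new EnumMap<>(Scope.class);
        for (Scope s : Scope.values()) {
            buckets.put(s, new TreeMap<>());
        }
        for (Map.Entry<String, String> e : conf.entrySet()) {
            Scope scope = SCOPES.getOrDefault(e.getKey(), Scope.UNKNOWN);
            buckets.get(scope).put(e.getKey(), e.getValue());
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new TreeMap<>();
        conf.put("tez.am.resource.memory.mb", "2048");
        conf.put("tez.task.resource.memory.mb", "1024");
        conf.put("tez.runtime.compress", "true");
        Map<Scope, Map<String, String>> buckets = route(conf);
        System.out.println(buckets.get(Scope.AM));      // prints {tez.am.resource.memory.mb=2048}
        System.out.println(buckets.get(Scope.UNKNOWN)); // prints {tez.runtime.compress=true}
    }
}
```

Keys without a known scope (such as the TezRuntimeConfiguration ones) end up in UNKNOWN; per the thread, those are best forwarded to the edges through setAdditionalConfiguration.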


Re: Enable runtime compression programmatically

Posted by Siddharth Seth <ss...@apache.org>.
For Edges, the approach that you took with
edgeBuilder.setAdditionalConfiguration will work to set relevant Tez
properties for an edge. You should be able to iterate through properties
and set the config on the edge - and the relevant ones will be set.
(Compression has a specific API which you could use, but using
setAdditionalConfiguration will also work).
Typically, additional Hadoop properties are also required for Edges -
things like the list of compression codecs.
edgeConfigs.setAdditionalConfiguration does take care of allowing these
properties through.

The TezClient needs to be provided a config - which is then made available
to the AM. There's not much filtering involved here, and you could set
tez.* for this configuration instance. An attempt will be made to pick up
YarnConfiguration to connect to the cluster.

The same applies for InputInitializers and OutputCommitters. Typically (and
unfortunately), you'll end up setting all configs.

dag.setConf and vertex.setConf should not be used - I've opened a JIRA to
add docs for these.

How do you get the Hadoop configs in this case? Is that part of the
Configuration-like object?



On Mon, Sep 14, 2015 at 9:47 AM, Johannes Zillmann <jzillmann@googlemail.com> wrote:

> Ok,
>
> found it. The call
>
> edgeBuilder.setAdditionalConfiguration(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
> does work for me!
>
> So let me describe my use case a little bit...
> Basically I have one Configuration-like object on the client side. It is
> assembled from multiple sources and is the only way a user can set custom
> Tez properties (tez-site.xml is not used at all).
> Then I’m building my DAG with its vertices and edges programmatically.
> Now, do you have any recommendation on how to effectively route the right
> Tez properties to the corresponding Tez components? (By Tez components I
> mean vertex properties, DAG properties, AM properties, edge properties,
> etc.)
>
> Should I simply set all tez.* properties on every component, or is there a
> smarter way?
> And what components/properties might I be missing?
>
> Any help appreciated!
> Johannes
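Sid's suggestion to iterate through the properties and set each one on the edge could look roughly like this. EdgeConfigSink and RecordingSink are hypothetical stand-ins for the Tez edge-config builder used in this thread (edgeBuilder.setAdditionalConfiguration(key, value)), so the sketch runs with the JDK alone. According to Sid, the real builder keeps only the relevant properties, which is why the loop forwards everything without filtering.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class ForwardToEdge {

    // Hypothetical stand-in for an edge-config builder that exposes
    // setAdditionalConfiguration(String key, String value).
    interface EdgeConfigSink {
        EdgeConfigSink setAdditionalConfiguration(String key, String value);
    }

    // Forward every property of the main configuration to the edge;
    // deciding which keys are relevant is left to the receiving builder.
    static void forwardAll(Map<String, String> mainConf, EdgeConfigSink edge) {
        for (Map.Entry<String, String> e : mainConf.entrySet()) {
            edge.setAdditionalConfiguration(e.getKey(), e.getValue());
        }
    }

    // Toy sink that only records what it received, for demonstration.
    static final class RecordingSink implements EdgeConfigSink {
        final Map<String, String> received = new LinkedHashMap<>();

        @Override
        public EdgeConfigSink setAdditionalConfiguration(String key, String value) {
            received.put(key, value);
            return this;
        }
    }

    public static void main(String[] args) {
        Map<String, String> mainConf = new TreeMap<>();
        mainConf.put("tez.runtime.compress", "true");
        mainConf.put("io.compression.codecs", "org.apache.hadoop.io.compress.DefaultCodec");
        RecordingSink sink = new RecordingSink();
        forwardAll(mainConf, sink);
        System.out.println(sink.received.keySet()); // prints [io.compression.codecs, tez.runtime.compress]
    }
}
```

Forwarding Hadoop keys such as the codec list alongside the tez.* keys matches Sid's remark that edges typically need those Hadoop properties too.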

Re: Enable runtime compression programmatically

Posted by Johannes Zillmann <jz...@googlemail.com>.
Ok,

found it. The call
	edgeBuilder.setAdditionalConfiguration(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
does work for me!

So let me describe my use case a little bit...
Basically I have one Configuration-like object on the client side. It is assembled from multiple sources and is the only way a user can set custom Tez properties (tez-site.xml is not used at all).
Then I’m building my DAG with its vertices and edges programmatically.
Now, do you have any recommendation on how to effectively route the right Tez properties to the corresponding Tez components? (By Tez components I mean vertex properties, DAG properties, AM properties, edge properties, etc.)

Should I simply set all tez.* properties on every component, or is there a smarter way?
And what components/properties might I be missing?

Any help appreciated!
Johannes


> On 14 Sep 2015, at 16:57, Johannes Zillmann <jz...@googlemail.com> wrote:
> 
> Hey guys,
> 
> Question: how do I enable tez.runtime.compress programmatically?
> When I set this property in tez-site.xml, it is picked up correctly.
> But all the other options I tried:
> - dag.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
> - mapVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
> - reduceVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
> 
> have no effect! (I checked the log output of the Shuffle class.)
> 
> Johannes
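On the question "Should I simply set all tez.* properties on every component?", Sid's replies in this thread suggest that the TezClient configuration can simply be given the tez.* properties. A minimal JDK-only prefix filter, with a plain Map standing in for the Configuration-like object (class and method names are hypothetical), might look like this:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: select the entries of a flat configuration by key prefix.
class TezPrefixFilter {

    // Return only the entries whose key starts with the given prefix, e.g. "tez.".
    static Map<String, String> withPrefix(Map<String, String> conf, String prefix) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> mainConf = new TreeMap<>();
        mainConf.put("tez.runtime.compress", "true");
        mainConf.put("tez.am.resource.memory.mb", "2048");
        mainConf.put("io.compression.codecs", "org.apache.hadoop.io.compress.DefaultCodec");
        // prints [tez.am.resource.memory.mb, tez.runtime.compress]
        System.out.println(withPrefix(mainConf, "tez.").keySet());
    }
}
```

For edges, Sid notes that Hadoop properties such as the list of compression codecs are needed as well, so forwarding the full configuration to setAdditionalConfiguration there is probably simpler than applying a tez.* filter.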