Posted to dev@tez.apache.org by Subroto Sanyal <sa...@gmail.com> on 2014/06/06 15:53:52 UTC

Tez configuration initialization ignoring JobConfigurable

Hi,

Tez has a utility that creates a Configuration object from the payload:

TezUtils.createConfFromUserPayload(byte[] payload). This method returns a
plain Configuration object even though the serialized byte[] may have been
produced from a JobConf.


Once we have the Configuration, we create a few objects using
ReflectionUtils.newInstance(class, conf). ReflectionUtils.newInstance checks
whether the conf is an instance of "org.apache.hadoop.mapred.JobConf"
and, if it is, invokes the "configure" method.
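
To make the check described above concrete, here is a minimal sketch. It is
not the Hadoop source: Configuration, JobConf and JobConfigurable below are
simplified stand-ins declared locally so the example compiles on its own, and
NewInstanceSketch and DemoComparator are hypothetical names. It only imitates
the dispatch described above, where "configure" runs when the conf is a
JobConf and is skipped otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins (hypothetical, simplified): NOT the real Hadoop classes.
class Configuration {
    private final Map<String, String> props = new HashMap<>();
    Configuration() {}
    Configuration(Configuration other) { props.putAll(other.props); }
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

class JobConf extends Configuration {
    JobConf(Configuration conf) { super(conf); }
}

interface JobConfigurable {
    void configure(JobConf job);
}

// Hypothetical comparator-like class that relies on configure() being called.
class DemoComparator implements JobConfigurable {
    boolean configured = false;

    @Override
    public void configure(JobConf job) { configured = true; }
}

class NewInstanceSketch {
    // Imitates the described check: configure() is invoked only for a JobConf.
    static <T> T newInstance(Class<T> cls, Configuration conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (conf instanceof JobConf && obj instanceof JobConfigurable) {
                ((JobConfigurable) obj).configure((JobConf) conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Configuration plain = new Configuration();
        // Plain Configuration: configure() is skipped.
        System.out.println(newInstance(DemoComparator.class, plain).configured);
        // JobConf: configure() is invoked.
        System.out.println(newInstance(DemoComparator.class, new JobConf(plain)).configured);
    }
}
```

With a plain Configuration the object comes back unconfigured, which is the
situation described for the sorter in the next paragraph.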


This no longer works in the Tez scenario. A simple example: a user defines a
custom "RawComparator" and makes it "JobConfigurable", but
org.apache.tez.runtime.library.common.sort.impl.ExternalSorter doesn't
care whether the configuration could be an instance of
"org.apache.hadoop.mapred.JobConf", so "configure" is never invoked.
Please let me know whether this is a problem in Tez or a gap in my
understanding of how objects should be created in Tez  :-)

-- 
Cheers,
*Subroto Sanyal*

Re: Tez configuration initialization ignoring JobConfigurable

Posted by Subroto Sanyal <sa...@gmail.com>.
Hi Sid,

I agree with you on "Not very sure
we want to support MR constructs like JobConfigurable for this section of
the runtime, if we can avoid it".
It would definitely be a good idea to move away from MR constructs
completely, but I am sure many existing applications already
use such MR constructs.

Using Configurable will solve the problem.
I have raised a sub-task of TEZ-1198:
https://issues.apache.org/jira/browse/TEZ-1200

Thanks for your inputs and suggestions.

Cheers,
Subroto Sanyal


On Tue, Jun 10, 2014 at 1:49 AM, Siddharth Seth <ss...@apache.org> wrote:

> Subroto,
> I'm guessing you already have a Comparator in place which makes use of
> JobConfigurable ?
> In terms of member fields, there isn't a lot of difference between
> Configuration and JobConf. JobConf primarily offers methods to look up the
> Configuration. In terms of serialization, they're the same.
> For things like Sort and Shuffle (which is where the comparators are being
> used), we've tried to remove direct MapReduce dependencies. Not very sure
> we want to support MR constructs like JobConfigurable for this section of
> the runtime, if we can avoid it. That said, I just filed a jira to track
> incompatible changes when using yarn-tez as the framework - TEZ-1198, could
> you please file this issue as a sub-task of this.
>
> A temporary workaround, if changing your comparator is an option, would be
> to use Configurable - and check / create a JobConf based on how it's
> configured.
>
> Thanks
> - Sid
>
>



-- 
Cheers,
*Subroto Sanyal*

Re: Tez configuration initialization ignoring JobConfigurable

Posted by Siddharth Seth <ss...@apache.org>.
Subroto,
I'm guessing you already have a Comparator in place which makes use of
JobConfigurable ?
In terms of member fields, there isn't a lot of difference between
Configuration and JobConf. JobConf primarily offers methods to look up the
Configuration. In terms of serialization, they're the same.
For things like Sort and Shuffle (which is where the comparators are being
used), we've tried to remove direct MapReduce dependencies. Not very sure
we want to support MR constructs like JobConfigurable for this section of
the runtime, if we can avoid it. That said, I just filed a jira to track
incompatible changes when using yarn-tez as the framework: TEZ-1198. Could
you please file this issue as a sub-task of it?

A temporary workaround, if changing your comparator is an option, would be
to use Configurable - and check / create a JobConf based on how it's
configured.
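
A minimal sketch of that workaround, under stated assumptions: the
Configuration, JobConf and Configurable types below are simplified stand-ins
declared locally (not the Hadoop classes), ConfigurableComparator is a
hypothetical class, and "demo.sort.order" is an invented key. The idea is
that the comparator accepts whatever Configuration it is handed and builds a
JobConf itself when it needs one.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins (hypothetical, simplified): NOT the real Hadoop classes.
class Configuration {
    private final Map<String, String> props = new HashMap<>();
    Configuration() {}
    Configuration(Configuration other) { props.putAll(other.props); }
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

class JobConf extends Configuration {
    JobConf(Configuration conf) { super(conf); }
}

interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}

// Hypothetical comparator that is initialized through Configurable.
class ConfigurableComparator implements Configurable {
    private JobConf jobConf;
    String sortOrder; // filled in by setConf()

    @Override
    public void setConf(Configuration conf) {
        // Reuse a JobConf if that is what was passed, otherwise wrap the conf.
        jobConf = (conf instanceof JobConf) ? (JobConf) conf : new JobConf(conf);
        // "demo.sort.order" is an invented key used only for this sketch.
        sortOrder = jobConf.get("demo.sort.order");
    }

    @Override
    public Configuration getConf() { return jobConf; }
}

class ConfigurableSketch {
    public static void main(String[] args) {
        Configuration plain = new Configuration();
        plain.set("demo.sort.order", "descending");
        ConfigurableComparator c = new ConfigurableComparator();
        c.setConf(plain); // what a Configurable-aware factory would do
        System.out.println(c.sortOrder + " " + (c.getConf() instanceof JobConf));
    }
}
```

The initialization that previously lived in configure(JobConf) moves into
setConf, so it no longer depends on the caller passing a JobConf.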

Thanks
- Sid


On Fri, Jun 6, 2014 at 9:42 PM, Subroto Sanyal <sa...@gmail.com>
wrote:

> Hi Hitesh,
>
> Thanks for your inputs.
> I would like to follow the approach mentioned in the trailing mail;
> provided the code/processor implementation is done by non-Tez code.
> But, how about the code which Tez provides; as I mentioned
> the
> org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.ExternalSorter(TezOutputContext,
> Configuration, int, long) gets its configuration
> from org.apache.tez.runtime.library.output.OnFileSortedOutput which
> generates the conf using:
>
> this.conf =
> TezUtils.createConfFromUserPayload(getContext().getUserPayload());
>
> This conf is finally used to create the comparator:
>
> comparator = ConfigUtils.getIntermediateOutputKeyComparator(this.conf);
>
>
> Please let me know how this can be fixed? Do we need to change
> org.apache.tez.runtime.library.output.OnFileSortedOutput or their exist
> some workaround ?
>
>

Re: Tez configuration initialization ignoring JobConfigurable

Posted by Subroto Sanyal <sa...@gmail.com>.
Hi Hitesh,

Thanks for your inputs.
I would like to follow the approach mentioned in your mail below,
provided the code/processor implementation is non-Tez code.
But what about the code that Tez itself provides? As I mentioned, the
org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.ExternalSorter(TezOutputContext,
Configuration, int, long) constructor gets its configuration
from org.apache.tez.runtime.library.output.OnFileSortedOutput, which
generates the conf using:

this.conf =
TezUtils.createConfFromUserPayload(getContext().getUserPayload());

This conf is then used to create the comparator:

comparator = ConfigUtils.getIntermediateOutputKeyComparator(this.conf);


Please let me know how this can be fixed. Do we need to change
org.apache.tez.runtime.library.output.OnFileSortedOutput, or does
some workaround exist?


On Fri, Jun 6, 2014 at 10:58 PM, Hitesh Shah <hi...@apache.org> wrote:

> Most of the MR compat layer code in Tez does something like the following:
>
>     byte[] userPayload = context.getUserPayload();
>     Configuration conf = TezUtils.createConfFromUserPayload(userPayload);
>     if (conf instanceof JobConf) {
>       this.jobConf = (JobConf)conf;
>     } else {
>       this.jobConf = new JobConf(conf);
>     }
>
> Some of the above should probably be fixed given that the deserialized
> payload currently cannot be an instance of JobConf but the above should
> give you an idea as to what is being done. If you look into
> ReduceProcessor, you will see the comparator being initialized
> using ConfigUtils::getInputKeySecondaryGroupingComparator() and it will
> always be passed an instance of JobConf.
>
> Let me know if you are following the above approach or if I am missing
> something which should be addressed in Tez.
>
> thanks
> — Hitesh
>



-- 
Cheers,
*Subroto Sanyal*

Re: Tez configuration initialization ignoring JobConfigurable

Posted by Hitesh Shah <hi...@apache.org>.
Most of the MR compat layer code in Tez does something like the following:

    byte[] userPayload = context.getUserPayload();
    Configuration conf = TezUtils.createConfFromUserPayload(userPayload);
    if (conf instanceof JobConf) {
      this.jobConf = (JobConf)conf;
    } else {
      this.jobConf = new JobConf(conf);
    }

Some of the above should probably be fixed given that the deserialized
payload currently cannot be an instance of JobConf but the above should
give you an idea as to what is being done. If you look into
ReduceProcessor, you will see the comparator being initialized
using ConfigUtils::getInputKeySecondaryGroupingComparator() and it will
always be passed an instance of JobConf.
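
The pattern above can be pulled into a small helper, sketched here under
stated assumptions: Configuration and JobConf are simplified stand-ins
declared locally (not the Hadoop classes), and JobConfs.toJobConf is a
hypothetical helper name, not an existing Tez or Hadoop method.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins (hypothetical, simplified): NOT the real Hadoop classes.
class Configuration {
    private final Map<String, String> props = new HashMap<>();
    Configuration() {}
    Configuration(Configuration other) { props.putAll(other.props); }
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

class JobConf extends Configuration {
    JobConf(Configuration conf) { super(conf); }
}

class JobConfs {
    // Returns conf itself when it already is a JobConf, otherwise a JobConf copy.
    static JobConf toJobConf(Configuration conf) {
        return (conf instanceof JobConf) ? (JobConf) conf : new JobConf(conf);
    }
}

class ToJobConfSketch {
    public static void main(String[] args) {
        Configuration plain = new Configuration();
        plain.set("k", "v");
        JobConf wrapped = JobConfs.toJobConf(plain);
        // Settings carry over into the wrapping JobConf.
        System.out.println(wrapped.get("k"));
        // An existing JobConf is passed through unchanged.
        System.out.println(JobConfs.toJobConf(wrapped) == wrapped);
    }
}
```

Code that creates JobConfigurable objects would pass the result of such a
helper rather than the plain Configuration.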

Let me know if you are following the above approach or if I am missing
something which should be addressed in Tez.

thanks
— Hitesh

On Jun 6, 2014, at 10:37 AM, Subroto Sanyal <sa...@gmail.com> wrote:

> Hi Hitesh,
>
> I am trying to build and execute a DAG similar to MR but, not exactly
> MR(have custom LogicalInput/Output and Processor implementation) which
> needs intermediate sorting and shuffling (configured via Edge)
> Lets say we have RawComparator class which looks like:
>
> public class CustomRawComparator implements RawComparator, JobConfigurable {
>
>    @Override
>    public void configure(JobConf conf) {
>      // some sort of init process
>       _comparator = blah blah blah
>    }
>
>    @Override
>    public int compare(Object o1, Object o2) {
>        return _comparator.compare(o1, o2);
>    }
>
>    @Override
>    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>        return _comparator.compare(b1, s1, l1, b2, s2, l2);
>    }
>
> }
>
>
> In my jobclient code I will write something like:
>
> jobConf.setOutputKeyComparatorClass(CustomRawComparator.class);
>
>
> On the cluster side (whatever be the framework say MRv1, MRv2 or MR on Tez)
> one would expect to get an object fully configured when
>
> ReflectionUtil.newInstance(class, conf) is invoked.
>
> The above call is being used in "ExternalSorter" class but, instead of
> JobConf a Conf object is being passed.which doesn't allows the "configure"
> method of the CustomRawComparator to be invoked. "ExternalSorter" is used
> in "OnFileSortedOutput" . TezUtils provides utility to provide
> Configuration but, not JobConf.
>
> I think there will other situation/scenario where this problem exist in Tez
> code base.
>
>
> ** I patched the Tez-common so that TezUtils.createConfFromUserPayload
> returns a JobConf instead on Configuration which solves the problem(may not
> be a good solution).



Re: Tez configuration initialization ignoring JobConfigurable

Posted by Subroto Sanyal <sa...@gmail.com>.
Hi Hitesh,

I am trying to build and execute a DAG that is similar to MR but not exactly
MR (it has custom LogicalInput/Output and Processor implementations) and that
needs intermediate sorting and shuffling (configured via the Edge).
Let's say we have a RawComparator class which looks like:

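import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobConfigurable;

public class CustomRawComparator implements RawComparator<Object>, JobConfigurable {

    // Delegate comparator; only set once configure(JobConf) has been invoked.
    private RawComparator<Object> _comparator;

    @Override
    public void configure(JobConf conf) {
        // some sort of init process
        _comparator = blah blah blah
    }

    @Override
    public int compare(Object o1, Object o2) {
        return _comparator.compare(o1, o2);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return _comparator.compare(b1, s1, l1, b2, s2, l2);
    }

}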


In my job client code I will write something like:

jobConf.setOutputKeyComparatorClass(CustomRawComparator.class);



On the cluster side (whatever the framework: MRv1, MRv2 or MR on Tez)
one would expect to get a fully configured object when

ReflectionUtils.newInstance(class, conf) is invoked.

The above call is used in the "ExternalSorter" class, but a plain
Configuration object is passed instead of a JobConf, which prevents the
"configure" method of the CustomRawComparator from being invoked.
"ExternalSorter" is used in "OnFileSortedOutput". TezUtils provides a utility
that returns a Configuration, but not a JobConf.

I think there will be other situations where this problem exists in the Tez
code base.


** I patched Tez-common so that TezUtils.createConfFromUserPayload
returns a JobConf instead of a Configuration, which solves the problem (though
it may not be a good solution).


On Fri, Jun 6, 2014 at 6:57 PM, Hitesh Shah <hi...@apache.org> wrote:

> Hi Subroto
>
> Could you provide some more context on what you are trying to do? Are you
> trying to run MR-on-Tez? or a native Tez job?
> If you could provide us with some code showing what you are trying to do,
> we can help further. There are probably some bugs in the MR compatibility
> that we may have not come across.
>
> thanks
> — Hitesh
>
>
>



-- 
Cheers,
*Subroto Sanyal*

Re: Tez configuration initialization ignoring JobConfigurable

Posted by Hitesh Shah <hi...@apache.org>.
Hi Subroto

Could you provide some more context on what you are trying to do? Are you
trying to run MR-on-Tez, or a native Tez job?
If you could provide us with some code showing what you are trying to do,
we can help further. There are probably some bugs in the MR compatibility
layer that we have not yet come across.

thanks
— Hitesh


On Fri, Jun 6, 2014 at 6:53 AM, Subroto Sanyal <sa...@gmail.com>
wrote:

> Hi,
>
> Tez has utility which created Configuration object from the payload:
>
> TezUtils.createConfFromUserPayload(byte[] payload); this method returns a
> Configuration object even though the serialized byte[] can be of type
> JobConf.
>
>
> Once we get the Configuration we try to  create few object using
> ReflectionUtil.newInstance(class, conf). ReflectionUtil.newInstance makes a
> check whether the conf is instance of "org.apache.hadoop.mapred.JobConf"
> and accordingly invokes the "configure" method.
>
>
> This behavior is not working  anymore in Tez scenario. One simple scenario
> when user defines a custom "RawComparator" and makes it "JobConfigurable"
> but, org.apache.tez.runtime.library.common.sort.impl.ExternalSorter doesn't
> care if the configuration could be instance of "org.apache.hadoop.mapred.
> JobConf"
> Please let me know if there is a problem with Tez or there exist lack of my
> understanding about how objects should be created in Tez  :-)
>
> --
> Cheers,
> *Subroto Sanyal*
>