Posted to common-user@hadoop.apache.org by Tom White <to...@cloudera.com> on 2010/04/29 18:41:04 UTC

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Hi Yuanyuan,

I think you've found a bug - could you file a JIRA issue for this please?

Thanks,
Tom

On Wed, Apr 28, 2010 at 11:04 PM, Yuanyuan Tian <yt...@us.ibm.com> wrote:
>
>
> I have a problem getting the input file name in the mapper when using
> MultipleInputs. I need MultipleInputs to support different formats
> for the inputs to my MapReduce job. Inside each mapper, I also need
> to know the exact input file that the mapper is processing. However,
> conf.get("map.input.file") returns null. Can anybody help me solve this
> problem? Thanks in advance.
>
> public class Test extends Configured implements Tool {
>
>        static class InnerMapper extends MapReduceBase implements
> Mapper<Writable, Writable, NullWritable, Text>
>        {
>                ................
>                ................
>
>                public void configure(JobConf conf)
>                {
>                        // Expected to hold the path of the file this map task reads
>                        String inputName = conf.get("map.input.file");
>                        .......................................
>                }
>
>        }
>
>        public int run(String[] arg0) throws Exception {
>                JobConf job;
>                job = new JobConf(Test.class);
>                ...........................................
>
>                // Register each input path with its own input format
>                MultipleInputs.addInputPath(job, new Path("A"),
> TextInputFormat.class);
>                MultipleInputs.addInputPath(job, new Path("B"),
> SequenceFileInputFormat.class);
>                ...........................................
>        }
> }
>
> Yuanyuan

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Yuanyuan Tian <yt...@us.ibm.com>.
Hi Farhan,

I believe I have to use the old JobConf MapReduce interface in order to
use MultipleInputs. As a result, I cannot do as you suggested.

Yuanyuan


From: Farhan Husain <fa...@csebuet.org>
To: common-user@hadoop.apache.org
Date: 04/30/2010 11:46 PM
Subject: Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Can you try the following code?

((FileSplit) context.getInputSplit()).getPath().getName()

Thanks,
Farhan


Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Farhan Husain <fa...@csebuet.org>.
Can you try the following code?

((FileSplit) context.getInputSplit()).getPath().getName()

Thanks,
Farhan
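
A small sketch of a defensive form of this cast, assuming the mapper may be handed a split that is not file-backed. Split, FileBackedSplit and OtherSplit are hypothetical stand-ins written for this example, not the Hadoop classes; the real call would go through context.getInputSplit() and FileSplit.getPath().getName(), which yields only the last path component rather than the full path.

```java
import java.util.Optional;

/** Hypothetical stand-in for an input split; not a Hadoop type. */
interface Split {}

/** Hypothetical stand-in for a split that covers part of one file. */
final class FileBackedSplit implements Split {
    private final String path;

    FileBackedSplit(String path) {
        this.path = path;
    }

    String getPath() {
        return path;
    }
}

/** Hypothetical stand-in for any split that carries no file path. */
final class OtherSplit implements Split {}

final class SplitNameSketch {

    /** Returns the last path component, or empty if the split is not file-backed. */
    static Optional<String> fileNameOf(Split split) {
        // Check the runtime type first so a non-file split cannot cause a ClassCastException.
        if (!(split instanceof FileBackedSplit)) {
            return Optional.empty();
        }
        String path = ((FileBackedSplit) split).getPath();
        return Optional.of(path.substring(path.lastIndexOf('/') + 1));
    }

    public static void main(String[] args) {
        // The sample path is made up for illustration.
        System.out.println(fileNameOf(new FileBackedSplit("/data/A/part-00000")));
        System.out.println(fileNameOf(new OtherSplit()));
    }
}
```

Returning an empty Optional instead of casting blindly lets the mapper decide what to do when no file name is available.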


Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Tom White <to...@cloudera.com>.
Hi Yuanyuan,

Thanks for filing an issue. To work around the issue could you use a
regular FileInputFormat in a set of map-only jobs (which can read the
input file names) so you can create a common input for a final MR job?
This is admittedly less efficient since it needs more jobs.

Cheers,
Tom
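
A rough plain-Java model of this workaround, not Hadoop code: each map-only job is modeled as a pass that prefixes every record with the name of the file it came from, and the final job recovers the name from the record itself instead of from conf.get("map.input.file"). The class and method names, the tab separator, and the sample file names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the tagging workaround; all names here are hypothetical. */
final class TaggedInputSketch {

    /** Models one map-only job: prefix each record with its source file name. */
    static List<String> tagRecords(String fileName, List<String> records) {
        List<String> tagged = new ArrayList<>();
        for (String record : records) {
            tagged.add(fileName + "\t" + record);
        }
        return tagged;
    }

    /** Models the final job's mapper: read the file name back from the record. */
    static String sourceOf(String taggedRecord) {
        return taggedRecord.substring(0, separatorIndex(taggedRecord));
    }

    /** Returns the original record with the file-name tag removed. */
    static String payloadOf(String taggedRecord) {
        return taggedRecord.substring(separatorIndex(taggedRecord) + 1);
    }

    // Assumes file names contain no tab, so the first tab ends the tag.
    private static int separatorIndex(String taggedRecord) {
        int tab = taggedRecord.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("record has no file-name tag");
        }
        return tab;
    }

    public static void main(String[] args) {
        // One tagging pass per input, then a single pass over the common input.
        List<String> common = new ArrayList<>();
        common.addAll(tagRecords("A/part-00000", Arrays.asList("alpha", "beta")));
        common.addAll(tagRecords("B/part-00000", Arrays.asList("gamma")));
        for (String record : common) {
            System.out.println(sourceOf(record) + " -> " + payloadOf(record));
        }
    }
}
```

In a real job each map-only job would write its tagged records in one common format, so the final job needs only a single input format.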


Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Yuanyuan Tian <yt...@us.ibm.com>.
Hi Tom,

I have filed a JIRA ticket (MAPREDUCE-1743) for this issue. In the
meantime, can you suggest an alternative approach to achieve what I want
(supporting different input formats and getting the input file name in
each mapper)?

Yuanyuan


From: Tom White <to...@cloudera.com>
To: common-user@hadoop.apache.org
Date: 04/29/2010 09:42 AM
Subject: Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Hi Yuanyuan,

I think you've found a bug - could you file a JIRA issue for this please?

Thanks,
Tom
