Posted to common-user@hadoop.apache.org by Yuanyuan Tian <yt...@us.ibm.com> on 2010/04/29 08:04:23 UTC

conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20


I have a problem getting the input file name in the mapper when using
MultipleInputs. I need MultipleInputs to support different formats
for the inputs to my MapReduce job, and inside each mapper I also need
to know the exact input file that the mapper is processing. However,
conf.get("map.input.file") returns null. Can anybody help me solve this
problem? Thanks in advance.

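import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.util.Tool;

public class Test extends Configured implements Tool{

	static class InnerMapper extends MapReduceBase implements Mapper<Writable, Writable, NullWritable, Text>
	{
		................
		................

		public void configure(JobConf conf)
		{
			// This is the call that returns null when the inputs are added through MultipleInputs.
			String inputName=conf.get("map.input.file");
			.......................................
		}

	}

	public int run(String[] arg0) throws Exception {
		JobConf job;
		job = new JobConf(Test.class);
		...........................................

		// Each input path is registered with its own input format.
		MultipleInputs.addInputPath(job, new Path("A"), TextInputFormat.class);
		MultipleInputs.addInputPath(job, new Path("B"), SequenceFileInputFormat.class);
		...........................................
	}
}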

Yuanyuan

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Yuanyuan Tian <yt...@us.ibm.com>.
Hi Farhan,

I believe I have to use the old JobConf MapReduce interface in order to
use MultipleInputs. As a result, I cannot do as you suggested.

Yuanyuan


From: Farhan Husain <fa...@csebuet.org>
To: common-user@hadoop.apache.org
Date: 04/30/2010 11:46 PM
Subject: Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Can you try the following code?

((FileSplit) context.getInputSplit()).getPath().getName()

Thanks,
Farhan

On Thu, Apr 29, 2010 at 12:46 PM, Tom White <to...@cloudera.com> wrote:

> Hi Yuanyuan,
>
> Thanks for filing an issue. To work around the issue could you use a
> regular FileInputFormat in a set of map-only jobs (which can read the
> input file names) so you can create a common input for a final MR job?
> This is admittedly less efficient since it needs more jobs.
>
> Cheers,
> Tom


Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Farhan Husain <fa...@csebuet.org>.
Can you try the following code?

((FileSplit) context.getInputSplit()).getPath().getName()

Thanks,
Farhan
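
As a rough illustration of what the expression above does: it downcasts the mapper's input split to a file-based split and takes the last component of its path. The sketch below models that in plain Java with hypothetical stand-in types (InputSplit, FileSplit and OtherSplit are defined here purely for illustration and are not the Hadoop classes; the path is made up). It guards the cast with instanceof, on the assumption that the framework might hand the mapper some other split type, in which case an unguarded cast would throw a ClassCastException.

```java
import java.util.Optional;

class SplitNameSketch {
    // Hypothetical stand-ins for illustration only; these are NOT the Hadoop classes.
    interface InputSplit {}

    static final class FileSplit implements InputSplit {
        private final String path;
        FileSplit(String path) { this.path = path; }
        String getPath() { return path; }
    }

    // Any split type that is not file-based (for example a wrapper around another split).
    static final class OtherSplit implements InputSplit {}

    /** Returns the last path component of a file-based split, or empty for any other split. */
    static Optional<String> inputFileName(InputSplit split) {
        if (!(split instanceof FileSplit)) {
            return Optional.empty();
        }
        String path = ((FileSplit) split).getPath();
        // lastIndexOf returns -1 when there is no '/', so the whole string is kept.
        return Optional.of(path.substring(path.lastIndexOf('/') + 1));
    }

    public static void main(String[] args) {
        System.out.println(inputFileName(new FileSplit("/data/A/part-00000")));
        System.out.println(inputFileName(new OtherSplit()));
    }
}
```

Returning an Optional rather than casting directly lets the caller decide what to do when the split is not the expected type, instead of failing inside the mapper.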


Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Tom White <to...@cloudera.com>.
Hi Yuanyuan,

Thanks for filing an issue. To work around the issue could you use a
regular FileInputFormat in a set of map-only jobs (which can read the
input file names) so you can create a common input for a final MR job?
This is admittedly less efficient since it needs more jobs.

Cheers,
Tom
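
The data flow of the workaround described above can be sketched in plain Java, without any Hadoop classes. This is only a model under stated assumptions: the names tagPass, finalMap and Tagged, the tab-separated output, and the sample records are all hypothetical. Each input gets its own pass, standing in for one map-only job with that input's format, which tags every record with the file name. The final step then reads the common tagged input, so the file name is available even though it no longer comes from the job configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class TaggingWorkaroundSketch {
    /** A record in the common intermediate form: source file name plus payload. */
    static final class Tagged {
        final String sourceFile;
        final String payload;
        Tagged(String sourceFile, String payload) {
            this.sourceFile = sourceFile;
            this.payload = payload;
        }
    }

    /** Stage 1: one pass per input; the parser stands in for that input's format. */
    static List<Tagged> tagPass(String fileName, List<String> rawRecords,
                                Function<String, String> parser) {
        List<Tagged> out = new ArrayList<>();
        for (String raw : rawRecords) {
            out.add(new Tagged(fileName, parser.apply(raw)));
        }
        return out;
    }

    /** Stage 2: the final map step reads the common input and sees each record's tag. */
    static List<String> finalMap(List<Tagged> common) {
        List<String> out = new ArrayList<>();
        for (Tagged t : common) {
            out.add(t.sourceFile + "\t" + t.payload);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> common = new ArrayList<>();
        // Input "A" is read as-is; input "B" uses a different, made-up "k=value" layout.
        common.addAll(tagPass("A", Arrays.asList("x", "y"), s -> s));
        common.addAll(tagPass("B", Arrays.asList("k=z"), s -> s.substring(2)));
        for (String line : finalMap(common)) {
            System.out.println(line);
        }
    }
}
```

The cost noted in the message shows up here too: every record passes through an extra tagging stage before the final job sees it.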

On Thu, Apr 29, 2010 at 10:37 AM, Yuanyuan Tian <yt...@us.ibm.com> wrote:
>
> Hi Tom,
>
> I have filed a JIRA ticket (MAPREDUCE-1743) for this issue. In the meantime, can you suggest an alternative approach to achieve what I want (supporting different input formats and getting the input file name in each mapper)?
>
> Yuanyuan

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Yuanyuan Tian <yt...@us.ibm.com>.
Hi Tom,

I have filed a JIRA ticket (MAPREDUCE-1743) for this issue. In the
meantime, can you suggest an alternative approach to achieve what I want
(supporting different input formats and getting the input file name in each
mapper)?

Yuanyuan


From: Tom White <to...@cloudera.com>
To: common-user@hadoop.apache.org
Date: 04/29/2010 09:42 AM
Subject: Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Hi Yuanyuan,

I think you've found a bug - could you file a JIRA issue for this please?

Thanks,
Tom



Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Posted by Tom White <to...@cloudera.com>.
Hi Yuanyuan,

I think you've found a bug - could you file a JIRA issue for this please?

Thanks,
Tom
