Posted to common-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/11/21 19:10:24 UTC

MultipleInputs.addInputPath

Hi,

I have two different directories that I want to process differently,
so I have two mappers for the job:

Data1
Data2

In my driver I add the following:

    MultipleInputs.addInputPath(job, new Path(args[0]),
        TextInputFormat.class,
        Data1.class);

    MultipleInputs.addInputPath(job, new Path(args[1]),
        TextInputFormat.class,
        Data2.class);

But what I want now is to select just two of the files.

Usually this is how we would do that:
FileInputFormat.addInputPaths(job,"Data1/part-00000,Data1/part-00000");

But how do I specify individual files with MultipleInputs?

Basically: two mappers processing two different inputs, but I want to
specify which files in those two directories are read by each mapper.
How do I do this in Hadoop?

Re: MultipleInputs.addInputPath

Posted by Adam Kawa <ka...@gmail.com>.
Can't you pass the specific file to process as the Path in
MultipleInputs.addInputPath?

1) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-00000"),
       TextInputFormat.class, Data1.class);
or
2) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-0000[1-258-9]"),
       TextInputFormat.class, Data1.class);
   // A glob: ranges go in square brackets, while braces only list literal
   // alternatives. I have not tested that, but I guess that it should work.
or
3) MultipleInputs.addInputPath(job, new Path(args[0] + "/part-0000*"),
       TextInputFormat.class, Data1.class);
   // I have not tested that, but I guess that it should work.
or
4)
        String[] paths = {"path1", "pathA", "path-to-process"};
        for (String path : paths) {
            MultipleInputs.addInputPath(job, new Path(path),
                    TextInputFormat.class, Data1.class);
        }
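
Option 2 relies on glob syntax, where braces and square brackets behave
differently. The sketch below is only an illustration and does not call
Hadoop: it uses the glob matcher built into java.nio, on the assumption
that Hadoop's glob handling treats these two constructs the same way.
The class name GlobCheck and the helper matching() are made up for this
example. It shows that part-0000{1-2,5,8-9} only lists the literal
alternatives "1-2", "5" and "8-9", while part-0000[1-258-9] expresses
the ranges 1-2 and 8-9 plus the digit 5.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobCheck {

    // Returns the candidate file names accepted by the given glob.
    // Uses java.nio globbing as a stand-in for Hadoop's (an assumption).
    public static List<String> matching(String glob, List<String> names) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Candidate names part-00000 .. part-00009.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            names.add("part-0000" + i);
        }

        // Braces hold literal alternatives, so "1-2" is not a range.
        // prints [part-00005]
        System.out.println(matching("part-0000{1-2,5,8-9}", names));

        // Square brackets accept ranges.
        // prints [part-00001, part-00002, part-00005, part-00008, part-00009]
        System.out.println(matching("part-0000[1-258-9]", names));

        // Braces work when every alternative is spelled out.
        // prints [part-00001, part-00002, part-00005, part-00008, part-00009]
        System.out.println(matching("part-0000{1,2,5,8,9}", names));
    }
}
```

One caveat that I believe holds but have not verified against a running
cluster: MultipleInputs appears to keep its path-to-format mappings in a
comma-separated configuration value, so a brace glob containing commas
may be split in the wrong place. The square-bracket form avoids commas
altogether, and option 4 (one addInputPath call per file) avoids globs.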



