Posted to user@oozie.apache.org by Andrew Rendle <an...@gmail.com> on 2012/03/02 15:14:52 UTC

Multiple Input Path config

Hi all

Has anyone found any pitfalls when setting up multiple input mappers for
use with Oozie?

Have you got an example workflow?

Thanks

Andrew Rendle

Re: Multiple Input Path config

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Andrew,

Glad to know you got it working with a config change. Config-related issues are not reported below DEBUG level because there can be numerous combinations of config settings that are not necessarily wrong, just unsuitable for a particular use case such as yours. If a use case becomes a common occurrence, we will make modifications.

Thanks for letting us know.

Mona


On 3/8/12 11:50 PM, "Andrew Rendle" <an...@gmail.com> wrote:

Hi Mona

I managed to get it working and of course it was a config omission.

I had missed setting the Delegate mapper and input format.

One thing though, the config was throwing an exception that was hidden with
the default log settings. I would have expected config errors to be
reported below debug.

Thanks

Andrew Rendle


Re: Multiple Input Path config

Posted by Andrew Rendle <an...@gmail.com>.
Hi Mona

I managed to get it working and of course it was a config omission.

I had missed setting the Delegate mapper and input format.
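
For readers hitting the same omission, the missing entries were presumably along the lines of the sketch below. It is an illustration only, not the actual workflow from this thread. It assumes the old org.apache.hadoop.mapred API, where the MultipleInputs helper records its settings under the property names shown. The paths and the com.example mapper classes are hypothetical placeholders.

```xml
<configuration>
    <!-- Delegating classes: route each input split to its per-path format and mapper -->
    <property>
        <name>mapred.input.format.class</name>
        <value>org.apache.hadoop.mapred.lib.DelegatingInputFormat</value>
    </property>
    <property>
        <name>mapred.mapper.class</name>
        <value>org.apache.hadoop.mapred.lib.DelegatingMapper</value>
    </property>
    <!-- path;InputFormat pairs, comma separated (paths are placeholders) -->
    <property>
        <name>mapred.input.dir.formats</name>
        <value>hdfs://path_to_directory/subdir1;org.apache.hadoop.mapred.TextInputFormat,hdfs://path_to_directory/subdir2;org.apache.hadoop.mapred.KeyValueTextInputFormat</value>
    </property>
    <!-- path;Mapper pairs, comma separated (com.example.* classes are hypothetical) -->
    <property>
        <name>mapred.input.dir.mappers</name>
        <value>hdfs://path_to_directory/subdir1;com.example.FirstMapper,hdfs://path_to_directory/subdir2;com.example.SecondMapper</value>
    </property>
</configuration>
```

Property names can differ between Hadoop releases and between the old and new MapReduce APIs, so it is worth checking the MultipleInputs source for the version in use before relying on these.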

One thing though, the config was throwing an exception that was hidden with
the default log settings. I would have expected config errors to be
reported below debug.

Thanks

Andrew Rendle
 On Mar 6, 2012 6:38 PM, "Mona Chitnis" <ch...@yahoo-inc.com> wrote:

> Hi,
>
> Can you paste the configuration section of your workflow that sets the
> input paths and the classes required to handle your multiple input formats?
>
> --Mona

Re: Multiple Input Path config

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi,

Can you paste the configuration section of your workflow that sets the input paths and the classes required to handle your multiple input formats?

--Mona


On 3/6/12 9:04 AM, "Andrew Rendle" <an...@gmail.com> wrote:

Hi Mona

The problem I have is using multiple input paths with multiple mappers and
input formats.

It seems the workflow config is ignoring our setup, maybe throwing an
exception, as the job config is missing the path, format, class entries.

Any ideas?

Andrew Rendle


Re: Multiple Input Path config

Posted by Andrew Rendle <an...@gmail.com>.
Hi Mona

The problem I have is using multiple input paths with multiple mappers and
input formats.

It seems the workflow config is ignoring our setup, maybe throwing an
exception, as the job config is missing the path, format, class entries.

Any ideas?

Andrew Rendle
 On Mar 5, 2012 10:56 PM, "Mona Chitnis" <ch...@yahoo-inc.com> wrote:

>  Hi Andrew,
>
> Some options while specifying multiple HDFS input paths:
>
>
>    - If using regex to specify them as comma separated values for the
>    configuration property - “mapred.input.dir”, you should escape the
>    commas
>
>            E.g. mapred.input.dir =
> hdfs://path_to_directory/{subdir1\,subdir2\,subdir3}/file_name
>
>
>    - You can write your own pig script to read from multiple input paths
>
>     E.g. <property>
>     <name>input</name>
>     <value>hdfs://path_to_directory/subdir1,
> hdfs://path_to_directory/subdir2, hdfs://path_to_directory/subdir3</value>
>     </property>
>     ...
>     <script>myscript.pig</script>
>         <param>input=${input}</param>
>
>
>    - You can write your own map function passed as property
>    “mapred.mapper.class” that uses a custom delimiter to split multiple input
>    paths among multiple mappers.
>
>
> Above should be supported by Oozie. If you encounter any problems, please
> provide corresponding details and I can help debug.
>
> Thanks,
>
> Mona

Re: Multiple Input Path config

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Andrew,

Some options while specifying multiple HDFS input paths:


 *   If you use a glob pattern to specify them inside the comma-separated configuration property “mapred.input.dir”, you should escape the commas within the pattern

           E.g. mapred.input.dir = hdfs://path_to_directory/{subdir1\,subdir2\,subdir3}/file_name


 *   You can write your own Pig script that reads from multiple input paths passed in as a parameter

    E.g. <property>
    <name>input</name>
    <value>hdfs://path_to_directory/subdir1, hdfs://path_to_directory/subdir2, hdfs://path_to_directory/subdir3</value>
    </property>
    ...
    <script>myscript.pig</script>
        <param>input=${input}</param>


 *   You can write your own mapper class, passed as the property “mapred.mapper.class”, that uses a custom delimiter to split multiple input paths among multiple mappers.

All of the above should be supported by Oozie. If you encounter any problems, please provide the corresponding details and I can help debug.
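
As a rough illustration of the first option, the escaped-comma value could sit in the action's configuration like this. It is a sketch only: the path is the same placeholder used in the example above, and whether the escaping is required may depend on the Hadoop version in use.

```xml
<configuration>
    <!-- One glob covering three subdirectories; the commas inside the braces are escaped
         so they are not treated as separators between input paths -->
    <property>
        <name>mapred.input.dir</name>
        <value>hdfs://path_to_directory/{subdir1\,subdir2\,subdir3}/file_name</value>
    </property>
</configuration>
```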

Thanks,

Mona


On 3/2/12 1:13 PM, "Mona Chitnis" <ch...@yahoo-inc.com> wrote:

Hi Andrew,

I’m looking into providing an example workflow for this use case. Did you encounter any specific errors?

Also, what version of Oozie are you working with?

--Mona


--
mona
chitnis
software developer

chitnis@yahoo-inc.com
direct 408-336-7908    mobile 864-650-0100

701 first avenue, sunnyvale, ca, 94089-0703, us
phone (408) 349 3300    fax (408) 349 3301


Re: Multiple Input Path config

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Andrew,

I’m looking into providing an example workflow for this use case. Did you encounter any specific errors?

Also, what version of Oozie are you working with?

--Mona


On 3/2/12 6:14 AM, "Andrew Rendle" <an...@gmail.com> wrote:

Hi all

Has anyone found any pitfalls when setting up multiple input mappers for
use with Oozie?

Have you got an example workflow?

Thanks

Andrew Rendle