Posted to user@oozie.apache.org by Andrew Rendle <an...@gmail.com> on 2012/03/02 15:14:52 UTC
Multiple Input Path config
Hi all
Has anyone found any pitfalls when setting up multiple input mappers for
use with Oozie?
Have you got an example workflow?
Thanks
Andrew Rendle
Re: Multiple Input Path config
Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Andrew,
Glad to know you could get it working with a config change. The reason config-related issues are reported only at DEBUG level is that there can be numerous combinations of config settings – not necessarily wrong – but just unsuitable for a particular use case such as yours. If a use case becomes a common occurrence, then we will make modifications.
Thanks for letting us know.
Mona
On 3/8/12 11:50 PM, "Andrew Rendle" <an...@gmail.com> wrote:
Hi Mona
I managed to get it working and of course it was a config omission.
I had missed setting the Delegate mapper and input format.
One thing though, the config was throwing an exception that was hidden with
the default log settings. I would have expected config errors to be
reported at a level visible by default, not only at debug.
Thanks
Andrew Rendle
On Mar 6, 2012 6:38 PM, "Mona Chitnis" <ch...@yahoo-inc.com> wrote:
> Hi,
>
> Can you paste the configuration section of your workflow that sets the
> input paths and the classes required to handle your multiple input formats?
>
> --Mona
>
>
> On 3/6/12 9:04 AM, "Andrew Rendle" <an...@gmail.com> wrote:
>
> Hi Mona
>
> The problem I have is using multiple input paths with multiple mappers and
> input formats.
>
> It seems the workflow config is ignoring our setup, maybe throwing an
> exception, as the job config is missing the path, format, class entries.
>
> Any ideas?
>
> Andrew Rendle
> On Mar 5, 2012 10:56 PM, "Mona Chitnis" <ch...@yahoo-inc.com> wrote:
>
> > Hi Andrew,
> >
> > Some options while specifying multiple HDFS input paths:
> >
> >
> > - If using a glob pattern to specify them as comma-separated values for the
> > configuration property “mapred.input.dir”, you should escape the commas
> >
> > E.g. mapred.input.dir =
> > hdfs://path_to_directory/{subdir1\,subdir2\,subdir3}/file_name
> >
> >
> > - You can write your own Pig script to read from multiple input paths
> >
> > E.g. <property>
> > <name>input</name>
> > <value>hdfs://path_to_directory/subdir1,
> > hdfs://path_to_directory/subdir2,
> > hdfs://path_to_directory/subdir3</value>
> > </property>
> > ...
> > <script>myscript.pig</script>
> > <param>input=${input}</param>
> >
> >
> > - You can write your own map function passed as property
> > “mapred.mapper.class” that uses a custom delimiter to split multiple input
> > paths among multiple mappers.
> >
> >
> > All of the above should be supported by Oozie. If you encounter any problems,
> > please provide the corresponding details and I can help debug.
> >
> > Thanks,
> >
> > Mona
> >
> >
> > On 3/2/12 1:13 PM, "Mona Chitnis" <ch...@yahoo-inc.com> wrote:
> >
> > Hi Andrew,
> >
> > I’m looking into providing an example workflow for this use case.
> > Did you encounter any specific errors?
> >
> > Also, what version of Oozie are you working with?
> >
> > --Mona
> >
> >
> > On 3/2/12 6:14 AM, "Andrew Rendle" <an...@gmail.com> wrote:
> >
> > Hi all
> >
> > Has anyone found any pitfalls when setting up multiple input mappers for
> > use with Oozie?
> >
> > Have you got an example workflow?
> >
> > Thanks
> >
> > Andrew Rendle
> >
> >
> > --
> > mona chitnis
> > software developer
> >
> > chitnis@yahoo-inc.com
> > direct 408-336-7908 mobile 864-650-0100
> >
> > 701 first avenue, sunnyvale, ca, 94089-0703, us
> > phone (408) 349 3300 fax (408) 349 3301
> >
> >
> >
>
>
Re: Multiple Input Path config
Posted by Andrew Rendle <an...@gmail.com>.
Hi Mona
I managed to get it working and of course it was a config omission.
I had missed setting the Delegate mapper and input format.
One thing though, the config was throwing an exception that was hidden with
the default log settings. I would have expected config errors to be
reported at a level visible by default, not only at debug.
Thanks
Andrew Rendle
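For readers hitting the same omission, here is a minimal sketch of the settings involved. It assumes the job relies on the old-API org.apache.hadoop.mapred.lib.MultipleInputs mechanism; the thread does not show the actual configuration that was used. The paths reuse the placeholders from earlier in the thread, the com.example.* mapper classes are hypothetical, and the property names should be checked against the Hadoop version in use.

```xml
<!-- Sketch only: configuration block of an Oozie map-reduce action. -->
<configuration>
    <!-- The delegating classes route each input split to the input format
         and mapper registered for the path it came from. -->
    <property>
        <name>mapred.mapper.class</name>
        <value>org.apache.hadoop.mapred.lib.DelegatingMapper</value>
    </property>
    <property>
        <name>mapred.input.format.class</name>
        <value>org.apache.hadoop.mapred.lib.DelegatingInputFormat</value>
    </property>
    <!-- Comma-separated "path;InputFormat class" pairs. -->
    <property>
        <name>mapred.input.dir.formats</name>
        <value>hdfs://path_to_directory/subdir1;org.apache.hadoop.mapred.TextInputFormat,hdfs://path_to_directory/subdir2;org.apache.hadoop.mapred.SequenceFileInputFormat</value>
    </property>
    <!-- Comma-separated "path;Mapper class" pairs (hypothetical classes). -->
    <property>
        <name>mapred.input.dir.mappers</name>
        <value>hdfs://path_to_directory/subdir1;com.example.TextRecordMapper,hdfs://path_to_directory/subdir2;com.example.SequenceRecordMapper</value>
    </property>
</configuration>
```

These four properties are what MultipleInputs.addInputPath would normally set from Java driver code. With a workflow there is no driver, so they have to be spelled out by hand, which is likely why leaving out the delegating mapper or input format leaves the job config without the expected path, format and class entries.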
Re: Multiple Input Path config
Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi,
Can you paste the configuration section of your workflow that sets the input paths and the classes required to handle your multiple input formats?
--Mona
Re: Multiple Input Path config
Posted by Andrew Rendle <an...@gmail.com>.
Hi Mona
The problem I have is using multiple input paths with multiple mappers and
input formats.
It seems the workflow config is ignoring our setup, maybe throwing an
exception, as the job config is missing the path, format, class entries.
Any ideas?
Andrew Rendle
Re: Multiple Input Path config
Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Andrew,
Some options while specifying multiple HDFS input paths:
* If using a glob pattern to specify them as comma-separated values for the configuration property “mapred.input.dir”, you should escape the commas
E.g. mapred.input.dir = hdfs://path_to_directory/{subdir1\,subdir2\,subdir3}/file_name
* You can write your own Pig script to read from multiple input paths
E.g. <property>
<name>input</name>
<value>hdfs://path_to_directory/subdir1, hdfs://path_to_directory/subdir2, hdfs://path_to_directory/subdir3</value>
</property>
...
<script>myscript.pig</script>
<param>input=${input}</param>
* You can write your own map function passed as property “mapred.mapper.class” that uses a custom delimiter to split multiple input paths among multiple mappers.
All of the above should be supported by Oozie. If you encounter any problems, please provide the corresponding details and I can help debug.
Thanks,
Mona
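As an illustration of the first option, the escaped-comma value could sit inside a map-reduce action roughly as follows. This is a minimal sketch, not a tested workflow: the action name, the ${jobTracker} and ${nameNode} parameters and the "end" and "fail" transition targets are hypothetical, and the path is the placeholder from the message above.

```xml
<!-- Sketch only: one action from a workflow.xml. -->
<action name="multi-input-mr">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Commas inside the braces are escaped, as advised above, so
                 the value is read as one pattern, not three separate paths. -->
            <property>
                <name>mapred.input.dir</name>
                <value>hdfs://path_to_directory/{subdir1\,subdir2\,subdir3}/file_name</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

A real action would also need its mapper, reducer and output properties. They are left out here to keep the focus on the input path.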
Re: Multiple Input Path config
Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Andrew,
I’m looking into providing an example workflow for this use case. Did you encounter any specific errors?
Also, what version of Oozie are you working with?
--Mona