Posted to common-user@hadoop.apache.org by Calvin <ip...@gmail.com> on 2014/08/12 20:29:41 UTC

hadoop/yarn and task parallelization on non-hdfs filesystems

Hi all,

I've instantiated a Hadoop 2.4.1 cluster and I've found that running
MapReduce applications will parallelize differently depending on what
kind of filesystem the input data is on.

Using HDFS, a MapReduce job will spawn enough containers to maximize
use of all available memory. For example, on a 3-node cluster with
172GB of memory and each map task allocating 2GB, about 86 application
containers will be created.

On a filesystem that isn't HDFS (like NFS or, in my use case, a
parallel filesystem), a MapReduce job will only run a subset of the
available tasks at once (e.g., with the same 3-node cluster, only about
25-40 containers are created). Since I'm using a parallel filesystem,
I'm not as concerned with the bottlenecks one would find if one were to
use NFS.

Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
configuration that will allow me to effectively maximize resource
utilization?

Thanks,
Calvin

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Calvin <ip...@gmail.com>.
Oops, one of the settings should read
"yarn.nodemanager.vmem-check-enabled". The blog post has a typo and a
comment pointed that out as well.

Thanks,
Calvin

On Mon, Aug 18, 2014 at 4:45 PM, Calvin <ip...@gmail.com> wrote:
> OK, I figured out exactly what was happening.
>
> I had set the configuration value "yarn.nodemanager.vmem-pmem-ratio"
> to 10. Since there is no swap space available for use, every task that
> requests 2 GB of memory is also effectively requesting an additional
> 20 GB of virtual memory. This 20 GB isn't represented in the "Memory
> Used" column on the YARN applications status page, and thus it seemed
> like I was underutilizing the YARN cluster (when in actuality I had
> allocated all the memory available).
>
> (The cluster "underutilization" occurs regardless of using HDFS or
> LocalFileSystem; I must have made this configuration change after
> testing HDFS and before testing the local filesystem.)
>
> The solution is to set  "yarn.nodemanager.vmem-pmem-ratio" to 1 (since
> I have no swap) *and* "yarn.nodemanager.vmem-check.enabled" to false.
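
The two properties above live in yarn-site.xml on the NodeManager nodes. A
minimal sketch, using the corrected "yarn.nodemanager.vmem-check-enabled"
name from the note above and the values described in this thread (a
NodeManager restart is assumed for the change to take effect):

    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>1</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>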
>
> Part of the reason I had set such a high value was that containers
> were being killed because of virtual memory usage. The Cloudera folks
> have a good blog post [1] on this topic (see #6), and I wish I had
> read it sooner.
>
> With the above configuration values, I can now utilize the cluster at 100%.
>
> Thanks for everyone's input!
>
> Calvin
>
> [1] http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
>
> On Fri, Aug 15, 2014 at 2:11 PM, java8964 <ja...@hotmail.com> wrote:
>> Interesting to know that.
>>
>> I also want to know what underlying logic is limiting it to only
>> 25-35 parallelized containers, instead of up to 1300.
>>
>> Another suggestion I can give is the following:
>>
>> 1) In your driver, generate a text file listing all of your 1300 bz2
>> file names with absolute paths.
>> 2) In your MR job, use NLineInputFormat; with the default setting,
>> each line of content will trigger one mapper task.
>> 3) In your mapper, the key/value pair will be the byte offset/line
>> content; just start processing that file, as it should be available
>> from the mount path on the local data nodes.
>> 4) I assume that you are using Yarn. In this case, at least 1300
>> container requests will be issued to the cluster. You generate 1300
>> parallelized requests, and it is then up to the cluster to decide how
>> many containers can run in parallel.
>>
>> Yong
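
A rough driver sketch of the NLineInputFormat approach suggested above. The
class names, the paths, and the placeholder mapper are illustrative
assumptions, not something taken from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class FileListDriver {

      // Each map() call receives (byte offset, one absolute file path).
      // Real processing of that path from the mounted filesystem would go here.
      public static class FileListMapper
          extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text filePath, Context ctx)
            throws java.io.IOException, InterruptedException {
          ctx.write(filePath, new LongWritable(1)); // placeholder output
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "process-file-list");
        job.setJarByClass(FileListDriver.class);

        // Input is the text file listing the ~1300 bz2 paths, one per line.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.addInputPath(job, new Path("file:///scratch/file-list.txt"));
        NLineInputFormat.setNumLinesPerSplit(job, 1); // one mapper per listed file

        job.setMapperClass(FileListMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("file:///scratch/out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

With setNumLinesPerSplit(job, 1) each listed path becomes its own map task,
so up to 1300 container requests are issued and the scheduler decides how
many run at once.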
>>
>>> Date: Fri, 15 Aug 2014 12:30:09 -0600
>>
>>> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
>>> From: iphcalvin@gmail.com
>>> To: user@hadoop.apache.org
>>
>>>
>>> Thanks for the responses!
>>>
>>> To clarify, I'm not using any special FileSystem implementation. An
>>> example input parameter to a MapReduce job would be something like
>>> "-input file:///scratch/data". Thus I think (any clarification would
>>> be helpful) Hadoop is then utilizing LocalFileSystem
>>> (org.apache.hadoop.fs.LocalFileSystem).
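
One way to confirm which FileSystem implementation a file:// path resolves
to is to ask the FileSystem factory directly. A small sketch, reusing the
example path above:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class WhichFileSystem {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("file:///scratch/data"), conf);
        // For a file:// URI this prints org.apache.hadoop.fs.LocalFileSystem
        // unless fs.file.impl has been overridden in the configuration.
        System.out.println(fs.getClass().getName());
      }
    }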
>>>
>>> The input data is large enough and splittable (1300 .bz2 files, 274MB
>>> each, 350GB total). Thus even if the input data weren't splittable,
>>> Hadoop should be able to parallelize up to 1300 map tasks if capacity
>>> is available; in my case, I find that the Hadoop cluster is not fully
>>> utilized (i.e., ~25-35 containers running when it can scale up to ~80
>>> containers) when not using HDFS, while achieving maximum use when
>>> using HDFS.
>>>
>>> I'm wondering if Hadoop is "holding back" or throttling the I/O if
>>> LocalFileSystem is being used, and what changes I can make to have the
>>> Hadoop tasks scale.
>>>
>>> In the meantime, I'll take a look at the API calls that Harsh mentioned.
>>>
>>>
>>> On Fri, Aug 15, 2014 at 10:15 AM, Harsh J <ha...@cloudera.com> wrote:
>>> > The split configurations in FIF mentioned earlier would work for
>>> > local files as well. They aren't deemed unsplittable, just
>>> > considered as one single block.
>>> >
>>> > If the FS in use has its own advantages, it's better to implement a
>>> > proper interface to it that makes use of them than to rely on the
>>> > LFS by mounting it. This is what we do with HDFS.
>>> >
>>> > On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:
>>> >>
>>> >> I believe that Calvin mentioned before that this parallel file
>>> >> system is mounted into the local file system.
>>> >>
>>> >> In this case, will Hadoop just use java.io.File as the local file
>>> >> system, treat them as local files, and not split the files?
>>> >>
>>> >> I just want to know the logic in Hadoop for handling local files.
>>> >>
>>> >> One suggestion I can think of is to split the files manually
>>> >> outside of Hadoop. For example, generate lots of small files of
>>> >> 128M or 256M in size.
>>> >>
>>> >> In this case, each mapper will process one small file, so you can
>>> >> get good utilization of your cluster, assuming you have a lot of
>>> >> small files.
>>> >>
>>> >> Yong
>>> >>
>>> >> > From: harsh@cloudera.com
>>> >> > Date: Fri, 15 Aug 2014 16:45:02 +0530
>>> >> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs
>>> >> > filesystems
>>> >> > To: user@hadoop.apache.org
>>> >> >
>>> >> > Does your non-HDFS filesystem implement a getBlockLocations API,
>>> >> > which MR relies on to know how to split files?
>>> >> >
>>> >> > The API is at
>>> >> >
>>> >> > http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
>>> >> > long, long), and MR calls it at
>>> >> >
>>> >> >
>>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
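
As an illustration of that API, here is a minimal sketch of a FileSystem
that reports synthetic block boundaries so FileInputFormat can produce more
than one split per file. Extending RawLocalFileSystem, the 128 MB chunk
size, and the localhost entries are all assumptions for illustration, not
something from this thread:

    import java.io.IOException;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.RawLocalFileSystem;

    public class ChunkedLocalFileSystem extends RawLocalFileSystem {
      private static final long CHUNK = 128L * 1024 * 1024; // pretend 128 MB blocks

      @Override
      public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
          throws IOException {
        if (file == null || len <= 0) {
          return new BlockLocation[0];
        }
        // Report one synthetic block per 128 MB chunk of the requested range.
        // Every chunk claims "localhost", since a mounted parallel filesystem
        // is equally reachable from any node.
        int chunks = (int) ((len + CHUNK - 1) / CHUNK);
        BlockLocation[] locations = new BlockLocation[chunks];
        for (int i = 0; i < chunks; i++) {
          long offset = start + i * CHUNK;
          long length = Math.min(CHUNK, start + len - offset);
          locations[i] = new BlockLocation(
              new String[] { "localhost:50010" }, // "names" (host:port), placeholder
              new String[] { "localhost" },       // hosts
              offset, length);
        }
        return locations;
      }
    }

Wiring such a class in would go through an fs.<scheme>.impl entry in
core-site.xml.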
>>> >> >
>>> >> > If not, perhaps you can enforce manual chunking by asking MR to
>>> >> > use custom min/max split size values via config properties:
>>> >> >
>>> >> >
>>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
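
In Hadoop 2.x those bounds correspond to the properties
mapreduce.input.fileinputformat.split.minsize and
mapreduce.input.fileinputformat.split.maxsize. A small driver-side sketch,
with sizes that are only illustrative:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitTuning {
      // Equivalent to setting mapreduce.input.fileinputformat.split.minsize
      // and mapreduce.input.fileinputformat.split.maxsize, e.g. via -D flags
      // for jobs that go through ToolRunner/GenericOptionsParser.
      public static void configure(Job job) {
        FileInputFormat.setMinInputSplitSize(job, 32L * 1024 * 1024);   // 32 MB
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB
      }
    }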
>>> >> >
>>> >> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
>>> >> > > I've looked a bit into this problem some more, and from what
>>> >> > > another
>>> >> > > person has written, HDFS is tuned to scale appropriately [1] given
>>> >> > > the
>>> >> > > number of input splits, etc.
>>> >> > >
>>> >> > > In the case of utilizing the local filesystem (which is really a
>>> >> > > network share on a parallel filesystem), the settings might be set
>>> >> > > conservatively in order not to thrash the local disks or present a
>>> >> > > bottleneck in processing.
>>> >> > >
>>> >> > > Since this isn't a big concern, I'd rather tune the settings to
>>> >> > > efficiently utilize the local filesystem.
>>> >> > >
>>> >> > > Are there any pointers to where in the source code I could look in
>>> >> > > order to tweak such parameters?
>>> >> > >
>>> >> > > Thanks,
>>> >> > > Calvin
>>> >> > >
>>> >> > > [1]
>>> >> > >
>>> >> > > https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
>>> >> > >
>>> >> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com>
>>> >> > > wrote:
>>> >> > >> Hi all,
>>> >> > >>
>>> >> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that
>>> >> > >> running
>>> >> > >> MapReduce applications will parallelize differently depending on
>>> >> > >> what
>>> >> > >> kind of filesystem the input data is on.
>>> >> > >>
>>> >> > >> Using HDFS, a MapReduce job will spawn enough containers to
>>> >> > >> maximize
>>> >> > >> use of all available memory. For example, a 3-node cluster with
>>> >> > >> 172GB
>>> >> > >> of memory with each map task allocating 2GB, about 86 application
>>> >> > >> containers will be created.
>>> >> > >>
>>> >> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
>>> >> > >> parallel filesystem), a MapReduce job will only allocate a subset
>>> >> > >> of
>>> >> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
>>> >> > >> containers are created). Since I'm using a parallel filesystem,
>>> >> > >> I'm
>>> >> > >> not as concerned with the bottlenecks one would find if one were
>>> >> > >> to
>>> >> > >> use NFS.
>>> >> > >>
>>> >> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
>>> >> > >> configuration that will allow me to effectively maximize resource
>>> >> > >> utilization?
>>> >> > >>
>>> >> > >> Thanks,
>>> >> > >> Calvin
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Harsh J

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Calvin <ip...@gmail.com>.
OK, I figured out exactly what was happening.

I had set the configuration value "yarn.nodemanager.vmem-pmem-ratio"
to 10. Since there is no swap space available for use, every task that
requests 2 GB of memory is also effectively requesting an additional
20 GB of virtual memory. This 20 GB isn't represented in the "Memory
Used" column on the YARN applications status page, and thus it seemed
like I was underutilizing the YARN cluster (when in actuality I had
allocated all the memory available).

(The cluster "underutilization" occurs regardless of using HDFS or
LocalFileSystem; I must have made this configuration change after
testing HDFS and before testing the local filesystem.)

The solution is to set  "yarn.nodemanager.vmem-pmem-ratio" to 1 (since
I have no swap) *and* "yarn.nodemanager.vmem-check.enabled" to false.

Part of the reason I had set such a high value was that containers
were being killed because of virtual memory usage. The Cloudera folks
have a good blog post [1] on this topic (see #6), and I wish I had
read it sooner.

With the above configuration values, I can now utilize the cluster at 100%.

Thanks for everyone's input!

Calvin

[1] http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

On Fri, Aug 15, 2014 at 2:11 PM, java8964 <ja...@hotmail.com> wrote:
> Interesting to know that.
>
> I also want to know what underlying logic is limiting it to only
> 25-35 parallelized containers, instead of up to 1300.
>
> Another suggestion I can give is the following:
>
> 1) In your driver, generate a text file listing all of your 1300 bz2
> file names with absolute paths.
> 2) In your MR job, use NLineInputFormat; with the default setting,
> each line of content will trigger one mapper task.
> 3) In your mapper, the key/value pair will be the byte offset/line
> content; just start processing that file, as it should be available
> from the mount path on the local data nodes.
> 4) I assume that you are using Yarn. In this case, at least 1300
> container requests will be issued to the cluster. You generate 1300
> parallelized requests, and it is then up to the cluster to decide how
> many containers can run in parallel.
>
> Yong
>
>> Date: Fri, 15 Aug 2014 12:30:09 -0600
>
>> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
>> From: iphcalvin@gmail.com
>> To: user@hadoop.apache.org
>
>>
>> Thanks for the responses!
>>
>> To clarify, I'm not using any special FileSystem implementation. An
>> example input parameter to a MapReduce job would be something like
>> "-input file:///scratch/data". Thus I think (any clarification would
>> be helpful) Hadoop is then utilizing LocalFileSystem
>> (org.apache.hadoop.fs.LocalFileSystem).
>>
>> The input data is large enough and splittable (1300 .bz2 files, 274MB
>> each, 350GB total). Thus even if the input data weren't splittable,
>> Hadoop should be able to parallelize up to 1300 map tasks if capacity
>> is available; in my case, I find that the Hadoop cluster is not fully
>> utilized (i.e., ~25-35 containers running when it can scale up to ~80
>> containers) when not using HDFS, while achieving maximum use when
>> using HDFS.
>>
>> I'm wondering if Hadoop is "holding back" or throttling the I/O if
>> LocalFileSystem is being used, and what changes I can make to have the
>> Hadoop tasks scale.
>>
>> In the meantime, I'll take a look at the API calls that Harsh mentioned.
>>
>>
>> On Fri, Aug 15, 2014 at 10:15 AM, Harsh J <ha...@cloudera.com> wrote:
>> > The split configurations in FIF mentioned earlier would work for
>> > local files as well. They aren't deemed unsplittable, just
>> > considered as one single block.
>> >
>> > If the FS in use has its own advantages, it's better to implement a
>> > proper interface to it that makes use of them than to rely on the
>> > LFS by mounting it. This is what we do with HDFS.
>> >
>> > On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:
>> >>
>> >> I believe that Calvin mentioned before that this parallel file
>> >> system is mounted into the local file system.
>> >>
>> >> In this case, will Hadoop just use java.io.File as the local file
>> >> system, treat them as local files, and not split the files?
>> >>
>> >> I just want to know the logic in Hadoop for handling local files.
>> >>
>> >> One suggestion I can think of is to split the files manually
>> >> outside of Hadoop. For example, generate lots of small files of
>> >> 128M or 256M in size.
>> >>
>> >> In this case, each mapper will process one small file, so you can
>> >> get good utilization of your cluster, assuming you have a lot of
>> >> small files.
>> >>
>> >> Yong
>> >>
>> >> > From: harsh@cloudera.com
>> >> > Date: Fri, 15 Aug 2014 16:45:02 +0530
>> >> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs
>> >> > filesystems
>> >> > To: user@hadoop.apache.org
>> >> >
>> >> > Does your non-HDFS filesystem implement a getBlockLocations API,
>> >> > which MR relies on to know how to split files?
>> >> >
>> >> > The API is at
>> >> >
>> >> > http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
>> >> > long, long), and MR calls it at
>> >> >
>> >> >
>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
>> >> >
>> >> > If not, perhaps you can enforce manual chunking by asking MR to
>> >> > use custom min/max split size values via config properties:
>> >> >
>> >> >
>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
>> >> >
>> >> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
>> >> > > I've looked a bit into this problem some more, and from what
>> >> > > another
>> >> > > person has written, HDFS is tuned to scale appropriately [1] given
>> >> > > the
>> >> > > number of input splits, etc.
>> >> > >
>> >> > > In the case of utilizing the local filesystem (which is really a
>> >> > > network share on a parallel filesystem), the settings might be set
>> >> > > conservatively in order not to thrash the local disks or present a
>> >> > > bottleneck in processing.
>> >> > >
>> >> > > Since this isn't a big concern, I'd rather tune the settings to
>> >> > > efficiently utilize the local filesystem.
>> >> > >
>> >> > > Are there any pointers to where in the source code I could look in
>> >> > > order to tweak such parameters?
>> >> > >
>> >> > > Thanks,
>> >> > > Calvin
>> >> > >
>> >> > > [1]
>> >> > >
>> >> > > https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
>> >> > >
>> >> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com>
>> >> > > wrote:
>> >> > >> Hi all,
>> >> > >>
>> >> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that
>> >> > >> running
>> >> > >> MapReduce applications will parallelize differently depending on
>> >> > >> what
>> >> > >> kind of filesystem the input data is on.
>> >> > >>
>> >> > >> Using HDFS, a MapReduce job will spawn enough containers to
>> >> > >> maximize
>> >> > >> use of all available memory. For example, a 3-node cluster with
>> >> > >> 172GB
>> >> > >> of memory with each map task allocating 2GB, about 86 application
>> >> > >> containers will be created.
>> >> > >>
>> >> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
>> >> > >> parallel filesystem), a MapReduce job will only allocate a subset
>> >> > >> of
>> >> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
>> >> > >> containers are created). Since I'm using a parallel filesystem,
>> >> > >> I'm
>> >> > >> not as concerned with the bottlenecks one would find if one were
>> >> > >> to
>> >> > >> use NFS.
>> >> > >>
>> >> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
>> >> > >> configuration that will allow me to effectively maximize resource
>> >> > >> utilization?
>> >> > >>
>> >> > >> Thanks,
>> >> > >> Calvin
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Harsh J

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Calvin <ip...@gmail.com>.
OK, I figured out exactly what was happening.

I had set the configuration value "yarn.nodemanager.vmem-pmem-ratio"
to 10. Since there is no swap space available for use, every task
which is requesting 2 GB of memory is also requesting an additional 20
GB of memory. This 20 GB isn't represented in the "Memory Used" column
on the YARN applications status page and thus it seemed like I was
underutilizing the YARN cluster (when in actuality I had allocated all
the memory available).

(The cluster "underutilization" occurs regardless of using HDFS or
LocalFileSystem; I must have made this configuration change after
testing HDFS and before testing the local filesystem.)

The solution is to set  "yarn.nodemanager.vmem-pmem-ratio" to 1 (since
I have no swap) *and* "yarn.nodemanager.vmem-check.enabled" to false.

Part of the reason why I had set such a high setting was due to
containers being killed because of virtual memory usage. The Cloudera
folks have a good blog post [1] on this topic (see #6) and I wish I
had read that sooner.

With the above configuration values, I can now utilize the cluster at 100%.

Thanks for everyone's input!

Calvin

[1] http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

On Fri, Aug 15, 2014 at 2:11 PM, java8964 <ja...@hotmail.com> wrote:
> Interesting to know that.
>
> I also want to know what underline logic holding the force to only generate
> 25-35 parallelized containers, instead of up to 1300.
>
> Another suggestion I can give is following:
>
> 1) In your driver, generate a text file, including all your 1300 bz2 file
> names with absolute path.
> 2) In your MR job, use the NLineInputFormat, with default setting, each line
> content will trigger one mapper task.
> 3) In your mapper, key/value pair will be offset byte loc/line content, just
> start to process the file, as it should be available from the mount path in
> the local data nodes.
> 4) I assume that you are using Yarn. In this case, at least 1300 container
> requests will be issued to the cluster. You generate 1300 parallelized
> request, now it is up to the cluster to decide how many containers can be
> parallel run.
>
> Yong
>
>> Date: Fri, 15 Aug 2014 12:30:09 -0600
>
>> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
>> From: iphcalvin@gmail.com
>> To: user@hadoop.apache.org
>
>>
>> Thanks for the responses!
>>
>> To clarify, I'm not using any special FileSystem implementation. An
>> example input parameter to a MapReduce job would be something like
>> "-input file:///scratch/data". Thus I think (any clarification would
>> be helpful) Hadoop is then utilizing LocalFileSystem
>> (org.apache.hadoop.fs.LocalFileSystem).
>>
>> The input data is large enough and splittable (1300 .bz2 files, 274MB
>> each, 350GB total). Thus even if it the input data weren't splittable,
>> Hadoop should be able to parallelize up to 1300 map tasks if capacity
>> is available; in my case, I find that the Hadoop cluster is not fully
>> utilized (i.e., ~25-35 containers running when it can scale up to ~80
>> containers) when not using HDFS, while achieving maximum use when
>> using HDFS.
>>
>> I'm wondering if Hadoop is "holding back" or throttling the I/O if
>> LocalFileSystem is being used, and what changes I can make to have the
>> Hadoop tasks scale.
>>
>> In the meantime, I'll take a look at the API calls that Harsh mentioned.
>>
>>
>> On Fri, Aug 15, 2014 at 10:15 AM, Harsh J <ha...@cloudera.com> wrote:
>> > The split configurations in FIF mentioned earlier would work for local
>> > files
>> > as well. They aren't deemed unsplitable, just considered as one single
>> > block.
>> >
>> > If the FS in use has its advantages it's better to implement a proper
>> > interface to it making use of them, than to rely on the LFS by mounting
>> > it.
>> > This is what we do with HDFS.
>> >
>> > On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:
>> >>
>> >> I believe that Calvin mentioned before that this parallel file system
>> >> mounted into local file system.
>> >>
>> >> In this case, will Hadoop just use java.io.File as local File system to
>> >> treat them as local file and not split the file?
>> >>
>> >> Just want to know the logic in hadoop handling the local file.
>> >>
>> >> One suggestion I can think is to split the files manually outside of
>> >> hadoop. For example, generate lots of small files as 128M or 256M size.
>> >>
>> >> In this case, each mapper will process one small file, so you can get
>> >> good
>> >> utilization of your cluster, assume you have a lot of small files.
>> >>
>> >> Yong
>> >>
>> >> > From: harsh@cloudera.com
>> >> > Date: Fri, 15 Aug 2014 16:45:02 +0530
>> >> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs
>> >> > filesystems
>> >> > To: user@hadoop.apache.org
>> >> >
>> >> > Does your non-HDFS filesystem implement a getBlockLocations API, that
>> >> > MR relies on to know how to split files?
>> >> >
>> >> > The API is at
>> >> >
>> >> > http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
>> >> > long, long), and MR calls it at
>> >> >
>> >> >
>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
>> >> >
>> >> > If not, perhaps you can enforce a manual chunking by asking MR to use
>> >> > custom min/max split sizes values via config properties:
>> >> >
>> >> >
>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
>> >> >
>> >> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
>> >> > > I've looked a bit into this problem some more, and from what
>> >> > > another
>> >> > > person has written, HDFS is tuned to scale appropriately [1] given
>> >> > > the
>> >> > > number of input splits, etc.
>> >> > >
>> >> > > In the case of utilizing the local filesystem (which is really a
>> >> > > network share on a parallel filesystem), the settings might be set
>> >> > > conservatively in order not to thrash the local disks or present a
>> >> > > bottleneck in processing.
>> >> > >
>> >> > > Since this isn't a big concern, I'd rather tune the settings to
>> >> > > efficiently utilize the local filesystem.
>> >> > >
>> >> > > Are there any pointers to where in the source code I could look in
>> >> > > order to tweak such parameters?
>> >> > >
>> >> > > Thanks,
>> >> > > Calvin
>> >> > >
>> >> > > [1]
>> >> > >
>> >> > > https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
>> >> > >
>> >> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com>
>> >> > > wrote:
>> >> > >> Hi all,
>> >> > >>
>> >> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that
>> >> > >> running
>> >> > >> MapReduce applications will parallelize differently depending on
>> >> > >> what
>> >> > >> kind of filesystem the input data is on.
>> >> > >>
>> >> > >> Using HDFS, a MapReduce job will spawn enough containers to
>> >> > >> maximize
>> >> > >> use of all available memory. For example, a 3-node cluster with
>> >> > >> 172GB
>> >> > >> of memory with each map task allocating 2GB, about 86 application
>> >> > >> containers will be created.
>> >> > >>
>> >> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
>> >> > >> parallel filesystem), a MapReduce job will only allocate a subset
>> >> > >> of
>> >> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
>> >> > >> containers are created). Since I'm using a parallel filesystem,
>> >> > >> I'm
>> >> > >> not as concerned with the bottlenecks one would find if one were
>> >> > >> to
>> >> > >> use NFS.
>> >> > >>
>> >> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
>> >> > >> configuration that will allow me to effectively maximize resource
>> >> > >> utilization?
>> >> > >>
>> >> > >> Thanks,
>> >> > >> Calvin
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Harsh J

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Calvin <ip...@gmail.com>.
OK, I figured out exactly what was happening.

I had set the configuration value "yarn.nodemanager.vmem-pmem-ratio"
to 10. Since there is no swap space available for use, every task
which is requesting 2 GB of memory is also requesting an additional 20
GB of memory. This 20 GB isn't represented in the "Memory Used" column
on the YARN applications status page and thus it seemed like I was
underutilizing the YARN cluster (when in actuality I had allocated all
the memory available).

(The cluster "underutilization" occurs regardless of using HDFS or
LocalFileSystem; I must have made this configuration change after
testing HDFS and before testing the local filesystem.)

The solution is to set  "yarn.nodemanager.vmem-pmem-ratio" to 1 (since
I have no swap) *and* "yarn.nodemanager.vmem-check.enabled" to false.

Part of the reason why I had set such a high setting was due to
containers being killed because of virtual memory usage. The Cloudera
folks have a good blog post [1] on this topic (see #6) and I wish I
had read that sooner.

With the above configuration values, I can now utilize the cluster at 100%.

Thanks for everyone's input!

Calvin

[1] http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

On Fri, Aug 15, 2014 at 2:11 PM, java8964 <ja...@hotmail.com> wrote:
> Interesting to know that.
>
> I also want to know what underline logic holding the force to only generate
> 25-35 parallelized containers, instead of up to 1300.
>
> Another suggestion I can give is following:
>
> 1) In your driver, generate a text file, including all your 1300 bz2 file
> names with absolute path.
> 2) In your MR job, use the NLineInputFormat, with default setting, each line
> content will trigger one mapper task.
> 3) In your mapper, key/value pair will be offset byte loc/line content, just
> start to process the file, as it should be available from the mount path in
> the local data nodes.
> 4) I assume that you are using Yarn. In this case, at least 1300 container
> requests will be issued to the cluster. You generate 1300 parallelized
> request, now it is up to the cluster to decide how many containers can be
> parallel run.
>
> Yong
>
>> Date: Fri, 15 Aug 2014 12:30:09 -0600
>
>> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
>> From: iphcalvin@gmail.com
>> To: user@hadoop.apache.org
>
>>
>> Thanks for the responses!
>>
>> To clarify, I'm not using any special FileSystem implementation. An
>> example input parameter to a MapReduce job would be something like
>> "-input file:///scratch/data". Thus I think (any clarification would
>> be helpful) Hadoop is then utilizing LocalFileSystem
>> (org.apache.hadoop.fs.LocalFileSystem).
>>
>> The input data is large enough and splittable (1300 .bz2 files, 274MB
>> each, 350GB total). Thus even if it the input data weren't splittable,
>> Hadoop should be able to parallelize up to 1300 map tasks if capacity
>> is available; in my case, I find that the Hadoop cluster is not fully
>> utilized (i.e., ~25-35 containers running when it can scale up to ~80
>> containers) when not using HDFS, while achieving maximum use when
>> using HDFS.
>>
>> I'm wondering if Hadoop is "holding back" or throttling the I/O if
>> LocalFileSystem is being used, and what changes I can make to have the
>> Hadoop tasks scale.
>>
>> In the meantime, I'll take a look at the API calls that Harsh mentioned.
>>
>>
>> On Fri, Aug 15, 2014 at 10:15 AM, Harsh J <ha...@cloudera.com> wrote:
>> > The split configurations in FIF mentioned earlier would work for local
>> > files as well. They aren't deemed unsplittable, just considered as one
>> > single block.
>> >
>> > If the FS in use has its own advantages, it's better to implement a proper
>> > interface to it that makes use of them than to rely on the LFS by mounting
>> > it.
>> > This is what we do with HDFS.
>> >
>> > On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:
>> >>
>> >> I believe that Calvin mentioned before that this parallel file system is
>> >> mounted into the local file system.
>> >>
>> >> In this case, will Hadoop just use java.io.File as the local file system,
>> >> treat them as local files, and not split the files?
>> >>
>> >> I just want to know the logic in Hadoop for handling local files.
>> >>
>> >> One suggestion I can think of is to split the files manually outside of
>> >> Hadoop. For example, generate lots of smaller files of 128M or 256M in size.
>> >>
>> >> In this case, each mapper will process one small file, so you can get good
>> >> utilization of your cluster, assuming you have a lot of small files.
>> >>
>> >> Yong
>> >>
>> >> > From: harsh@cloudera.com
>> >> > Date: Fri, 15 Aug 2014 16:45:02 +0530
>> >> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs
>> >> > filesystems
>> >> > To: user@hadoop.apache.org
>> >> >
>> >> > Does your non-HDFS filesystem implement a getBlockLocations API, which
>> >> > MR relies on to know how to split files?
>> >> >
>> >> > The API is at
>> >> >
>> >> > http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
>> >> > long, long), and MR calls it at
>> >> >
>> >> >
>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
>> >> >
>> >> > If not, perhaps you can enforce manual chunking by asking MR to use
>> >> > custom min/max split size values via config properties:
>> >> >
>> >> >
>> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
>> >> >
>> >> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
>> >> > > I've looked a bit into this problem some more, and from what
>> >> > > another
>> >> > > person has written, HDFS is tuned to scale appropriately [1] given
>> >> > > the
>> >> > > number of input splits, etc.
>> >> > >
>> >> > > In the case of utilizing the local filesystem (which is really a
>> >> > > network share on a parallel filesystem), the settings might be set
>> >> > > conservatively in order not to thrash the local disks or present a
>> >> > > bottleneck in processing.
>> >> > >
>> >> > > Since this isn't a big concern, I'd rather tune the settings to
>> >> > > efficiently utilize the local filesystem.
>> >> > >
>> >> > > Are there any pointers to where in the source code I could look in
>> >> > > order to tweak such parameters?
>> >> > >
>> >> > > Thanks,
>> >> > > Calvin
>> >> > >
>> >> > > [1]
>> >> > >
>> >> > > https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
>> >> > >
>> >> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com>
>> >> > > wrote:
>> >> > >> Hi all,
>> >> > >>
>> >> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that
>> >> > >> running
>> >> > >> MapReduce applications will parallelize differently depending on
>> >> > >> what
>> >> > >> kind of filesystem the input data is on.
>> >> > >>
>> >> > >> Using HDFS, a MapReduce job will spawn enough containers to
>> >> > >> maximize
>> >> > >> use of all available memory. For example, a 3-node cluster with
>> >> > >> 172GB
>> >> > >> of memory with each map task allocating 2GB, about 86 application
>> >> > >> containers will be created.
>> >> > >>
>> >> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
>> >> > >> parallel filesystem), a MapReduce job will only allocate a subset
>> >> > >> of
>> >> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
>> >> > >> containers are created). Since I'm using a parallel filesystem,
>> >> > >> I'm
>> >> > >> not as concerned with the bottlenecks one would find if one were
>> >> > >> to
>> >> > >> use NFS.
>> >> > >>
>> >> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
>> >> > >> configuration that will allow me to effectively maximize resource
>> >> > >> utilization?
>> >> > >>
>> >> > >> Thanks,
>> >> > >> Calvin
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Harsh J

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by java8964 <ja...@hotmail.com>.
Interesting to know that.
I also want to know what underlying logic is limiting the job to only 25-35 parallelized containers, instead of up to 1300.
Another suggestion I can give is the following (see the sketch after the list):
1) In your driver, generate a text file containing all 1300 of your bz2 file names with absolute paths.
2) In your MR job, use the NLineInputFormat; with the default setting, each line of content will trigger one mapper task.
3) In your mapper, the key/value pair will be the byte offset/line content; just start processing the file, as it should be available from the mount path on the local data nodes.
4) I assume that you are using Yarn. In this case, at least 1300 container requests will be issued to the cluster. You generate 1300 parallelized requests; now it is up to the cluster to decide how many containers can run in parallel.
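
A rough sketch of that driver and mapper, in case it helps (the class names
and the list-file argument are just placeholders, and error handling and the
real processing are omitted):

  import java.io.IOException;
  import java.io.InputStream;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  public class PerFileDriver {

    // Each record handed to the mapper is one line of the list file,
    // i.e. one absolute bz2 file name.
    public static class PerFileMapper
        extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
      @Override
      protected void map(LongWritable offset, Text pathLine, Context ctx)
          throws IOException, InterruptedException {
        Path file = new Path(pathLine.toString().trim());
        FileSystem fs = file.getFileSystem(ctx.getConfiguration());
        try (InputStream in = fs.open(file)) {
          // decompress/process the bz2 file here
        }
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "one-mapper-per-file");
      job.setJarByClass(PerFileDriver.class);
      job.setInputFormatClass(NLineInputFormat.class);
      NLineInputFormat.setNumLinesPerSplit(job, 1);          // one list line -> one map task
      NLineInputFormat.addInputPath(job, new Path(args[0])); // the file-name list
      job.setMapperClass(PerFileMapper.class);
      job.setNumReduceTasks(0);
      job.setOutputFormatClass(NullOutputFormat.class);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }
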
Yong

> Date: Fri, 15 Aug 2014 12:30:09 -0600
> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
> From: iphcalvin@gmail.com
> To: user@hadoop.apache.org
> 
> Thanks for the responses!
> 
> To clarify, I'm not using any special FileSystem implementation. An
> example input parameter to a MapReduce job would be something like
> "-input file:///scratch/data". Thus I think (any clarification would
> be helpful) Hadoop is then utilizing LocalFileSystem
> (org.apache.hadoop.fs.LocalFileSystem).
> 
> The input data is large enough and splittable (1300 .bz2 files, 274MB
> each, 350GB total). Thus even if the input data weren't splittable,
> Hadoop should be able to parallelize up to 1300 map tasks if capacity
> is available; in my case, I find that the Hadoop cluster is not fully
> utilized (i.e., ~25-35 containers running when it can scale up to ~80
> containers) when not using HDFS, while achieving maximum use when
> using HDFS.
> 
> I'm wondering if Hadoop is "holding back" or throttling the I/O if
> LocalFileSystem is being used, and what changes I can make to have the
> Hadoop tasks scale.
> 
> In the meantime, I'll take a look at the API calls that Harsh mentioned.
> 
> 
> On Fri, Aug 15, 2014 at 10:15 AM, Harsh J <ha...@cloudera.com> wrote:
> > The split configurations in FIF mentioned earlier would work for local files
> > as well. They aren't deemed unsplittable, just considered as one single
> > block.
> >
> > If the FS in use has its advantages it's better to implement a proper
> > interface to it making use of them, than to rely on the LFS by mounting it.
> > This is what we do with HDFS.
> >
> > On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:
> >>
> >> I believe that Calvin mentioned before that this parallel file system
> >> mounted into local file system.
> >>
> >> In this case, will Hadoop just use java.io.File as local File system to
> >> treat them as local file and not split the file?
> >>
> >> Just want to know the logic in hadoop handling the local file.
> >>
> >> One suggestion I can think is to split the files manually outside of
> >> hadoop. For example, generate lots of small files as 128M or 256M size.
> >>
> >> In this case, each mapper will process one small file, so you can get good
> >> utilization of your cluster, assume you have a lot of small files.
> >>
> >> Yong
> >>
> >> > From: harsh@cloudera.com
> >> > Date: Fri, 15 Aug 2014 16:45:02 +0530
> >> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs
> >> > filesystems
> >> > To: user@hadoop.apache.org
> >> >
> >> > Does your non-HDFS filesystem implement a getBlockLocations API, that
> >> > MR relies on to know how to split files?
> >> >
> >> > The API is at
> >> > http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
> >> > long, long), and MR calls it at
> >> >
> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
> >> >
> >> > If not, perhaps you can enforce a manual chunking by asking MR to use
> >> > custom min/max split size values via config properties:
> >> >
> >> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
> >> >
> >> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> >> > > I've looked a bit into this problem some more, and from what another
> >> > > person has written, HDFS is tuned to scale appropriately [1] given the
> >> > > number of input splits, etc.
> >> > >
> >> > > In the case of utilizing the local filesystem (which is really a
> >> > > network share on a parallel filesystem), the settings might be set
> >> > > conservatively in order not to thrash the local disks or present a
> >> > > bottleneck in processing.
> >> > >
> >> > > Since this isn't a big concern, I'd rather tune the settings to
> >> > > efficiently utilize the local filesystem.
> >> > >
> >> > > Are there any pointers to where in the source code I could look in
> >> > > order to tweak such parameters?
> >> > >
> >> > > Thanks,
> >> > > Calvin
> >> > >
> >> > > [1]
> >> > > https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
> >> > >
> >> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> >> > >> Hi all,
> >> > >>
> >> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> >> > >> MapReduce applications will parallelize differently depending on what
> >> > >> kind of filesystem the input data is on.
> >> > >>
> >> > >> Using HDFS, a MapReduce job will spawn enough containers to maximize
> >> > >> use of all available memory. For example, a 3-node cluster with 172GB
> >> > >> of memory with each map task allocating 2GB, about 86 application
> >> > >> containers will be created.
> >> > >>
> >> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
> >> > >> parallel filesystem), a MapReduce job will only allocate a subset of
> >> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
> >> > >> containers are created). Since I'm using a parallel filesystem, I'm
> >> > >> not as concerned with the bottlenecks one would find if one were to
> >> > >> use NFS.
> >> > >>
> >> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> >> > >> configuration that will allow me to effectively maximize resource
> >> > >> utilization?
> >> > >>
> >> > >> Thanks,
> >> > >> Calvin
> >> >
> >> >
> >> >
> >> > --
> >> > Harsh J

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Calvin <ip...@gmail.com>.
Thanks for the responses!

To clarify, I'm not using any special FileSystem implementation. An
example input parameter to a MapReduce job would be something like
"-input file:///scratch/data". Thus I think (any clarification would
be helpful) Hadoop is then utilizing LocalFileSystem
(org.apache.hadoop.fs.LocalFileSystem).
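
(For what it's worth, a quick way to double-check which implementation a
file:// URI resolves to is something like the following sketch; the class
name is made up and the path is just the one from my example:)

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class WhichFs {
    public static void main(String[] args) throws Exception {
      // Expected to print org.apache.hadoop.fs.LocalFileSystem for a
      // file:// URI with a default Configuration.
      FileSystem fs = FileSystem.get(URI.create("file:///scratch/data"),
                                     new Configuration());
      System.out.println(fs.getClass().getName());
    }
  }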

The input data is large enough and splittable (1300 .bz2 files, 274MB
each, 350GB total). Thus even if the input data weren't splittable,
Hadoop should be able to parallelize up to 1300 map tasks if capacity
is available; in my case, I find that the Hadoop cluster is not fully
utilized (i.e., ~25-35 containers running when it can scale up to ~80
containers) when not using HDFS, while achieving maximum use when
using HDFS.
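
(The numbers themselves are consistent, for what it's worth: 1300 files x
274 MB is roughly 348 GB, i.e. the 350 GB total, and 172 GB / 2 GB per map
task is about 86 containers, which lines up with the ~80 figure once the
ApplicationMaster and other overhead take their share.)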

I'm wondering if Hadoop is "holding back" or throttling the I/O if
LocalFileSystem is being used, and what changes I can make to have the
Hadoop tasks scale.

In the meantime, I'll take a look at the API calls that Harsh mentioned.
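
For the split-size route Harsh mentioned, my understanding is that the
FileInputFormat bounds can be forced per job along these lines (just a
sketch; it assumes an existing org.apache.hadoop.mapreduce.Job named "job",
and the 128 MB cap is only an example value):

  // In the job driver: cap the split size so each ~274 MB .bz2 file
  // yields several map tasks instead of one.
  FileInputFormat.setMinInputSplitSize(job, 1L);
  FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
  // The equivalent configuration properties are
  // mapreduce.input.fileinputformat.split.minsize and
  // mapreduce.input.fileinputformat.split.maxsize.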


On Fri, Aug 15, 2014 at 10:15 AM, Harsh J <ha...@cloudera.com> wrote:
> The split configurations in FIF mentioned earlier would work for local files
> as well. They aren't deemed unsplittable, just considered as one single
> block.
>
> If the FS in use has its advantages it's better to implement a proper
> interface to it making use of them, than to rely on the LFS by mounting it.
> This is what we do with HDFS.
>
> On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:
>>
>> I believe that Calvin mentioned before that this parallel file system
>> mounted into local file system.
>>
>> In this case, will Hadoop just use java.io.File as local File system to
>> treat them as local file and not split the file?
>>
>> Just want to know the logic in hadoop handling the local file.
>>
>> One suggestion I can think is to split the files manually outside of
>> hadoop. For example, generate lots of small files as 128M or 256M size.
>>
>> In this case, each mapper will process one small file, so you can get good
>> utilization of your cluster, assume you have a lot of small files.
>>
>> Yong
>>
>> > From: harsh@cloudera.com
>> > Date: Fri, 15 Aug 2014 16:45:02 +0530
>> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs
>> > filesystems
>> > To: user@hadoop.apache.org
>> >
>> > Does your non-HDFS filesystem implement a getBlockLocations API, that
>> > MR relies on to know how to split files?
>> >
>> > The API is at
>> > http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
>> > long, long), and MR calls it at
>> >
>> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
>> >
>> > If not, perhaps you can enforce a manual chunking by asking MR to use
>> > custom min/max split size values via config properties:
>> >
>> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
>> >
>> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
>> > > I've looked a bit into this problem some more, and from what another
>> > > person has written, HDFS is tuned to scale appropriately [1] given the
>> > > number of input splits, etc.
>> > >
>> > > In the case of utilizing the local filesystem (which is really a
>> > > network share on a parallel filesystem), the settings might be set
>> > > conservatively in order not to thrash the local disks or present a
>> > > bottleneck in processing.
>> > >
>> > > Since this isn't a big concern, I'd rather tune the settings to
>> > > efficiently utilize the local filesystem.
>> > >
>> > > Are there any pointers to where in the source code I could look in
>> > > order to tweak such parameters?
>> > >
>> > > Thanks,
>> > > Calvin
>> > >
>> > > [1]
>> > > https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
>> > >
>> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
>> > >> Hi all,
>> > >>
>> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
>> > >> MapReduce applications will parallelize differently depending on what
>> > >> kind of filesystem the input data is on.
>> > >>
>> > >> Using HDFS, a MapReduce job will spawn enough containers to maximize
>> > >> use of all available memory. For example, a 3-node cluster with 172GB
>> > >> of memory with each map task allocating 2GB, about 86 application
>> > >> containers will be created.
>> > >>
>> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
>> > >> parallel filesystem), a MapReduce job will only allocate a subset of
>> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
>> > >> containers are created). Since I'm using a parallel filesystem, I'm
>> > >> not as concerned with the bottlenecks one would find if one were to
>> > >> use NFS.
>> > >>
>> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
>> > >> configuration that will allow me to effectively maximize resource
>> > >> utilization?
>> > >>
>> > >> Thanks,
>> > >> Calvin
>> >
>> >
>> >
>> > --
>> > Harsh J

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Calvin <ip...@gmail.com>.
Thanks for the responses!

To clarify, I'm not using any special FileSystem implementation. An
example input parameter to a MapReduce job would be something like
"-input file:///scratch/data". Thus I think (any clarification would
be helpful) Hadoop is then utilizing LocalFileSystem
(org.apache.hadoop.fs.LocalFileSystem).

The input data is large enough and splittable (1300 .bz2 files, 274MB
each, 350GB total). Thus even if the input data weren't splittable,
Hadoop should be able to parallelize up to 1300 map tasks if capacity
is available; in my case, I find that the Hadoop cluster is not fully
utilized (i.e., ~25-35 containers running when it can scale up to ~80
containers) when not using HDFS, while achieving maximum use when
using HDFS.

I'm wondering if Hadoop is "holding back" or throttling the I/O if
LocalFileSystem is being used, and what changes I can make to have the
Hadoop tasks scale.

In the meantime, I'll take a look at the API calls that Harsh mentioned.


On Fri, Aug 15, 2014 at 10:15 AM, Harsh J <ha...@cloudera.com> wrote:
> The split configurations in FIF mentioned earlier would work for local files
> as well. They aren't deemed unsplitable, just considered as one single
> block.
>
> If the FS in use has its advantages it's better to implement a proper
> interface to it making use of them, than to rely on the LFS by mounting it.
> This is what we do with HDFS.
>
> On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:
>>
>> I believe that Calvin mentioned before that this parallel file system
>> mounted into local file system.
>>
>> In this case, will Hadoop just use java.io.File as local File system to
>> treat them as local file and not split the file?
>>
>> Just want to know the logic in hadoop handling the local file.
>>
>> One suggestion I can think is to split the files manually outside of
>> hadoop. For example, generate lots of small files as 128M or 256M size.
>>
>> In this case, each mapper will process one small file, so you can get good
>> utilization of your cluster, assume you have a lot of small files.
>>
>> Yong
>>
>> > From: harsh@cloudera.com
>> > Date: Fri, 15 Aug 2014 16:45:02 +0530
>> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs
>> > filesystems
>> > To: user@hadoop.apache.org
>> >
>> > Does your non-HDFS filesystem implement a getBlockLocations API, that
>> > MR relies on to know how to split files?
>> >
>> > The API is at
>> > http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
>> > long, long), and MR calls it at
>> >
>> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
>> >
>> > If not, perhaps you can enforce a manual chunking by asking MR to use
>> > custom min/max split sizes values via config properties:
>> >
>> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
>> >
>> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
>> > > I've looked a bit into this problem some more, and from what another
>> > > person has written, HDFS is tuned to scale appropriately [1] given the
>> > > number of input splits, etc.
>> > >
>> > > In the case of utilizing the local filesystem (which is really a
>> > > network share on a parallel filesystem), the settings might be set
>> > > conservatively in order not to thrash the local disks or present a
>> > > bottleneck in processing.
>> > >
>> > > Since this isn't a big concern, I'd rather tune the settings to
>> > > efficiently utilize the local filesystem.
>> > >
>> > > Are there any pointers to where in the source code I could look in
>> > > order to tweak such parameters?
>> > >
>> > > Thanks,
>> > > Calvin
>> > >
>> > > [1]
>> > > https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
>> > >
>> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
>> > >> Hi all,
>> > >>
>> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
>> > >> MapReduce applications will parallelize differently depending on what
>> > >> kind of filesystem the input data is on.
>> > >>
>> > >> Using HDFS, a MapReduce job will spawn enough containers to maximize
>> > >> use of all available memory. For example, a 3-node cluster with 172GB
>> > >> of memory with each map task allocating 2GB, about 86 application
>> > >> containers will be created.
>> > >>
>> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
>> > >> parallel filesystem), a MapReduce job will only allocate a subset of
>> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
>> > >> containers are created). Since I'm using a parallel filesystem, I'm
>> > >> not as concerned with the bottlenecks one would find if one were to
>> > >> use NFS.
>> > >>
>> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
>> > >> configuration that will allow me to effectively maximize resource
>> > >> utilization?
>> > >>
>> > >> Thanks,
>> > >> Calvin
>> >
>> >
>> >
>> > --
>> > Harsh J

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Harsh J <ha...@cloudera.com>.
The split configurations in FIF mentioned earlier would work for local
files as well. They aren't deemed unsplittable, just considered as one
single block.

If the FS in use has its own advantages, it's better to implement a proper
FileSystem interface to it that makes use of them than to rely on the LFS
by mounting it. This is what we do with HDFS.
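
As a rough sketch of what driving those FIF split settings from the job
driver could look like (the 128 MB cap is an arbitrary figure, and whether a
compressed input actually yields more splits still depends on the codec
being splittable, as bzip2 generally is):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitTuningSketch {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-tuning-sketch");
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("file:///scratch/data"));

        // Cap each split at 128 MB so a file that the local filesystem reports
        // as one big block is still carved into several map tasks. The
        // equivalent config key is mapreduce.input.fileinputformat.split.maxsize.
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        // (setMinInputSplitSize / ...split.minsize raises the floor instead,
        // which reduces the number of splits.)

        // ... set mapper/reducer/output formats as usual, then
        // job.waitForCompletion(true).
      }
    }
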
On Aug 15, 2014 8:52 PM, "java8964" <ja...@hotmail.com> wrote:

> I believe that Calvin mentioned before that this parallel file system
> mounted into local file system.
>
> In this case, will Hadoop just use java.io.File as local File system to
> treat them as local file and not split the file?
>
> Just want to know the logic in hadoop handling the local file.
>
> One suggestion I can think is to split the files manually outside of
> hadoop. For example, generate lots of small files as 128M or 256M size.
>
> In this case, each mapper will process one small file, so you can get good
> utilization of your cluster, assume you have a lot of small files.
>
> Yong
>
> > From: harsh@cloudera.com
> > Date: Fri, 15 Aug 2014 16:45:02 +0530
> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
> > To: user@hadoop.apache.org
> >
> > Does your non-HDFS filesystem implement a getBlockLocations API, that
> > MR relies on to know how to split files?
> >
> > The API is at
> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus
> ,
> > long, long), and MR calls it at
> >
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
> >
> > If not, perhaps you can enforce a manual chunking by asking MR to use
> > custom min/max split sizes values via config properties:
> >
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
> >
> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> > > I've looked a bit into this problem some more, and from what another
> > > person has written, HDFS is tuned to scale appropriately [1] given the
> > > number of input splits, etc.
> > >
> > > In the case of utilizing the local filesystem (which is really a
> > > network share on a parallel filesystem), the settings might be set
> > > conservatively in order not to thrash the local disks or present a
> > > bottleneck in processing.
> > >
> > > Since this isn't a big concern, I'd rather tune the settings to
> > > efficiently utilize the local filesystem.
> > >
> > > Are there any pointers to where in the source code I could look in
> > > order to tweak such parameters?
> > >
> > > Thanks,
> > > Calvin
> > >
> > > [1]
> https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
> > >
> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> > >> Hi all,
> > >>
> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> > >> MapReduce applications will parallelize differently depending on what
> > >> kind of filesystem the input data is on.
> > >>
> > >> Using HDFS, a MapReduce job will spawn enough containers to maximize
> > >> use of all available memory. For example, a 3-node cluster with 172GB
> > >> of memory with each map task allocating 2GB, about 86 application
> > >> containers will be created.
> > >>
> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
> > >> parallel filesystem), a MapReduce job will only allocate a subset of
> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
> > >> containers are created). Since I'm using a parallel filesystem, I'm
> > >> not as concerned with the bottlenecks one would find if one were to
> > >> use NFS.
> > >>
> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> > >> configuration that will allow me to effectively maximize resource
> > >> utilization?
> > >>
> > >> Thanks,
> > >> Calvin
> >
> >
> >
> > --
> > Harsh J
>

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by jay vyas <ja...@gmail.com>.
Your FileSystem implementation should provide specific tuning parameters
for IO.

For example, in the GlusterFileSystem, we have a buffer parameter that is
typically
embedded into the core-site.xml.

https://github.com/gluster/glusterfs-hadoop/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterVolume.java


Similarly, in HDFS, there are tuning parameters that would go in
hdfs-site.xml

IIRC from your stackoverflow question, the Hadoop Compatible FileSystem you
are using is backed by a company of some sort, so
you should contact the engineers working on the implementation about how to
tune the underlying FS.

Regarding mapreduce and yarn - task optimization at that level is
independent of the underlying file system.  There are some parameters that
you can specify with your job, like setting the min number of tasks, which
can increase/decrease the number of total tasks.

From some experience tuning web crawlers with this stuff, I can say that a
high number will increase parallelism but might decrease availability of
your cluster (and locality of individual jobs).
A high # of tasks generally works well when doing something CPU- or
network-intensive.
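
As a sketch of that filesystem-level vs. job-level distinction
(io.file.buffer.size is a standard core-site.xml key; the fs.examplefs.*
key below is a made-up placeholder for whatever knob a vendor connector
documents):

    import org.apache.hadoop.conf.Configuration;

    public class FsTuningSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Filesystem-level tuning: generic Hadoop I/O buffer used by many
        // FileSystem implementations (core-site.xml; default is 4 KB).
        conf.setInt("io.file.buffer.size", 128 * 1024);

        // A connector-specific buffer would sit alongside it in core-site.xml.
        // NOTE: "fs.examplefs.write.buffer.size" is a hypothetical key, not a
        // real property of any shipped connector.
        conf.setInt("fs.examplefs.write.buffer.size", 4 * 1024 * 1024);

        // Job-level tuning (split sizes and the like) is independent of the
        // filesystem and goes through mapred-site.xml or the Job object,
        // e.g. mapreduce.input.fileinputformat.split.maxsize.
        System.out.println("io.file.buffer.size = " + conf.get("io.file.buffer.size"));
      }
    }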


On Fri, Aug 15, 2014 at 11:22 AM, java8964 <ja...@hotmail.com> wrote:

> I believe that Calvin mentioned before that this parallel file system
> mounted into local file system.
>
> In this case, will Hadoop just use java.io.File as local File system to
> treat them as local file and not split the file?
>
> Just want to know the logic in hadoop handling the local file.
>
> One suggestion I can think is to split the files manually outside of
> hadoop. For example, generate lots of small files as 128M or 256M size.
>
> In this case, each mapper will process one small file, so you can get good
> utilization of your cluster, assume you have a lot of small files.
>
> Yong
>
> > From: harsh@cloudera.com
> > Date: Fri, 15 Aug 2014 16:45:02 +0530
> > Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
> > To: user@hadoop.apache.org
>
> >
> > Does your non-HDFS filesystem implement a getBlockLocations API, that
> > MR relies on to know how to split files?
> >
> > The API is at
> http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus
> ,
> > long, long), and MR calls it at
> >
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
> >
> > If not, perhaps you can enforce a manual chunking by asking MR to use
> > custom min/max split sizes values via config properties:
> >
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
> >
> > On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> > > I've looked a bit into this problem some more, and from what another
> > > person has written, HDFS is tuned to scale appropriately [1] given the
> > > number of input splits, etc.
> > >
> > > In the case of utilizing the local filesystem (which is really a
> > > network share on a parallel filesystem), the settings might be set
> > > conservatively in order not to thrash the local disks or present a
> > > bottleneck in processing.
> > >
> > > Since this isn't a big concern, I'd rather tune the settings to
> > > efficiently utilize the local filesystem.
> > >
> > > Are there any pointers to where in the source code I could look in
> > > order to tweak such parameters?
> > >
> > > Thanks,
> > > Calvin
> > >
> > > [1]
> https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
> > >
> > > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> > >> Hi all,
> > >>
> > >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> > >> MapReduce applications will parallelize differently depending on what
> > >> kind of filesystem the input data is on.
> > >>
> > >> Using HDFS, a MapReduce job will spawn enough containers to maximize
> > >> use of all available memory. For example, a 3-node cluster with 172GB
> > >> of memory with each map task allocating 2GB, about 86 application
> > >> containers will be created.
> > >>
> > >> On a filesystem that isn't HDFS (like NFS or in my use case, a
> > >> parallel filesystem), a MapReduce job will only allocate a subset of
> > >> available tasks (e.g., with the same 3-node cluster, about 25-40
> > >> containers are created). Since I'm using a parallel filesystem, I'm
> > >> not as concerned with the bottlenecks one would find if one were to
> > >> use NFS.
> > >>
> > >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> > >> configuration that will allow me to effectively maximize resource
> > >> utilization?
> > >>
> > >> Thanks,
> > >> Calvin
> >
> >
> >
> > --
> > Harsh J
>



-- 
jay vyas

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by java8964 <ja...@hotmail.com>.
I believe that Calvin mentioned before that this parallel file system is mounted into the local file system.
In this case, will Hadoop just use java.io.File as the local file system, treat them as local files, and not split the files?
I just want to know the logic in Hadoop for handling local files.
One suggestion I can think of is to split the files manually outside of Hadoop. For example, generate lots of smaller files of 128M or 256M each.
In this case, each mapper will process one small file, so you can get good utilization of your cluster, assuming you have a lot of small files.
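
A rough sketch of that pre-splitting step, using the Hadoop FileSystem API so
the same code works against file:// or hdfs:// paths; it only cuts plain text
at line boundaries, so the .bz2 inputs mentioned earlier would need
decompressing first (or recompressing each part afterwards):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RechunkSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path src = new Path(args[0]);            // e.g. file:///scratch/data/big.txt
        FileSystem fs = src.getFileSystem(conf); // LocalFileSystem for file:// paths
        long target = 256L * 1024 * 1024;        // aim for ~256 MB per part

        int part = 0;
        long written = 0;
        OutputStream out = fs.create(new Path(src + ".part-" + part));
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(src), StandardCharsets.UTF_8))) {
          String line;
          while ((line = in.readLine()) != null) {
            if (written >= target) {             // roll over at a line boundary
              out.close();
              out = fs.create(new Path(src + ".part-" + (++part)));
              written = 0;
            }
            byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
            out.write(bytes);
            written += bytes.length;
          }
        } finally {
          out.close();
        }
      }
    }
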
Yong

> From: harsh@cloudera.com
> Date: Fri, 15 Aug 2014 16:45:02 +0530
> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
> To: user@hadoop.apache.org
> 
> Does your non-HDFS filesystem implement a getBlockLocations API, that
> MR relies on to know how to split files?
> 
> The API is at http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
> long, long), and MR calls it at
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
> 
> If not, perhaps you can enforce a manual chunking by asking MR to use
> custom min/max split sizes values via config properties:
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
> 
> On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> > I've looked a bit into this problem some more, and from what another
> > person has written, HDFS is tuned to scale appropriately [1] given the
> > number of input splits, etc.
> >
> > In the case of utilizing the local filesystem (which is really a
> > network share on a parallel filesystem), the settings might be set
> > conservatively in order not to thrash the local disks or present a
> > bottleneck in processing.
> >
> > Since this isn't a big concern, I'd rather tune the settings to
> > efficiently utilize the local filesystem.
> >
> > Are there any pointers to where in the source code I could look in
> > order to tweak such parameters?
> >
> > Thanks,
> > Calvin
> >
> > [1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
> >
> > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> >> Hi all,
> >>
> >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> >> MapReduce applications will parallelize differently depending on what
> >> kind of filesystem the input data is on.
> >>
> >> Using HDFS, a MapReduce job will spawn enough containers to maximize
> >> use of all available memory. For example, a 3-node cluster with 172GB
> >> of memory with each map task allocating 2GB, about 86 application
> >> containers will be created.
> >>
> >> On a filesystem that isn't HDFS (like NFS or in my use case, a
> >> parallel filesystem), a MapReduce job will only allocate a subset of
> >> available tasks (e.g., with the same 3-node cluster, about 25-40
> >> containers are created). Since I'm using a parallel filesystem, I'm
> >> not as concerned with the bottlenecks one would find if one were to
> >> use NFS.
> >>
> >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> >> configuration that will allow me to effectively maximize resource
> >> utilization?
> >>
> >> Thanks,
> >> Calvin
> 
> 
> 
> -- 
> Harsh J

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by java8964 <ja...@hotmail.com>.
I believe that Calvin mentioned before that this parallel file system mounted into local file system.
In this case, will Hadoop just use java.io.File as local File system to treat them as local file and not split the file?
Just want to know the logic in hadoop handling the local file.
One suggestion I can think is to split the files manually outside of hadoop. For example, generate lots of small files as 128M or 256M size.
In this case, each mapper will process one small file, so you can get good utilization of your cluster, assume you have a lot of small files.
Yong

> From: harsh@cloudera.com
> Date: Fri, 15 Aug 2014 16:45:02 +0530
> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
> To: user@hadoop.apache.org
> 
> Does your non-HDFS filesystem implement a getBlockLocations API, that
> MR relies on to know how to split files?
> 
> The API is at http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
> long, long), and MR calls it at
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
> 
> If not, perhaps you can enforce a manual chunking by asking MR to use
> custom min/max split sizes values via config properties:
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
> 
> On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> > I've looked a bit into this problem some more, and from what another
> > person has written, HDFS is tuned to scale appropriately [1] given the
> > number of input splits, etc.
> >
> > In the case of utilizing the local filesystem (which is really a
> > network share on a parallel filesystem), the settings might be set
> > conservatively in order not to thrash the local disks or present a
> > bottleneck in processing.
> >
> > Since this isn't a big concern, I'd rather tune the settings to
> > efficiently utilize the local filesystem.
> >
> > Are there any pointers to where in the source code I could look in
> > order to tweak such parameters?
> >
> > Thanks,
> > Calvin
> >
> > [1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
> >
> > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> >> Hi all,
> >>
> >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> >> MapReduce applications will parallelize differently depending on what
> >> kind of filesystem the input data is on.
> >>
> >> Using HDFS, a MapReduce job will spawn enough containers to maximize
> >> use of all available memory. For example, a 3-node cluster with 172GB
> >> of memory with each map task allocating 2GB, about 86 application
> >> containers will be created.
> >>
> >> On a filesystem that isn't HDFS (like NFS or in my use case, a
> >> parallel filesystem), a MapReduce job will only allocate a subset of
> >> available tasks (e.g., with the same 3-node cluster, about 25-40
> >> containers are created). Since I'm using a parallel filesystem, I'm
> >> not as concerned with the bottlenecks one would find if one were to
> >> use NFS.
> >>
> >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> >> configuration that will allow me to effectively maximize resource
> >> utilization?
> >>
> >> Thanks,
> >> Calvin
> 
> 
> 
> -- 
> Harsh J
 		 	   		  

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by java8964 <ja...@hotmail.com>.
I believe that Calvin mentioned before that this parallel file system mounted into local file system.
In this case, will Hadoop just use java.io.File as local File system to treat them as local file and not split the file?
Just want to know the logic in hadoop handling the local file.
One suggestion I can think is to split the files manually outside of hadoop. For example, generate lots of small files as 128M or 256M size.
In this case, each mapper will process one small file, so you can get good utilization of your cluster, assume you have a lot of small files.
Yong

> From: harsh@cloudera.com
> Date: Fri, 15 Aug 2014 16:45:02 +0530
> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
> To: user@hadoop.apache.org
> 
> Does your non-HDFS filesystem implement a getBlockLocations API, that
> MR relies on to know how to split files?
> 
> The API is at http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
> long, long), and MR calls it at
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
> 
> If not, perhaps you can enforce a manual chunking by asking MR to use
> custom min/max split sizes values via config properties:
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
> 
> On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> > I've looked a bit into this problem some more, and from what another
> > person has written, HDFS is tuned to scale appropriately [1] given the
> > number of input splits, etc.
> >
> > In the case of utilizing the local filesystem (which is really a
> > network share on a parallel filesystem), the settings might be set
> > conservatively in order not to thrash the local disks or present a
> > bottleneck in processing.
> >
> > Since this isn't a big concern, I'd rather tune the settings to
> > efficiently utilize the local filesystem.
> >
> > Are there any pointers to where in the source code I could look in
> > order to tweak such parameters?
> >
> > Thanks,
> > Calvin
> >
> > [1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
> >
> > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> >> Hi all,
> >>
> >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> >> MapReduce applications will parallelize differently depending on what
> >> kind of filesystem the input data is on.
> >>
> >> Using HDFS, a MapReduce job will spawn enough containers to maximize
> >> use of all available memory. For example, a 3-node cluster with 172GB
> >> of memory with each map task allocating 2GB, about 86 application
> >> containers will be created.
> >>
> >> On a filesystem that isn't HDFS (like NFS or in my use case, a
> >> parallel filesystem), a MapReduce job will only allocate a subset of
> >> available tasks (e.g., with the same 3-node cluster, about 25-40
> >> containers are created). Since I'm using a parallel filesystem, I'm
> >> not as concerned with the bottlenecks one would find if one were to
> >> use NFS.
> >>
> >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> >> configuration that will allow me to effectively maximize resource
> >> utilization?
> >>
> >> Thanks,
> >> Calvin
> 
> 
> 
> -- 
> Harsh J
 		 	   		  

RE: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by java8964 <ja...@hotmail.com>.
I believe that Calvin mentioned before that this parallel file system mounted into local file system.
In this case, will Hadoop just use java.io.File as local File system to treat them as local file and not split the file?
Just want to know the logic in hadoop handling the local file.
One suggestion I can think is to split the files manually outside of hadoop. For example, generate lots of small files as 128M or 256M size.
In this case, each mapper will process one small file, so you can get good utilization of your cluster, assume you have a lot of small files.
Yong

> From: harsh@cloudera.com
> Date: Fri, 15 Aug 2014 16:45:02 +0530
> Subject: Re: hadoop/yarn and task parallelization on non-hdfs filesystems
> To: user@hadoop.apache.org
> 
> Does your non-HDFS filesystem implement a getBlockLocations API, that
> MR relies on to know how to split files?
> 
> The API is at http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
> long, long), and MR calls it at
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
> 
> If not, perhaps you can enforce a manual chunking by asking MR to use
> custom min/max split sizes values via config properties:
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
> 
> On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> > I've looked a bit into this problem some more, and from what another
> > person has written, HDFS is tuned to scale appropriately [1] given the
> > number of input splits, etc.
> >
> > In the case of utilizing the local filesystem (which is really a
> > network share on a parallel filesystem), the settings might be set
> > conservatively in order not to thrash the local disks or present a
> > bottleneck in processing.
> >
> > Since this isn't a big concern, I'd rather tune the settings to
> > efficiently utilize the local filesystem.
> >
> > Are there any pointers to where in the source code I could look in
> > order to tweak such parameters?
> >
> > Thanks,
> > Calvin
> >
> > [1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
> >
> > On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> >> Hi all,
> >>
> >> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> >> MapReduce applications will parallelize differently depending on what
> >> kind of filesystem the input data is on.
> >>
> >> Using HDFS, a MapReduce job will spawn enough containers to maximize
> >> use of all available memory. For example, a 3-node cluster with 172GB
> >> of memory with each map task allocating 2GB, about 86 application
> >> containers will be created.
> >>
> >> On a filesystem that isn't HDFS (like NFS or in my use case, a
> >> parallel filesystem), a MapReduce job will only allocate a subset of
> >> available tasks (e.g., with the same 3-node cluster, about 25-40
> >> containers are created). Since I'm using a parallel filesystem, I'm
> >> not as concerned with the bottlenecks one would find if one were to
> >> use NFS.
> >>
> >> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> >> configuration that will allow me to effectively maximize resource
> >> utilization?
> >>
> >> Thanks,
> >> Calvin
> 
> 
> 
> -- 
> Harsh J

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Harsh J <ha...@cloudera.com>.
Does your non-HDFS filesystem implement the getFileBlockLocations API that
MR relies on to know how to split files?

The API is at http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.FileStatus,
long, long), and MR calls it at
https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L392
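
For what it is worth, here is a rough, hypothetical sketch of what advertising block boundaries for a locally mounted parallel filesystem might look like. The wrapper class, the 128 MB block size, and the localhost placement are all assumptions; if I remember correctly the base FileSystem implementation just reports a single localhost block for the whole file, whereas a real implementation would report the nodes that actually hold each range.

import java.io.IOException;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.RawLocalFileSystem;

// Hypothetical sketch: a local-filesystem wrapper that reports fixed-size
// "blocks" so FileInputFormat has finer-grained location hints to work with.
public class ParallelFsWithBlocks extends RawLocalFileSystem {

    private static final long FAKE_BLOCK_SIZE = 128L * 1024 * 1024; // assumed 128 MB

    @Override
    public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
            throws IOException {
        long fileLen = file.getLen();
        if (fileLen == 0 || len == 0 || start >= fileLen) {
            return new BlockLocation[0];
        }
        long end = Math.min(start + len, fileLen);
        int first = (int) (start / FAKE_BLOCK_SIZE);
        int last = (int) ((end - 1) / FAKE_BLOCK_SIZE);
        BlockLocation[] blocks = new BlockLocation[last - first + 1];
        for (int i = first; i <= last; i++) {
            long offset = i * FAKE_BLOCK_SIZE;
            long length = Math.min(FAKE_BLOCK_SIZE, fileLen - offset);
            // Every "block" claims localhost here; a real implementation would
            // ask the parallel filesystem which nodes actually hold the data.
            blocks[i - first] = new BlockLocation(
                    new String[] { "localhost:50010" },
                    new String[] { "localhost" },
                    offset, length);
        }
        return blocks;
    }
}

Whether this alone changes the number of splits also depends on the block size the FileStatus reports, since FileInputFormat derives its split size from that; the sketch is mainly meant to show where the hook is.
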

If not, perhaps you can enforce manual chunking by asking MR to use
custom min/max split size values via these config properties:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L66
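
As a hedged illustration (the input path and sizes below are made up), the driver could cap the split size so that computeSplitSize(), which takes max(minSize, min(maxSize, blockSize)), ends up well below the file size and yields many more map tasks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Sketch: cap the split size so a large file on the parallel filesystem is
// broken into many map tasks even without HDFS-style block placement.
public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "parallel-fs-split-demo");

        FileInputFormat.addInputPath(job, new Path("file:///mnt/parallel-fs/input"));
        // Equivalent to setting mapreduce.input.fileinputformat.split.minsize/.maxsize
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

        // ... set mapper/reducer classes, input format, output path, etc., then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The same values can usually be passed on the command line as well, e.g. -D mapreduce.input.fileinputformat.split.maxsize=268435456, if the driver goes through ToolRunner/GenericOptionsParser.
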

On Fri, Aug 15, 2014 at 10:16 AM, Calvin <ip...@gmail.com> wrote:
> I've looked a bit into this problem some more, and from what another
> person has written, HDFS is tuned to scale appropriately [1] given the
> number of input splits, etc.
>
> In the case of utilizing the local filesystem (which is really a
> network share on a parallel filesystem), the settings might be set
> conservatively in order not to thrash the local disks or present a
> bottleneck in processing.
>
> Since this isn't a big concern, I'd rather tune the settings to
> efficiently utilize the local filesystem.
>
> Are there any pointers to where in the source code I could look in
> order to tweak such parameters?
>
> Thanks,
> Calvin
>
> [1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems
>
> On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
>> Hi all,
>>
>> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
>> MapReduce applications will parallelize differently depending on what
>> kind of filesystem the input data is on.
>>
>> Using HDFS, a MapReduce job will spawn enough containers to maximize
>> use of all available memory. For example, a 3-node cluster with 172GB
>> of memory with each map task allocating 2GB, about 86 application
>> containers will be created.
>>
>> On a filesystem that isn't HDFS (like NFS or in my use case, a
>> parallel filesystem), a MapReduce job will only allocate a subset of
>> available tasks (e.g., with the same 3-node cluster, about 25-40
>> containers are created). Since I'm using a parallel filesystem, I'm
>> not as concerned with the bottlenecks one would find if one were to
>> use NFS.
>>
>> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
>> configuration that will allow me to effectively maximize resource
>> utilization?
>>
>> Thanks,
>> Calvin



-- 
Harsh J

Re: hadoop/yarn and task parallelization on non-hdfs filesystems

Posted by Calvin <ip...@gmail.com>.
I've looked into this problem some more, and from what another
person has written, HDFS is tuned to scale appropriately [1] given the
number of input splits, etc.

In the case of utilizing the local filesystem (which is really a
network share on a parallel filesystem), the settings might be set
conservatively in order not to thrash the local disks or present a
bottleneck in processing.

Since those bottlenecks aren't a big concern here, I'd rather tune the
settings to utilize the local filesystem efficiently.

Are there any pointers to where in the source code I could look in
order to tweak such parameters?

Thanks,
Calvin

[1] https://stackoverflow.com/questions/25269964/hadoop-yarn-and-task-parallelization-on-non-hdfs-filesystems

On Tue, Aug 12, 2014 at 12:29 PM, Calvin <ip...@gmail.com> wrote:
> Hi all,
>
> I've instantiated a Hadoop 2.4.1 cluster and I've found that running
> MapReduce applications will parallelize differently depending on what
> kind of filesystem the input data is on.
>
> Using HDFS, a MapReduce job will spawn enough containers to maximize
> use of all available memory. For example, a 3-node cluster with 172GB
> of memory with each map task allocating 2GB, about 86 application
> containers will be created.
>
> On a filesystem that isn't HDFS (like NFS or in my use case, a
> parallel filesystem), a MapReduce job will only allocate a subset of
> available tasks (e.g., with the same 3-node cluster, about 25-40
> containers are created). Since I'm using a parallel filesystem, I'm
> not as concerned with the bottlenecks one would find if one were to
> use NFS.
>
> Is there a YARN (yarn-site.xml) or MapReduce (mapred-site.xml)
> configuration that will allow me to effectively maximize resource
> utilization?
>
> Thanks,
> Calvin
