Posted to hdfs-dev@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2013/04/13 22:46:20 UTC

MapReduce Integration tests for test FileSystems

MapReduce has some very demanding file operations at job submission
time, and so it is an important integration test for any Hadoop-compliant
FileSystem.

How are MapReduce integration tests implemented in the Hadoop HDFS source
tree? (I'm also interested in how S3, Swift, and other Hadoop filesystem
implementations implement integration tests, if there are any clues.) I
haven't seen much of this in the source code.

To distill my question: how are new HDFS patches tested against the
MapReduce job flow? Are there some standalone vanilla MapReduce jobs
(Sorting, WordCount, etc.) that run as part of the HDFS build, or is
there an HDFS-MapReduce integration repository?


-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: MapReduce Integration tests for test FileSystems

Posted by Steve Loughran <st...@hortonworks.com>.
On 13 April 2013 21:46, Jay Vyas <ja...@gmail.com> wrote:

> MapReduce has some very demanding file operations at job submission
> time, and so it is an important integration test for any Hadoop-compliant
> FileSystem.
>
> How are MapReduce integration tests implemented in the Hadoop HDFS source
> tree? (I'm also interested in how S3, Swift, and other Hadoop filesystem
> implementations implement integration tests, if there are any clues.) I
> haven't seen much of this in the source code.
>


There are some functional tests for S3 that are run by hand; when the
Swift support goes in, those tests will be skipped unless the file
test/resources/auth-keys.xml containing the login details is present.
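
Roughly, that skip mechanism is a JUnit assumption keyed off whether the
file is on the test classpath. A minimal sketch of the pattern, assuming
JUnit 4 (the class name is illustrative, not the actual Swift test code):

  import org.junit.Assume;
  import org.junit.Before;
  import org.junit.Test;

  public class FileSystemContractIT {

    @Before
    public void requireCredentials() {
      // Skip every test in the class unless the (deliberately
      // uncommitted) credentials file is on the test classpath;
      // this keeps login details out of the repo while letting
      // developers with accounts run the suite.
      Assume.assumeNotNull(
          getClass().getClassLoader().getResource("auth-keys.xml"));
    }

    @Test
    public void testAgainstRemoteStore() throws Exception {
      // ... exercise the FileSystem under test here ...
    }
  }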

We'd like to add more here in Bigtop, which is downstream enough that you
can include things like Pig, Hive and HBase tests, using different
filesystems for source, destination and intermediate files. They should
also allow the option of creating big files, many files in a single
directory, and deeper directory trees: scale problems.
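
As a sketch of what such a cross-filesystem job could look like: a stock
WordCount driver that takes full URIs for its input and output paths, so
source and destination can sit on different filesystems. The class name is
illustrative; the mapper and the stock IntSumReducer are the usual
word-count pair:

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

  public class CrossFsWordCount {

    /** Standard word-count mapper: emits (word, 1) per token. */
    public static class TokenizerMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);
        }
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "cross-fs wordcount");
      job.setJarByClass(CrossFsWordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      // Full URIs let source and destination live on different
      // filesystems, e.g. hdfs://nn:8020/in as input and
      // swift://container.service/out as output; the FileSystem is
      // chosen from each path's scheme.
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }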




> To distill my question: how are new HDFS patches tested against the
> MapReduce job flow? Are there some standalone vanilla MapReduce jobs
> (Sorting, WordCount, etc.) that run as part of the HDFS build, or is
> there an HDFS-MapReduce integration repository?
>
>
The -common, HDFS and -mapred source trees are in a combined repo, so this
is implicit: you just run "mvn clean test" at the root. Or have Jenkins do
it and send email when you break things.
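
For an in-JVM example of what the combined repo buys you: a test can stand
up a single-node HDFS with MiniDFSCluster and then drive jobs against it.
A minimal sketch, assuming JUnit 4 and the Hadoop test artifacts on the
classpath (Builder details vary a little between Hadoop versions; the
class name is illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.MiniDFSCluster;
  import org.junit.After;
  import org.junit.Before;
  import org.junit.Test;
  import static org.junit.Assert.assertTrue;

  public class TestJobSubmissionAgainstHdfs {
    private MiniDFSCluster dfs;

    @Before
    public void setUp() throws Exception {
      // Spin up a single-node HDFS inside the test JVM
      dfs = new MiniDFSCluster.Builder(new Configuration()).build();
    }

    @After
    public void tearDown() {
      if (dfs != null) {
        dfs.shutdown();
      }
    }

    @Test
    public void testFileSystemVisibleToJobs() throws Exception {
      FileSystem fs = dfs.getFileSystem();
      Path in = new Path("/input/part-0");
      fs.create(in).close();
      // A real test would now submit a vanilla MR job (e.g. WordCount)
      // against this filesystem and assert on its output.
      assertTrue(fs.exists(in));
    }
  }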

steve