Posted to hdfs-user@hadoop.apache.org by Christian Schneider <cs...@gmail.com> on 2013/08/12 21:33:18 UTC

How to tune fileSystem.listFiles("/", true) when walking through almost all files

Hi, is there a way to tune this?

I walk through the files with:

RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new
Path(uri), true);
while (listFiles.hasNext()) {
    listFiles.next();
}

I need to collect some information about those files, so I would like to scan
them all.

Is there any way to tune the listFiles.next() call, for example by loading a
batch of files at once or by multithreading it?

Best Regards,
Christian.

Re: How to tune fileSystem.listFiles("/", true) when walking through almost all files

Posted by Christian Schneider <cs...@gmail.com>.
Hi, I found that it works much faster with fileSystem.listStatus() and
a manual recursion.

listFiles:  4021 files in 14.27 s
listStatus: 4021 files in 364.3 ms

So far I have only tested it on localhost. Tomorrow I will check it against the
cluster.

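import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Assumption: Stopwatch is Guava's com.google.common.base.Stopwatch.
import com.google.common.base.Stopwatch;

public class Main
{
    // Number of files seen by the variant currently being timed.
    static AtomicInteger count = new AtomicInteger();

    static URI uri;
    static FileSystem fileSystem;

    public static void main(final String... args) throws URISyntaxException,
            IOException, InterruptedException
    {
        uri = new URI("/home/christian/Documents");
        fileSystem = FileSystem.get(uri, new Configuration(), "hdfs");

        Stopwatch stopwatch = new Stopwatch();

        // Variant 1: recursive listFiles() iterator.
        stopwatch.start();
        withListAllAndNext();
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);

        // Variant 2: listStatus() with a manual recursion.
        stopwatch.reset();
        count.set(0);
        stopwatch.start();
        blockwiseWithRecursion(fileSystem.listStatus(new Path(uri)));
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);
    }

    // Counts files, descending into each directory with one listStatus() call.
    private static void blockwiseWithRecursion(FileStatus... statuses)
            throws IOException
    {
        for (FileStatus fileStatus : statuses)
        {
            if (fileStatus.isDirectory())
                blockwiseWithRecursion(fileSystem.listStatus(fileStatus.getPath()));
            else
                count.incrementAndGet();
        }
    }

    // Counts files through the iterator returned by listFiles(path, true).
    // Each LocatedFileStatus also carries block locations, which a plain
    // FileStatus does not.
    private static void withListAllAndNext() throws IOException
    {
        RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new
                Path(uri), true);
        // hasNext() ends the loop; the earlier version swallowed IOExceptions
        // inside while (true) and could spin forever on a persistent error.
        while (listFiles.hasNext())
        {
            listFiles.next();
            count.incrementAndGet();
        }
    }
}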
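
On the multithreading part of the question: one possible direction is to fork a task per subdirectory, so that several directories are listed at the same time. The following is only a minimal sketch, not something measured against a cluster. It avoids any Hadoop dependency: Node, CountTask and the listChildren function are hypothetical stand-ins for FileStatus and a call such as fileSystem.listStatus(path), and whether this helps in practice depends on how many concurrent listing calls the file system handles well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Function;

/** Sketch of a parallel recursive walk that counts files. */
final class ParallelWalk {

    /** Hypothetical stand-in for a directory entry such as FileStatus. */
    static final class Node {
        final String path;
        final boolean directory;

        Node(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }
    }

    /** Counts the files below one directory, forking one subtask per subdirectory. */
    static final class CountTask extends RecursiveTask<Long> {
        private final Node dir;
        // Stand-in for the listing call, e.g. listStatus() on dir's path.
        private final Function<Node, List<Node>> listChildren;

        CountTask(Node dir, Function<Node, List<Node>> listChildren) {
            this.dir = dir;
            this.listChildren = listChildren;
        }

        @Override
        protected Long compute() {
            long files = 0;
            List<CountTask> subtasks = new ArrayList<>();
            for (Node child : listChildren.apply(dir)) {
                if (child.directory) {
                    // Subdirectories are listed asynchronously by the pool.
                    CountTask task = new CountTask(child, listChildren);
                    task.fork();
                    subtasks.add(task);
                } else {
                    files++;
                }
            }
            // Wait for every subdirectory and add up its file count.
            for (CountTask task : subtasks) {
                files += task.join();
            }
            return files;
        }
    }

    /** Walks the tree below root using a pool with the given parallelism. */
    static long countFiles(Node root, Function<Node, List<Node>> listChildren,
                           int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new CountTask(root, listChildren));
        } finally {
            pool.shutdown();
        }
    }
}
```

To try this against a real file system, listChildren would wrap the listing call and convert its checked IOException into an unchecked one, since Function.apply cannot throw checked exceptions.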


