Posted to hdfs-user@hadoop.apache.org by Christian Schneider <cs...@gmail.com> on 2013/08/12 21:33:18 UTC
How to tune fileSystem.listFiles("/", true) when walking through almost all files
Hi, is there a way to tune this?
I walk through the files with:
RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(new Path(uri), true);
while (listFiles.hasNext()) {
    listFiles.next();
}
I need some information about each of those files, so I'd like to scan
them all.
Is there any way to tune the listFiles.next() call, for example by loading
files in batches, or by multithreading it?
Best Regards,
Christian.
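The question asks whether the walk can be multithreaded. One way to try that, sketched here as an assumption rather than a built-in Hadoop feature, is to list each directory in its own fork/join task so that several directory listings are in flight at once. To keep the sketch runnable without a cluster, the listing call is passed in as a hypothetical `Lister` callback and directories are recognised by a predicate; against Hadoop these would wrap `fileSystem.listStatus(path)` and `FileStatus.isDirectory()`. Whether it actually helps on a cluster depends on how much concurrent listing load the NameNode absorbs, which nothing in this thread measures.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Predicate;

/** Sketch: count files under a root, listing subdirectories in parallel. */
final class ParallelWalk {

    /** Hypothetical listing hook, e.g. a wrapper around FileSystem.listStatus(). */
    interface Lister<T> {
        List<T> list(T dir) throws IOException;
    }

    /** Lists one directory and forks a subtask for every subdirectory it finds. */
    private static final class CountTask<T> extends RecursiveTask<Long> {
        private final T dir;
        private final Lister<T> lister;
        private final Predicate<T> isDirectory;

        CountTask(T dir, Lister<T> lister, Predicate<T> isDirectory) {
            this.dir = dir;
            this.lister = lister;
            this.isDirectory = isDirectory;
        }

        @Override
        protected Long compute() {
            List<T> children;
            try {
                children = lister.list(dir);
            } catch (IOException e) {
                // RecursiveTask cannot throw checked exceptions, so wrap it.
                throw new UncheckedIOException(e);
            }
            long files = 0;
            List<CountTask<T>> subtasks = new ArrayList<>();
            for (T child : children) {
                if (isDirectory.test(child)) {
                    CountTask<T> task = new CountTask<>(child, lister, isDirectory);
                    task.fork(); // schedule the subdirectory listing on the pool
                    subtasks.add(task);
                } else {
                    files++;
                }
            }
            for (CountTask<T> task : subtasks) {
                files += task.join(); // add each subdirectory's file count
            }
            return files;
        }
    }

    /** Counts files below {@code root} using a pool of {@code threads} workers. */
    static <T> long countFiles(T root, Lister<T> lister, Predicate<T> isDirectory, int threads) {
        if (!isDirectory.test(root)) {
            return 1; // a plain file counts as itself
        }
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new CountTask<>(root, lister, isDirectory));
        } finally {
            pool.shutdown();
        }
    }
}
```

With Hadoop, a call might look like `ParallelWalk.countFiles(fileSystem.getFileStatus(new Path(uri)), s -> Arrays.asList(fileSystem.listStatus(s.getPath())), FileStatus::isDirectory, 8)`, where the thread count of 8 is an arbitrary placeholder to tune.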
Re: How to tune fileSystem.listFiles("/", true) when walking through almost
all files
Posted by Christian Schneider <cs...@gmail.com>.
Hi, I found that walking the tree with fileSystem.listStatus() and manual
recursion is much faster:
listFiles:  4021 files in 14.27 s
listStatus: 4021 files in 364.3 ms
So far I have only tested this on localhost. Tomorrow I will check it
against the cluster.
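As a rough sanity check on these numbers, assuming the time is spread evenly over the 4021 files, the per-file cost and the overall ratio work out to:

```latex
\frac{14.27\ \text{s}}{4021\ \text{files}} \approx 3.55\ \text{ms/file},\qquad
\frac{364.3\ \text{ms}}{4021\ \text{files}} \approx 0.091\ \text{ms/file},\qquad
\frac{14.27\ \text{s}}{0.3643\ \text{s}} \approx 39
```

One plausible contributor, which these measurements do not isolate: listFiles() yields LocatedFileStatus objects that carry block locations, which plain listStatus() does not fetch, so the two calls are not doing identical work per file.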
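import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

import com.google.common.base.Stopwatch;

public class Main
{
    static AtomicInteger count = new AtomicInteger();
    static URI uri;
    static FileSystem fileSystem;

    public static void main(final String... args) throws URISyntaxException,
            IOException, InterruptedException
    {
        uri = new URI("/home/christian/Documents");
        // Access the file system as user "hdfs"; a URI without a scheme
        // resolves against the configured default file system.
        fileSystem = FileSystem.get(uri, new Configuration(), "hdfs");

        // Guava's Stopwatch: this public constructor exists in older Guava
        // releases; newer ones use Stopwatch.createUnstarted() instead.
        Stopwatch stopwatch = new Stopwatch();

        // Variant 1: the built-in recursive iterator.
        stopwatch.start();
        withListAllAndNext();
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);

        // Variant 2: listStatus() plus manual recursion.
        stopwatch.reset();
        count.set(0);
        stopwatch.start();
        blockwiseWithRecursion(fileSystem.listStatus(new Path(uri)));
        stopwatch.stop();
        System.out.println(count + " - " + stopwatch);
    }

    // Counts the files in the given listing, descending into each directory.
    private static void blockwiseWithRecursion(FileStatus... statuses)
            throws IOException
    {
        for (FileStatus fileStatus : statuses)
        {
            if (fileStatus.isDirectory())
                blockwiseWithRecursion(fileSystem.listStatus(fileStatus.getPath()));
            else
                count.incrementAndGet();
        }
    }

    // Counts the files returned by the recursive listFiles() iterator.
    private static void withListAllAndNext() throws IOException
    {
        RemoteIterator<LocatedFileStatus> listFiles =
                fileSystem.listFiles(new Path(uri), true);
        // hasNext() ends the loop cleanly; an IOException from the listing
        // propagates to the caller instead of being retried forever.
        while (listFiles.hasNext())
        {
            listFiles.next();
            count.incrementAndGet();
        }
    }
}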
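The recursive blockwiseWithRecursion() above uses one Java stack frame per directory level. A minimal sketch of the same listStatus()-based walk with an explicit stack follows; it still issues one listing call per directory, but does not grow the call stack on deeply nested trees. The `Lister` callback and the directory predicate are hypothetical stand-ins so the code runs without Hadoop.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Sketch: a manual directory walk driven by an explicit stack instead of recursion. */
final class IterativeWalk {

    /** Hypothetical listing hook, e.g. a wrapper around FileSystem.listStatus(). */
    interface Lister<T> {
        List<T> list(T dir) throws IOException;
    }

    /** Counts files below {@code root} without recursive calls. */
    static <T> long countFiles(T root, Lister<T> lister, Predicate<T> isDirectory)
            throws IOException {
        if (!isDirectory.test(root)) {
            return 1; // a plain file counts as itself
        }
        Deque<T> pending = new ArrayDeque<>(); // directories still to be listed
        pending.push(root);
        long files = 0;
        while (!pending.isEmpty()) {
            T dir = pending.pop();
            for (T child : lister.list(dir)) {
                if (isDirectory.test(child)) {
                    pending.push(child); // list it in a later iteration
                } else {
                    files++;
                }
            }
        }
        return files;
    }
}
```

With Hadoop the adapter could be `IterativeWalk.countFiles(fileSystem.getFileStatus(new Path(uri)), s -> Arrays.asList(fileSystem.listStatus(s.getPath())), FileStatus::isDirectory)`. The traversal is depth-first, which does not matter for a plain count.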