Posted to issues@commons.apache.org by "Damian Noseda (JIRA)" <ji...@apache.org> on 2008/05/19 20:09:56 UTC

[jira] Created: (IO-170) Scalable Iterator for files, better than FileUtils.iterateFiles

Scalable Iterator for files, better than FileUtils.iterateFiles
---------------------------------------------------------------

                 Key: IO-170
                 URL: https://issues.apache.org/jira/browse/IO-170
             Project: Commons IO
          Issue Type: Improvement
          Components: Utilities
    Affects Versions: 1.4
         Environment: generic file systems
            Reporter: Damian Noseda
            Priority: Minor
             Fix For: 1.4


Improve the way iterateFiles generates its iterator. The current approach does not scale: it adds every file to a list and then returns that list's iterator. A better approach would be a custom Iterator<File> that keeps a stack of File arrays to move up and down the directory tree.
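
The stack-of-arrays idea described above could be sketched roughly as follows. This is only an illustration, not code from Commons IO or from any patch on this issue: the names LazyFileIterator and Frame are hypothetical, file and directory filters are left out, symbolic-link cycles are not handled, and Java 8 or later is assumed.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: walks a directory tree lazily, one directory listing at a time. */
final class LazyFileIterator implements Iterator<File> {

    /** One directory listing plus the index of the next entry to examine. */
    private static final class Frame {
        final File[] entries;
        int index;

        Frame(File[] entries) {
            this.entries = entries;
        }
    }

    // The stack holds one frame per directory on the path currently being explored.
    private final Deque<Frame> stack = new ArrayDeque<>();
    private File next; // look-ahead element; null when none has been found yet

    LazyFileIterator(File root) {
        push(root);
    }

    private void push(File dir) {
        File[] entries = dir.listFiles(); // null if dir is unreadable or not a directory
        if (entries != null) {
            stack.push(new Frame(entries));
        }
    }

    @Override
    public boolean hasNext() {
        // Advance only until the next plain file is found, never further.
        while (next == null && !stack.isEmpty()) {
            Frame top = stack.peek();
            if (top.index == top.entries.length) {
                stack.pop(); // listing exhausted: go back up one level
            } else {
                File candidate = top.entries[top.index++];
                if (candidate.isDirectory()) {
                    push(candidate); // go down one level
                } else {
                    next = candidate;
                }
            }
        }
        return next != null;
    }

    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        File result = next;
        next = null;
        return result;
    }
}
```

With this shape, memory use is bounded by the listings of the directories on the current path rather than by the total number of files in the tree.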

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (IO-170) Scalable Iterator for files, better than FileUtils.iterateFiles

Posted by "Vincent Bouscasse (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/IO-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783648#action_12783648 ] 

Vincent Bouscasse commented on IO-170:
--------------------------------------

Hi Matthew. 

I'm not sure I got the point, but I think Damian was thinking of an iterator that allows results to be processed on the fly, as soon as they are available. The code in your patch does not allow this: we have to wait until the result files have been found before the first File object can be returned by iterator.next(). That can take a long time when searching large directory trees.

I've written a recursive Iterator<File> implementation that returns the first matches as soon as they are discovered. The next match is computed in the hasNext() method, and linked lists are used to store matches and subdirectories. The overall iteration speed is the same as the current implementation's, but the first results are available sooner. The typical use of this iterator is in a producer thread, with the file processing done in a consumer thread, which speeds up the file processing.
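
The producer/consumer usage mentioned above might look something like the sketch below. It is not the implementation referred to in this comment: FilePipeline, run and END are hypothetical names, the queue capacity of 64 is arbitrary, and any Iterator<File> can serve as the source. Java 8 or later is assumed.

```java
import java.io.File;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: a producer thread drains an iterator while the caller consumes. */
final class FilePipeline {

    // Sentinel marking the end of the stream; compared by identity, never by equals().
    private static final File END = new File("");

    /** Runs the pipeline and returns the number of files passed to the handler. */
    static int run(Iterator<File> source, Consumer<File> handler) throws InterruptedException {
        BlockingQueue<File> queue = new ArrayBlockingQueue<>(64);

        Thread producer = new Thread(() -> {
            try {
                try {
                    while (source.hasNext()) {
                        queue.put(source.next()); // blocks when the consumer falls behind
                    }
                } finally {
                    queue.put(END); // always signal the end, even if the source threw
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int handled = 0;
        // The consumer starts working as soon as the first file is queued.
        for (File f = queue.take(); f != END; f = queue.take()) {
            handler.accept(f);
            handled++;
        }
        producer.join();
        return handled;
    }
}
```

The benefit depends on the source being lazy: if the iterator collects every match before returning its first element, the consumer still waits for the whole traversal.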

I can provide you a code sample if it can match your needs.

Best regards.




> Scalable Iterator for files, better than FileUtils.iterateFiles
> ---------------------------------------------------------------
>
>                 Key: IO-170
>                 URL: https://issues.apache.org/jira/browse/IO-170
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.4
>         Environment: generic file systems
>            Reporter: Damian Noseda
>            Priority: Minor
>             Fix For: 2.x
>
>         Attachments: real_iterators.patch
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> Improve the way iterateFiles generates its iterator. The current approach does not scale: it adds every file to a list and then returns that list's iterator. A better approach would be a custom Iterator<File> that keeps a stack of File arrays to move up and down the directory tree.



[jira] Updated: (IO-170) Scalable Iterator for files, better than FileUtils.iterateFiles

Posted by "Matthew Flaschen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/IO-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Flaschen updated IO-170:
--------------------------------

    Attachment: real_iterators.patch

Okay, this is a first draft of a direct iterator implementation of iterateFiles. It uses essentially the same traversal technique as the existing functions (and borrows code from them), but it doesn't create any LinkedList. It uses chains of iterators, specifically org.apache.commons.collections.iterators.IteratorChain and org.apache.commons.collections.iterators.ObjectArrayIterator (if this dependency is unacceptable, neither of these is an overly complex class, so they can be reimplemented or imported).
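
To give a sense of how small a reimplementation of the chaining part could be, here is a minimal sketch. It is not the Commons Collections IteratorChain and not the code in the attached patch; ChainedIterator is a hypothetical name, and Java 8 or later is assumed.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: presents several iterators as one, consuming them in order. */
final class ChainedIterator<T> implements Iterator<T> {

    private final Iterator<? extends Iterator<? extends T>> parts;
    private Iterator<? extends T> current; // null until the first part is fetched

    ChainedIterator(List<? extends Iterator<? extends T>> parts) {
        this.parts = parts.iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip over exhausted (or empty) parts until one has an element left.
        while ((current == null || !current.hasNext()) && parts.hasNext()) {
            current = parts.next();
        }
        return current != null && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Chaining by itself does not make a traversal lazy; that depends on when the chained iterators are built and what backs them.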

As is, the code is a bit redundant, because the list code does the same thing as the iterator code. Once the iterator code is tested and considered correct, the list functions can be implemented using iterators. E.g.:

public static Collection<File> listFiles(
        File directory, IOFileFilter fileFilter, IOFileFilter dirFilter) {
    Iterator<File> iter = iterateFiles(directory, fileFilter, dirFilter);
    // Drain the iterator into a list so callers still receive a Collection.
    LinkedList<File> list = new LinkedList<File>();
    while (iter.hasNext()) {
        list.add(iter.next());
    }
    return list;
}

or similar.  I'm glad to refine the patch more as needed.



[jira] Issue Comment Edited: (IO-170) Scalable Iterator for files, better than FileUtils.iterateFiles

Posted by "Vincent Bouscasse (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/IO-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783648#action_12783648 ] 

Vincent Bouscasse edited comment on IO-170 at 11/30/09 1:23 PM:
----------------------------------------------------------------

Hi Matthew. 

I'm not sure I got the point, but I think Damian was thinking of an iterator that allows results to be processed on the fly, as soon as they are available. The code in your patch does not allow this: we have to wait until the result files have been found before the first File object can be returned by iterator.next(). That can take a long time when searching large directory trees.

I've written a recursive Iterator<File> implementation that returns the first matches as soon as they are discovered. The next match is computed in the hasNext() method, and linked lists are used to store matches and subdirectories. The overall iteration speed is the same as the current implementation's (Commons IO 1.4), but the first results are available sooner. The typical use of this iterator is in a producer thread, with the file processing done in a consumer thread, which speeds up the file processing.

I can provide you a code sample if it can match your needs.

Best regards.






[jira] Updated: (IO-170) Scalable Iterator for files, better than FileUtils.iterateFiles

Posted by "Niall Pemberton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/IO-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niall Pemberton updated IO-170:
-------------------------------

    Fix Version/s:     (was: 1.4)
                   2.x

Needs someone to put forward a patch.
