Posted to issues@commons.apache.org by "Vincent Bouscasse (JIRA)" <ji...@apache.org> on 2009/11/30 14:25:20 UTC
[jira] Issue Comment Edited: (IO-170) Scalable Iterator for files, better than FileUtils.iterateFiles

    [ https://issues.apache.org/jira/browse/IO-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783648#action_12783648 ] 

Vincent Bouscasse edited comment on IO-170 at 11/30/09 1:23 PM:
----------------------------------------------------------------

Hi Matthew. 

I'm not sure I got the point, but I think Damian was thinking about an iterator that lets results be processed on the fly, as soon as they are available. The code in your patch does not allow this: we have to wait until all matching files have been found before the first File object can be returned by iterator.next(). That can take a long time when searching large directory trees.

I've written a recursive Iterator<File> implementation that returns the first matches as soon as they are discovered. The next match is computed in the hasNext() method, and linked lists are used to store matches and subdirectories. A complete iteration takes as long as with the current implementation (Commons IO 1.4), but the first results are available sooner. The typical use of this iterator is in a producer thread, while the file processing is done in a consumer thread, which can speed up the overall processing.
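A minimal sketch of the approach described in this comment, assuming Java 8 or later. It is not the code offered here: the class names `LazyFileIterator` and `ProducerConsumerWalk` are hypothetical, and the sketch walks the tree breadth-first with an explicit queue of pending directories instead of recursion. As in the comment, the next match is computed in `hasNext()` and linked lists hold matches and subdirectories. The helper runs the iterator on a producer thread and hands each match to a consumer through a `BlockingQueue`.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not the code offered in the comment: a lazy Iterator<File>
 * that lists one directory at a time (breadth-first, explicit queue instead of
 * recursion) and buffers matches, so the first results are available before the
 * whole tree has been scanned. Symbolic-link cycles are not detected.
 */
class LazyFileIterator implements Iterator<File> {
    private final FileFilter filter;                                 // selects which regular files are returned
    private final LinkedList<File> pendingDirs = new LinkedList<>(); // directories not yet listed
    private final LinkedList<File> matches = new LinkedList<>();     // matches found but not yet returned

    LazyFileIterator(File root, FileFilter filter) {
        this.filter = filter;
        if (root.isDirectory()) {
            pendingDirs.add(root);
        }
    }

    @Override
    public boolean hasNext() {
        // Scan directories only until at least one match is buffered.
        while (matches.isEmpty() && !pendingDirs.isEmpty()) {
            File[] children = pendingDirs.removeFirst().listFiles();
            if (children == null) {
                continue; // unreadable directory or I/O error: skip it
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pendingDirs.addLast(child);
                } else if (filter.accept(child)) {
                    matches.addLast(child);
                }
            }
        }
        return !matches.isEmpty();
    }

    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return matches.removeFirst();
    }
}

/** Hypothetical helper illustrating the producer/consumer usage mentioned in the comment. */
class ProducerConsumerWalk {
    // Unique sentinel instance (compared by identity) marking the end of the stream.
    private static final File END = new File("");

    /** Walks the tree on a producer thread, applies action on the calling thread, returns the match count. */
    static int process(File root, FileFilter filter, Consumer<File> action) throws InterruptedException {
        BlockingQueue<File> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                Iterator<File> it = new LazyFileIterator(root, filter);
                while (it.hasNext()) {
                    queue.put(it.next()); // hand each match over as soon as it is found
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queue.add(END); // always signal completion, even if the walk failed
            }
        });
        producer.start();
        int processed = 0;
        // The consumer can start on the first match while the producer is still scanning.
        for (File f = queue.take(); f != END; f = queue.take()) {
            action.accept(f);
            processed++;
        }
        producer.join();
        return processed;
    }
}
```

A full iteration still costs one `listFiles()` call per directory, so total time is unchanged; what changes is that `hasNext()` lists only as many directories as it needs to find the next match, so the first result does not wait for the whole tree.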

I can provide a code sample if it would meet your needs.

Best regards.




> Scalable Iterator for files, better than FileUtils.iterateFiles
> ---------------------------------------------------------------
>
>                 Key: IO-170
>                 URL: https://issues.apache.org/jira/browse/IO-170
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.4
>         Environment: generic file systems
>            Reporter: Damian Noseda
>            Priority: Minor
>             Fix For: 2.x
>
>         Attachments: real_iterators.patch
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> Improve the way that iterateFiles generates an iterator. The current approach does not scale: it adds all files to a list and then returns an iterator over that list. A better approach would be to create a custom Iterator<File> with a stack of File arrays to move up and down the directory tree.
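The stack-based design proposed in the description could look roughly like the following sketch, assuming Java 8 or later. The class name `DirectoryStackIterator` is hypothetical and the code is not taken from the attached patch: each stack entry holds one directory's `File[]` listing plus a cursor, so pushing a frame goes down into a subdirectory and popping it goes back up. Only regular files are returned, and symbolic-link cycles are not detected.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch of the proposed design: a depth-first Iterator<File> that keeps
 * a stack of File[] listings (one per open directory) instead of building a full list.
 */
class DirectoryStackIterator implements Iterator<File> {
    /** One open directory listing and the index of the next entry to examine. */
    private static final class Frame {
        final File[] entries;
        int pos;

        Frame(File[] entries) {
            this.entries = entries;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private File nextFile; // look-ahead; null when not yet computed or when exhausted

    DirectoryStackIterator(File root) {
        File[] top = root.listFiles(); // null if root is not a readable directory
        if (top != null) {
            stack.push(new Frame(top));
        }
    }

    @Override
    public boolean hasNext() {
        while (nextFile == null && !stack.isEmpty()) {
            Frame frame = stack.peek();
            if (frame.pos == frame.entries.length) {
                stack.pop(); // directory finished: go back up one level
                continue;
            }
            File entry = frame.entries[frame.pos++];
            if (entry.isDirectory()) {
                File[] children = entry.listFiles();
                if (children != null) {
                    stack.push(new Frame(children)); // go down into the subdirectory
                }
            } else {
                nextFile = entry;
            }
        }
        return nextFile != null;
    }

    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        File result = nextFile;
        nextFile = null;
        return result;
    }
}
```

With this layout, memory use grows with the depth of the tree and the size of the listings currently open, rather than with the total number of files.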

-- 
This message is automatically generated by JIRA.