Posted to dev@mahout.apache.org by "Scott Ganyo (JIRA)" <ji...@apache.org> on 2010/05/15 03:13:42 UTC

[jira] Created: (MAHOUT-395) Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error

Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error
--------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-395
                 URL: https://issues.apache.org/jira/browse/MAHOUT-395
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.1, 0.2, 0.3, 0.4
            Reporter: Scott Ganyo
            Priority: Critical


KMeansDriver uses the isConverged() method to determine whether the k-means clustering run is complete.  isConverged() has to open each SequenceFile and read each cluster to see whether it has converged.  During this process the readers are not explicitly closed, so when a large number of sequence files are opened, the driving system may run out of file handles before the readers are eventually implicitly reclaimed.  I'm attaching a patch that explicitly closes these files once they no longer need to remain open.
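
For illustration only, the close-as-you-go pattern described above might look roughly like the following. This is not the attached patch: it is a minimal sketch that assumes plain text files stand in for Hadoop sequence files, and the class name ConvergenceCheck, the "part-*" file naming, and the "converged" marker line are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ConvergenceCheck {

    /**
     * Returns true only if every line of every "part-*" file under dir
     * reads "converged". (Hypothetical stand-in for reading clusters.)
     */
    public static boolean isConverged(Path dir) throws IOException {
        boolean converged = true;
        // The directory listing is itself a handle; try-with-resources closes it.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(dir, "part-*")) {
            for (Path part : parts) {
                // Each reader is closed before the next file is opened,
                // even if reading throws, so at most one file is open at a time.
                try (BufferedReader reader = Files.newBufferedReader(part)) {
                    String line;
                    while (converged && (line = reader.readLine()) != null) {
                        converged = line.trim().equals("converged");
                    }
                }
                if (!converged) {
                    break; // no need to open the remaining files
                }
            }
        }
        return converged;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("clusters");
        Files.write(dir.resolve("part-00000"), Arrays.asList("converged", "converged"));
        Files.write(dir.resolve("part-00001"), Arrays.asList("converged", "pending"));
        System.out.println(isConverged(dir)); // prints "false"
    }
}
```

The point of the sketch is that the number of simultaneously open handles stays constant regardless of how many part files exist, instead of growing until the garbage collector finalizes the readers.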

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-395) Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error

Posted by "Scott Ganyo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Ganyo updated MAHOUT-395:
-------------------------------

    Attachment: KMeansDriver.patch

> Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-395
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-395
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1, 0.2, 0.3, 0.4
>            Reporter: Scott Ganyo
>            Priority: Critical
>         Attachments: KMeansDriver.patch
>
>
> KMeansDriver uses the isConverged() method to determine whether the k-means clustering run is complete.  isConverged() has to open each SequenceFile and read each cluster to see whether it has converged.  During this process the readers are not explicitly closed, so when a large number of sequence files are opened, the driving system may run out of file handles before the readers are eventually implicitly reclaimed.  I'm attaching a patch that explicitly closes these files once they no longer need to remain open.



[jira] Issue Comment Edited: (MAHOUT-395) Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error

Posted by "Drew Farris (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867782#action_12867782 ] 

Drew Farris edited comment on MAHOUT-395 at 5/14/10 10:45 PM:
--------------------------------------------------------------

Applied in r944550, with minor revisions. Thanks for the patch.

      was (Author: drew.farris):
    applied in r944550, with minor revisions.
  
> Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-395
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-395
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1, 0.2, 0.3, 0.4
>            Reporter: Scott Ganyo
>            Assignee: Drew Farris
>            Priority: Critical
>             Fix For: 0.4
>
>         Attachments: KMeansDriver.patch
>
>
> KMeansDriver uses the isConverged() method to determine whether the k-means clustering run is complete.  isConverged() has to open each SequenceFile and read each cluster to see whether it has converged.  During this process the readers are not explicitly closed, so when a large number of sequence files are opened, the driving system may run out of file handles before the readers are eventually implicitly reclaimed.  I'm attaching a patch that explicitly closes these files once they no longer need to remain open.



[jira] Updated: (MAHOUT-395) Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error

Posted by "Drew Farris (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Drew Farris updated MAHOUT-395:
-------------------------------

           Status: Resolved  (was: Patch Available)
         Assignee: Drew Farris
    Fix Version/s: 0.4
       Resolution: Fixed

Applied in r944550, with minor revisions.

> Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-395
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-395
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1, 0.2, 0.3, 0.4
>            Reporter: Scott Ganyo
>            Assignee: Drew Farris
>            Priority: Critical
>             Fix For: 0.4
>
>         Attachments: KMeansDriver.patch
>
>
> KMeansDriver uses the isConverged() method to determine whether the k-means clustering run is complete.  isConverged() has to open each SequenceFile and read each cluster to see whether it has converged.  During this process the readers are not explicitly closed, so when a large number of sequence files are opened, the driving system may run out of file handles before the readers are eventually implicitly reclaimed.  I'm attaching a patch that explicitly closes these files once they no longer need to remain open.



[jira] Updated: (MAHOUT-395) Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error

Posted by "Scott Ganyo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Ganyo updated MAHOUT-395:
-------------------------------

    Status: Patch Available  (was: Open)

> Using KMeansDriver leaves open files and can lead to FileNotFoundException - "too many open files" error
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-395
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-395
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1, 0.2, 0.3, 0.4
>            Reporter: Scott Ganyo
>            Priority: Critical
>
> KMeansDriver uses the isConverged() method to determine whether the k-means clustering run is complete.  isConverged() has to open each SequenceFile and read each cluster to see whether it has converged.  During this process the readers are not explicitly closed, so when a large number of sequence files are opened, the driving system may run out of file handles before the readers are eventually implicitly reclaimed.  I'm attaching a patch that explicitly closes these files once they no longer need to remain open.
