Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/02/21 12:01:38 UTC

[jira] Created: (MAHOUT-614) org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?

org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?
-------------------------------------------------------------------------------------------------

                 Key: MAHOUT-614
                 URL: https://issues.apache.org/jira/browse/MAHOUT-614
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.4
            Reporter: Sean Owen
            Assignee: Robin Anil
             Fix For: 0.5


I believe there might be an error in org.apache.mahout.classifier.bayes.MultipleOutputFormat. It extends the Hadoop class FileOutputFormat, and most of its work is done in getRecordWriter(FileSystem, Configuration, String, Progressable). However, this is not the method that one must override to control how FileOutputFormat writes records; that is getRecordWriter(TaskAttemptContext). My hunch is that this used to work, but only against the Hadoop 0.19.x APIs. (@Override is our friend!)

I've attached a patch that I believe addresses this and along the way is able to clean things up slightly. Am I on track here?
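
To illustrate the @Override point, here is a minimal sketch in plain Java with no Hadoop on the classpath. BaseOutputFormat, TaskContext, runTask and the returned strings are hypothetical stand-ins invented for this example; they are not Hadoop or Mahout classes. The sketch only shows the general mechanism: a subclass method with a different parameter list is an overload, so a caller that invokes the base signature never reaches it.

```java
public class OverrideSketch {

    /** Hypothetical stand-in for the context object the framework passes in. */
    public static class TaskContext {}

    /** Hypothetical stand-in for the base output format; the "framework" only calls this signature. */
    public static class BaseOutputFormat {
        public String getRecordWriter(TaskContext context) {
            return "default writer";
        }
    }

    /** Declares a method with a different parameter list: an overload, not an override. */
    public static class AccidentalOverload extends BaseOutputFormat {
        // Putting @Override on this method would fail to compile,
        // which is how the annotation exposes the mistake early.
        public String getRecordWriter(String name, Object progress) {
            return "custom writer";
        }
    }

    /** Matches the base signature, so @Override compiles and the method is actually used. */
    public static class RealOverride extends BaseOutputFormat {
        @Override
        public String getRecordWriter(TaskContext context) {
            return "custom writer";
        }
    }

    /** Plays the role of the framework: it knows only the base signature. */
    public static String runTask(BaseOutputFormat format) {
        return format.getRecordWriter(new TaskContext());
    }

    public static void main(String[] args) {
        System.out.println(runTask(new AccidentalOverload())); // prints "default writer"
        System.out.println(runTask(new RealOverride()));       // prints "custom writer"
    }
}
```

In this sketch the accidental overload compiles cleanly and is silently ignored, which matches the kind of failure suspected above; whether the real class behaves exactly this way depends on the Hadoop version it is compiled against.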

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (MAHOUT-614) org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-614:
-----------------------------

    Attachment: MAHOUT-614.patch

> org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-614
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-614
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.4
>            Reporter: Sean Owen
>            Assignee: Robin Anil
>              Labels: hadoop, multiple, output
>             Fix For: 0.5
>
>         Attachments: MAHOUT-614.patch
>
>
> I believe there might be an error in org.apache.mahout.classifier.bayes.MultipleOutputFormat. It extends the Hadoop class FileOutputFormat, and most of its work is done in getRecordWriter(FileSystem, Configuration, String, Progressable). However, this is not the method that one must override to control how FileOutputFormat writes records; that is getRecordWriter(TaskAttemptContext). My hunch is that this used to work, but only against the Hadoop 0.19.x APIs. (@Override is our friend!)
> I've attached a patch that I believe addresses this and along the way is able to clean things up slightly. Am I on track here?


[jira] Commented: (MAHOUT-614) org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997403#comment-12997403 ] 

Robin Anil commented on MAHOUT-614:
-----------------------------------

I will take a look from home. Patch seems alright, just have to verify the functionality.


[jira] Commented: (MAHOUT-614) org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998033#comment-12998033 ] 

Robin Anil commented on MAHOUT-614:
-----------------------------------

Verified using 20newsgroups. Looks good to commit.


[jira] Resolved: (MAHOUT-614) org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-614.
------------------------------

    Resolution: Fixed


[jira] Commented: (MAHOUT-614) org.apache.mahout.classifier.bayes.MultipleOutputFormat not working as intended with Hadoop 0.20?

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998102#comment-12998102 ] 

Hudson commented on MAHOUT-614:
-------------------------------

Integrated in Mahout-Quality #640 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/640/])
    MAHOUT-614 fix up overriding of Hadoop's FileOutputFormat

