Posted to dev@mahout.apache.org by "Suneel Marthi (Created) (JIRA)" <ji...@apache.org> on 2012/01/28 14:59:10 UTC

[jira] [Created] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI
-------------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-964
                 URL: https://issues.apache.org/jira/browse/MAHOUT-964
             Project: Mahout
          Issue Type: Improvement
          Components: Math
    Affects Versions: 0.6
         Environment: Mahout 0.6 snapshot from trunk
            Reporter: Suneel Marthi


1. If an invalid similarity measure is passed to RowSimilarityJob, it currently throws a ClassCastException but still goes on to run all of the subsequent tasks (VectorNormalizer, Cooccurrences Mapper and UnSymmetrify Mapper). The job should exit early instead of invoking those tasks, all of which fail anyway.

2. It would be nice to have an --overwrite option for the command-line interface that deletes the temp and output paths at the start of RowSimilarityJob execution, similar to what seq2sparse and seqdirectory do. If I run RowSimilarityJob repeatedly with different similarity measures, I should not have to delete my temp and output paths before each run.
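
A minimal sketch of the two requests above, written without reference to the actual RowSimilarityJob source or the attached patch. Every name in it (RowSimilarityPreflight, SimilarityMeasure, CosineLikeMeasure, resolveMeasure, deleteIfExists, run) is hypothetical, and java.nio.file stands in for the Hadoop file system that the real job would presumably use. It only illustrates the order of operations: resolve the similarity class first and fail before any task starts, then clear the output and temp paths when an overwrite flag is set.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

final class RowSimilarityPreflight {

    /** Hypothetical stand-in for the type that similarity measures implement. */
    public interface SimilarityMeasure {}

    /** Hypothetical measure, present only so the check has something valid to find. */
    public static class CosineLikeMeasure implements SimilarityMeasure {}

    /**
     * Resolves the measure class before any task is launched.
     * An unknown class name (ClassNotFoundException) or a class of the wrong
     * type (ClassCastException) is reported as a single IllegalArgumentException.
     */
    public static Class<? extends SimilarityMeasure> resolveMeasure(String className) {
        try {
            return Class.forName(className).asSubclass(SimilarityMeasure.class);
        } catch (ClassNotFoundException | ClassCastException e) {
            throw new IllegalArgumentException("Invalid similarity measure: " + className, e);
        }
    }

    /** Recursively deletes a directory tree if it exists; does nothing otherwise. */
    public static void deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(d); // the directory is empty by the time this runs
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /** Runs the preflight checks; the real tasks would be started after them. */
    public static int run(String measureClassName, Path output, Path temp, boolean overwrite)
            throws IOException {
        resolveMeasure(measureClassName); // fail fast, before touching any path
        if (overwrite) {
            deleteIfExists(output);
            deleteIfExists(temp);
        }
        // The normalizing, co-occurrence and unsymmetrify steps would follow here.
        return 0;
    }
}
```

Validating before deleting is a deliberate choice in this sketch: a mistyped measure name then leaves the results of the previous run untouched.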

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Status: Open  (was: Patch Available)

Cancelling the patch: it broke the unit tests and needs more testing.

Grant, could you please assign this to me?
                


        

[jira] [Resolved] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Grant Ingersoll (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved MAHOUT-964.
------------------------------------

    Resolution: Duplicate

Dup of MAHOUT-834
                


        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Attachment: Mahout-964.patch
    


        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Status: Patch Available  (was: Open)

Submitting Patch to fix this issue.
                


        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Attachment: Mahout-964.patch
    


        

[jira] [Commented] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195571#comment-13195571 ] 

Suneel Marthi commented on MAHOUT-964:
--------------------------------------

Will do, sorry about this. I can upload a patch generated from the top-level directory, or would you like me to hold off for now?
                


        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Attachment:     (was: Mahout-964.patch)
    


        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Status: Patch Available  (was: Open)

Uploading Patch generated from the top level directory.
                


        

[jira] [Assigned] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Grant Ingersoll (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned MAHOUT-964:
--------------------------------------

    Assignee: Suneel Marthi
    


        

[jira] [Commented] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195931#comment-13195931 ] 

Suneel Marthi commented on MAHOUT-964:
--------------------------------------

Looking through past JIRA issues, I see this was already reported in MAHOUT-834 (https://issues.apache.org/jira/browse/MAHOUT-834), which is still open.

Grant, how would you like to proceed on this?
                


        

[jira] [Commented] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Grant Ingersoll (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195566#comment-13195566 ] 

Grant Ingersoll commented on MAHOUT-964:
----------------------------------------

Hey Suneel,

The content of the patch is fine, but going forward, can you generate your patches so that they can be applied from the top-level directory? That makes them easier to apply, since we don't have to chase down which directory each one applies in.

Thanks!
                


        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Description: 
1. If an invalid similarity measure is passed to RowSimilarityJob, it currently throws a ClassNotFoundException but still goes on to run all of the subsequent tasks (VectorNormalizer, Cooccurrences Mapper and UnSymmetrify Mapper). The job should exit early instead of invoking those tasks, all of which fail anyway.

2. It would be nice to have an --overwrite option for the command-line interface that deletes the temp and output paths at the start of RowSimilarityJob execution, similar to what seq2sparse and seqdirectory do. If I run RowSimilarityJob repeatedly with different similarity measures, I should not have to delete my temp and output paths before each run.

  was:
1. If an invalid similarity measure is passed to RowSimilarityJob, it currently throws a ClassCastException but still goes on to run all of the subsequent tasks (VectorNormalizer, Cooccurrences Mapper and UnSymmetrify Mapper). The job should exit early instead of invoking those tasks, all of which fail anyway.

2. It would be nice to have an --overwrite option for the command-line interface that deletes the temp and output paths at the start of RowSimilarityJob execution, similar to what seq2sparse and seqdirectory do. If I run RowSimilarityJob repeatedly with different similarity measures, I should not have to delete my temp and output paths before each run.

    


        

[jira] [Updated] (MAHOUT-964) RowSimilarityJob should exit immediately if an invalid similarity measure specified and it would be nice to have an --overwrite option for the RowSimilarityJob CLI

Posted by "Suneel Marthi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-964:
---------------------------------

    Status: Open  (was: Patch Available)
    
