Posted to dev@mahout.apache.org by "Jack Tanner (JIRA)" <ji...@apache.org> on 2011/09/02 07:27:09 UTC

[jira] [Created] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Cannot run SequenceFilesFromCsvFilter, ever
-------------------------------------------

                 Key: MAHOUT-799
                 URL: https://issues.apache.org/jira/browse/MAHOUT-799
             Project: Mahout
          Issue Type: Bug
          Components: Examples
    Affects Versions: 0.5, 0.6
            Reporter: Jack Tanner


As described here:

http://mail-archives.apache.org/mod_mbox/mahout-user/201106.mbox/%3C4DED5DCD.6050107@gmail.com%3E

SequenceFilesFromCsvFilter cannot be invoked with default parameter values, because it dies like so:

bin/mahout seqdirectory -i input -o output -filter org.apache.mahout.text.SequenceFilesFromCsvFilter

...
Caused by: java.lang.NumberFormatException: null
     at java.lang.Integer.parseInt(Integer.java:417)
     at java.lang.Integer.parseInt(Integer.java:499)
     at org.apache.mahout.text.SequenceFilesFromCsvFilter.<init>(SequenceFilesFromCsvFilter.java:56)

If one adds the parameters -kcol 0 -vcol 0 (or their long-form versions), it dies like so:

Unexpected -kcol while processing Job-Specific Options

Commenting out lines 56 and 57 of SequenceFilesFromCsvFilter.java, like so, allows the run to proceed:

//    this.keyColumn = Integer.parseInt(options.get(KEY_COLUMN_OPTION[0]));
//    this.valueColumn = Integer.parseInt(options.get(VALUE_COLUMN_OPTION[0]));
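
The "NumberFormatException: null" in the trace above is what Integer.parseInt raises when handed a null string, which is consistent with the options map having no entry for the column option. Below is a minimal sketch of parsing such an option with a fallback default. It assumes a plain Map<String, String> of options; the names ColumnOptionSketch, parseColumn and "keyColumn" are hypothetical and are not Mahout code.

```java
import java.util.HashMap;
import java.util.Map;

final class ColumnOptionSketch {

  // Hypothetical helper: returns the parsed column index,
  // or defaultValue when the option was never supplied.
  static int parseColumn(Map<String, String> options, String name, int defaultValue) {
    String raw = options.get(name);
    if (raw == null) {
      return defaultValue;
    }
    return Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();

    // Reproduces the failure mode: the option is absent, so get() returns null.
    try {
      Integer.parseInt(options.get("keyColumn"));
    } catch (NumberFormatException e) {
      System.out.println("parseInt on a missing option failed: " + e.getMessage());
    }

    // With a default, the missing option no longer aborts construction.
    System.out.println(parseColumn(options, "keyColumn", 0));

    options.put("keyColumn", "2");
    System.out.println(parseColumn(options, "keyColumn", 0));
  }
}
```

Whether a silent default is the right behaviour is a design choice; it only addresses the first failure, not the rejected -kcol flag.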


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-799:
-----------------------------

    Attachment: MAHOUT-799.patch

Hmm, the author didn't follow up. As far as I can tell, the -filter option should never have been added. Its only subclass was not written to work as an 'argument', but only as a command-line program. My best fix is to just remove the option. You can still use the filter by running it directly as the command-line program.
                

[jira] [Commented] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116865#comment-13116865 ] 

Hudson commented on MAHOUT-799:
-------------------------------

Integrated in Mahout-Quality #1068 (See [https://builds.apache.org/job/Mahout-Quality/1068/])
    MAHOUT-799 remove CSV filter that wasn't working

srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177027
Files : 
* /mahout/trunk/integration/src/main/java/org/apache/mahout/text/PrefixAdditionFilter.java
* /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromCsvFilter.java
* /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectory.java
* /mahout/trunk/integration/src/main/java/org/apache/mahout/text/SequenceFilesFromDirectoryFilter.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/text/TestSequenceFilesFromDirectory.java

                

[jira] [Updated] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-799:
-----------------------------

        Fix Version/s: 0.6
             Assignee: Sean Owen
    Affects Version/s:     (was: 0.6)
               Status: Patch Available  (was: Open)
    

[jira] [Updated] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-799:
-----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
    

[jira] [Commented] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Jack Tanner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096032#comment-13096032 ] 

Jack Tanner commented on MAHOUT-799:
------------------------------------

To avoid having to build the cross-wiring, you could just detect this execution pattern and exit with a message that explains the proper command-line use.

Which raises the question: how does one run it correctly from the command line?
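
The suggestion above, to detect the unsupported execution pattern and exit with an explanatory message, could look roughly like the following. This is only a sketch under assumptions: the options arrive as a Map<String, String>, and the names FilterOptionCheck, requireColumn and "keyColumn" are hypothetical. The wording of the message is also invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class FilterOptionCheck {

  // Hypothetical helper: fails fast with a readable message
  // instead of letting Integer.parseInt(null) surface on its own.
  static int requireColumn(Map<String, String> options, String name) {
    String raw = options.get(name);
    if (raw == null) {
      throw new IllegalArgumentException(
          "Option '" + name + "' is not set. This filter does not receive its"
              + " options when it is loaded through -filter.");
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Option '" + name + "' must be an integer column index, got: " + raw, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    try {
      requireColumn(options, "keyColumn");
    } catch (IllegalArgumentException e) {
      // A driver could print this message and exit with a non-zero status.
      System.err.println(e.getMessage());
    }
  }
}
```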


[jira] [Updated] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-799:
-----------------------------

    Attachment: MAHOUT-799.patch

OK, different answer: I don't think the CSV filter can be 'saved'. As it's currently designed, I'm unable to make it work even once I shake out the rest of the knock-on issues. Instead of removing -filter, I think we should just remove this implementation, tidy up a bit, and leave the integration point for someone to try again. Here's a new patch that removes it and tidies up instead. I don't actually think it's controversial since 1) it doesn't work now and 2) it didn't actually read CSV data to begin with!
                

[jira] [Commented] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095857#comment-13095857 ] 

Sean Owen commented on MAHOUT-799:
----------------------------------

I'm also confused, reading this. SequenceFilesFromCsvFilter works when run as a command-line program. But when used via -filter, it never adds its options to the command line, so it can't work. Was this the intent of the design? It seems like there needs to be additional cross-wiring for these filters to participate in the command line.
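
One way to picture the "cross-wiring" mentioned above: the filter announces the flags it needs, and the driver registers them before parsing, so that a flag like -kcol is no longer rejected as unexpected. The sketch below is a toy model only. OptionContributor, CsvLikeFilter and the hand-rolled parse method are all hypothetical and do not reflect Mahout's actual option-parsing classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CrossWiringSketch {

  /** Hypothetical contract: a filter announces the flags it expects. */
  interface OptionContributor {
    List<String> optionNames();
  }

  /** Hypothetical stand-in for a filter that needs two column flags. */
  static final class CsvLikeFilter implements OptionContributor {
    @Override
    public List<String> optionNames() {
      return Arrays.asList("-kcol", "-vcol");
    }
  }

  /** Parses "-flag value" pairs and rejects any flag that was not registered. */
  static Map<String, String> parse(String[] args, Set<String> registered) {
    Map<String, String> parsed = new HashMap<>();
    for (int i = 0; i < args.length; i += 2) {
      String flag = args[i];
      if (!registered.contains(flag)) {
        throw new IllegalArgumentException("Unexpected " + flag);
      }
      if (i + 1 >= args.length) {
        throw new IllegalArgumentException("Missing value for " + flag);
      }
      parsed.put(flag, args[i + 1]);
    }
    return parsed;
  }

  public static void main(String[] args) {
    String[] commandLine = {"-i", "input", "-o", "output", "-kcol", "0", "-vcol", "0"};

    // Without cross-wiring, the driver only knows its own flags.
    Set<String> registered = new HashSet<>(Arrays.asList("-i", "-o"));
    try {
      parse(commandLine, registered);
    } catch (IllegalArgumentException e) {
      System.out.println("Before registration: " + e.getMessage());
    }

    // With cross-wiring, the filter's flags are registered before parsing.
    registered.addAll(new CsvLikeFilter().optionNames());
    System.out.println("After registration: " + parse(commandLine, registered));
  }
}
```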


[jira] [Commented] (MAHOUT-799) Cannot run SequenceFilesFromCsvFilter, ever

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096044#comment-13096044 ] 

Sean Owen commented on MAHOUT-799:
----------------------------------

Right now there is no proper command-line use, it seems. I don't know what was intended here. Who wrote this bit? It's not clear from the SVN logs.
