Posted to dev@mahout.apache.org by "Dave Kor (Created) (JIRA)" <ji...@apache.org> on 2012/02/25 03:37:46 UTC

[jira] [Created] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
-------------------------------------------------------------------------

                 Key: MAHOUT-985
                 URL: https://issues.apache.org/jira/browse/MAHOUT-985
             Project: Mahout
          Issue Type: Bug
          Components: Integration
    Affects Versions: 0.5
            Reporter: Dave Kor
            Priority: Minor


When parsing an ARFF file that contains instance-specific weights, MapBackedARFFModel throws the following NullPointerException. While I have only tested this in 0.5, I suspect this bug also occurs in 0.6.

Exception in thread "main" java.lang.NullPointerException
        at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:87)
        at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:75)
        at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
        at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:159)
        at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:127)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)

The code works properly when all instance weights are set to the default value of 1. However, when any instance has a non-default weight, as in the sample ARFF file below, the NPE occurs when MapBackedARFFModel attempts to parse line 8.

-----
@relation 'Test Mahout'

@attribute Attr0 numeric
@attribute Label {True,False}

@data
0,False
1,True,{2}
-----

The reason is that Weka assumes every data instance has a default weight of 1, and this default weight is not saved in the ARFF file. However, when a data instance does not have the default weight of 1, the non-default instance weight is appended to the end of the line, surrounded by curly braces. When the MapBackedARFFModel.getValue method tries to parse this weight as an attribute, typeMap.get(idx) returns a null ARFFType because there is no such attribute, which results in an NPE.
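
The failure mode described above can be illustrated with a minimal sketch. This is not Mahout's code: the class NullTypeLookupDemo, the enum AttrType (a stand-in for Mahout's attribute type) and the method firstUntypedIndex are all hypothetical names. The sketch only shows that a line with a trailing weight produces one more token than there are declared attributes, so the type lookup for that extra index comes back null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Mahout.
final class NullTypeLookupDemo {

  // Stand-in for an attribute type declared by an @attribute line.
  enum AttrType { NUMERIC, NOMINAL }

  private NullTypeLookupDemo() {}

  /**
   * Returns the index of the first comma-separated token that has no
   * declared attribute type, or -1 if every token has one.
   */
  static int firstUntypedIndex(Map<Integer, AttrType> typeMap, String dataLine) {
    String[] tokens = dataLine.split(",");
    for (int idx = 0; idx < tokens.length; idx++) {
      // A null here is what would later be dereferenced and cause the NPE.
      if (typeMap.get(idx) == null) {
        return idx;
      }
    }
    return -1;
  }

  // Builds the type map for the two attributes in the sample file.
  static Map<Integer, AttrType> sampleTypeMap() {
    Map<Integer, AttrType> typeMap = new HashMap<>();
    typeMap.put(0, AttrType.NUMERIC); // Attr0 numeric
    typeMap.put(1, AttrType.NOMINAL); // Label {True,False}
    return typeMap;
  }
}
```

With the sample file's two attributes, the token "{2}" sits at index 2, which has no entry in the map.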



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

Posted by "Joe Prasanna Kumar (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Prasanna Kumar updated MAHOUT-985:
--------------------------------------

    Attachment: MAHOUT-985.patch

Since DenseVector and RandomAccessSparseVector don't have weight attributes, we'll discard the weights in ARFF files for now. This patch contains code to discard the weights specified for instances in ARFF files.
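
For readers without the attachment, one way to discard a trailing weight might look like the sketch below. It is not the content of MAHOUT-985.patch; the class ArffWeightStripper and the method stripWeight are hypothetical names, and the regular expression is an assumption about what a weight looks like (a comma followed by a brace-enclosed group at the very end of the line).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the attached patch.
final class ArffWeightStripper {

  // A comma, optional spaces, then "{...}" with no nested braces, at end of line.
  private static final Pattern TRAILING_WEIGHT =
      Pattern.compile(",\\s*\\{[^{}]*\\}\\s*$");

  private ArffWeightStripper() {}

  /** Returns the data line with a trailing instance weight removed, if present. */
  static String stripWeight(String line) {
    // Lines without a trailing ", {w}" group are returned unchanged,
    // including sparse lines that consist of a single "{...}" group.
    return TRAILING_WEIGHT.matcher(line).replaceFirst("");
  }
}
```

Requiring the leading comma keeps a sparse-format line, which is itself wrapped in braces, from being mistaken for a weight.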
                
> MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-985
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-985
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.5
>            Reporter: Dave Kor
>            Priority: Minor
>              Labels: Arff
>         Attachments: MAHOUT-985.patch
>
>

[jira] [Commented] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399369#comment-13399369 ] 

Hudson commented on MAHOUT-985:
-------------------------------

Integrated in Mahout-Quality #1556 (See [https://builds.apache.org/job/Mahout-Quality/1556/])
    MAHOUT-985 ignore ARFF instance weights, handle ? correctly (Revision 1352857)

     Result = SUCCESS
srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1352857
Files : 
* /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/arff/ARFFIterator.java
* /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/arff/ARFFModel.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/utils/vectors/arff/ARFFVectorIterableTest.java

                

[jira] [Resolved] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-985.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8
         Assignee: Sean Owen
    

[jira] [Commented] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

Posted by "Dave Kor (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216709#comment-13216709 ] 

Dave Kor commented on MAHOUT-985:
---------------------------------

This is best answered from a machine learning perspective. Instance weights are used for quite a wide variety of ML-related tasks, which is why Weka supports them. Examples include:

(A) Resampling Datasets
Sometimes there is a need to get the machine learning algorithm to focus on specific parts of the dataset, for example when the dataset is imbalanced and the class label you are interested in is swamped by a huge number of uninteresting instances (in other words, the proverbial needle-in-a-haystack problem). Most techniques for handling such cases involve some form of careful resampling: boosting the weight of instances that have the desired class label, down-weighting the unwanted instances, or both.

(B) Smoothing or Regularization (See http://en.wikipedia.org/wiki/Regularization_(mathematics) )
Some methods of Bayesian learning take the prior distribution of labels into consideration when training a model, and one of the simpler ways of introducing a prior is to apply it as instance weights. Algorithms that can make use of instance weights include Naive Bayes, K-Means, Logistic Regression, Expectation Maximization/Gradient Descent/Conjugate Gradient, Nearest Neighbor, AdaBoost and many more.
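
As a rough illustration of the re-weighting idea in (A), the sketch below assigns each class a weight inversely proportional to its frequency. This is just one common heuristic, chosen here as an assumption; it is not how Weka or Mahout computes anything, and the names ClassBalanceWeights and inverseFrequency are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of inverse-frequency class weights.
final class ClassBalanceWeights {

  private ClassBalanceWeights() {}

  /**
   * Returns one weight per class label: n / (k * count), where n is the
   * number of instances, k the number of distinct classes, and count the
   * number of instances carrying that label.
   */
  static Map<String, Double> inverseFrequency(List<String> labels) {
    Map<String, Integer> counts = new HashMap<>();
    for (String label : labels) {
      counts.merge(label, 1, Integer::sum);
    }
    int n = labels.size();
    int k = counts.size();
    Map<String, Double> weights = new HashMap<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      weights.put(e.getKey(), (double) n / (k * e.getValue()));
    }
    return weights;
  }
}
```

With one True instance and three False instances, the rare class is weighted up to 2.0 and the common class down to about 0.67.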

These are the two main uses of instance weighting I can remember off the top of my head; I'm sure there are a few more that I have missed. How the weights are used differs from algorithm to algorithm, and not all algorithms make use of instance weights. In Weka, the algorithms that take advantage of instance weights all implement weka.core.WeightedInstancesHandler. Weka algorithms that do not implement WeightedInstancesHandler simply assume the weights don't exist. For your reference, you can see the list of algorithms at http://weka.sourceforge.net/doc.dev/weka/core/WeightedInstancesHandler.html

As for Mahout, I am really not in a position to say, as I only started evaluating Mahout this week. The easy way out is simply to make sure MapBackedARFFModel can successfully parse ARFF files that contain weights, and throw those weights away. However, it would be good if the weights could be passed on to Mahout's algorithms so that they have a chance to use them if they so desire.
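
To make the second option concrete, here is a minimal sketch of a parser that keeps the weight next to the values instead of dropping it. The class WeightedLine and its fields are hypothetical; nothing like this is claimed to exist in Mahout, and how a weight would travel alongside a Mahout vector is left open.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value holder: the attribute text of a data line plus its weight.
final class WeightedLine {

  // Everything before a trailing ", {w}" group, then the weight token itself.
  private static final Pattern WITH_WEIGHT =
      Pattern.compile("^(.*),\\s*\\{\\s*([^{}\\s]+)\\s*\\}\\s*$");

  final String values;
  final double weight;

  private WeightedLine(String values, double weight) {
    this.values = values;
    this.weight = weight;
  }

  /** Splits a data line into its attribute part and its instance weight. */
  static WeightedLine parse(String line) {
    Matcher m = WITH_WEIGHT.matcher(line);
    if (m.matches()) {
      return new WeightedLine(m.group(1), Double.parseDouble(m.group(2)));
    }
    // No explicit weight on the line: fall back to the default weight of 1.
    return new WeightedLine(line, 1.0);
  }
}
```

An algorithm that does not care about weights can read only the values field, which matches the Weka behaviour described above.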

I hope this helps.

                

[jira] [Commented] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399274#comment-13399274 ] 

Sean Owen commented on MAHOUT-985:
----------------------------------

I want to commit this. It seems to fail in the dense vector case. I'm guessing that, in that case, the 'split' items need to be trimmed, and need to be checked for "?", right?
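
The trimming and "?" check suggested above could look roughly like the sketch below. It is not the committed change; DenseLineParser and tokens are hypothetical names, representing a missing value as null is an assumption, and splitting on bare commas ignores quoted values that contain commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tokenizing a dense ARFF data line.
final class DenseLineParser {

  // ARFF marks a missing value with a single question mark.
  private static final String MISSING = "?";

  private DenseLineParser() {}

  /**
   * Splits a dense data line on commas, trims each item, and maps "?" to
   * null so callers can decide how to treat missing values.
   */
  static List<String> tokens(String line) {
    List<String> out = new ArrayList<>();
    for (String raw : line.split(",")) {
      String item = raw.trim();
      out.add(MISSING.equals(item) ? null : item);
    }
    return out;
  }
}
```

Without the trim, an item such as " ? " would not compare equal to "?" and would fall through to numeric or nominal parsing.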
                

[jira] [Commented] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

Posted by "Joe Prasanna Kumar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216614#comment-13216614 ] 

Joe Prasanna Kumar commented on MAHOUT-985:
-------------------------------------------

Dave,

I am trying to understand the implication of weights when we convert an ARFF file to one of the Mahout vectors. Should we normalize the vector values when a weight is specified for an instance? In Weka, when you add a weight to an instance, how does it influence the actual instance values?
Depending on how weights are used on ARFF instances, we can handle the situation in Mahout. Please let me know your thoughts.

Joe. 
                