Posted to dev@mahout.apache.org by "Joe Prasanna Kumar (Commented) (JIRA)" <ji...@apache.org> on 2012/02/26 03:09:48 UTC

[jira] [Commented] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

    [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216614#comment-13216614 ] 

Joe Prasanna Kumar commented on MAHOUT-985:
-------------------------------------------

Dave,

I am trying to understand the implications of weights when we convert an ARFF file to one of the Mahout vectors. Should we normalize the vector values when a weight is specified for an instance? In Weka, when you add a weight to an instance, how does it influence the actual instance values?
Depending on how weights are used on ARFF instances, we can handle the situation in Mahout. Please let me know your thoughts.

Joe. 
                
> MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-985
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-985
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.5
>            Reporter: Dave Kor
>            Priority: Minor
>              Labels: Arff
>
> When parsing an ARFF file that contains instance-specific weights, MapBackedARFFModel throws the following NullPointerException (NPE). While I have only tested this in 0.5, I suspect this bug also occurs in 0.6.
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:87)
>         at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:75)
>         at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>         at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
>         at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:159)
>         at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:127)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> The code works properly when all instance weights are set to the default value of 1. However, when any instance has a non-default weight, as in the sample ARFF file below, the NPE occurs when MapBackedARFFModel attempts to parse line 8.
> -----
> @relation 'Test Mahout'
> @attribute Attr0 numeric
> @attribute Label {True,False}
> @data
> 0,False
> 1,True,{2}
> -----
> The reason is that Weka assumes every data instance has a default weight of 1, and this default weight is not saved in the ARFF file. However, when a data instance DOES NOT have the default weight of 1, the non-default weight is appended to the end of the line, surrounded by curly braces. When the MapBackedARFFModel.getValue method tries to parse this weight as an attribute, typeMap.get(idx) returns a null ARFFType because no such attribute exists, which results in the NPE.
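
To make the format described above concrete, here is a minimal sketch of separating a trailing instance weight from the attribute values before they are parsed. It is not Mahout's code or the eventual fix: the class ArffWeightSketch, the holder ParsedLine, and the method splitWeight are all hypothetical names. It assumes dense data rows only (sparse rows, which start with a curly brace, are left untouched) and does not handle quoted values that contain commas.

```java
import java.util.Arrays;

// Hypothetical sketch, not part of Mahout: splits a dense ARFF data row
// into its attribute tokens and its optional trailing instance weight.
public class ArffWeightSketch {

    /** Hypothetical holder for the attribute tokens and the instance weight. */
    static final class ParsedLine {
        final String[] values;
        final double weight;

        ParsedLine(String[] values, double weight) {
            this.values = values;
            this.weight = weight;
        }
    }

    /**
     * Assumption: a dense row may end with ",{w}", where w is the weight.
     * Rows without that suffix get the default weight of 1.
     */
    static ParsedLine splitWeight(String line) {
        String row = line.trim();
        double weight = 1.0;

        // Sparse rows start with '{'; this sketch does not interpret them.
        if (row.endsWith("}") && !row.startsWith("{")) {
            int open = row.lastIndexOf('{');
            int comma = open > 0 ? row.lastIndexOf(',', open) : -1;
            // Only accept the braces as a weight if nothing but whitespace
            // sits between the last comma and the opening brace.
            if (comma >= 0 && row.substring(comma + 1, open).trim().isEmpty()) {
                String inside = row.substring(open + 1, row.length() - 1).trim();
                weight = Double.parseDouble(inside);
                row = row.substring(0, comma);
            }
        }

        // Naive split: quoted values containing commas are not supported here.
        String[] values = row.split("\\s*,\\s*");
        return new ParsedLine(values, weight);
    }

    public static void main(String[] args) {
        for (String line : new String[] {"0,False", "1,True,{2}"}) {
            ParsedLine parsed = splitWeight(line);
            System.out.println(Arrays.toString(parsed.values) + " weight=" + parsed.weight);
        }
    }
}
```

With the two data rows from the sample file, the weight token never reaches the attribute lookup, so the number of tokens matches the number of declared attributes. What to do with the weight afterwards (ignore it, scale the vector, or store it separately) is the open question in this thread, and the sketch takes no position on it.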
