Posted to dev@mahout.apache.org by "Joe Prasanna Kumar (Updated) (JIRA)" <ji...@apache.org> on 2012/02/28 05:27:48 UTC

[jira] [Updated] (MAHOUT-985) MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights

     [ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Prasanna Kumar updated MAHOUT-985:
--------------------------------------

    Attachment: MAHOUT-985.patch

Since DenseVector and RandomAccessSparseVector don't have a weight attribute, instance weights cannot currently be carried over into the vectors. This patch therefore discards the weights specified for instances in ARFF files.
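
For illustration only, here is a minimal sketch of one way to discard a trailing instance weight such as the `{2}` in the sample file from the issue description. It is not the code in the attached patch: the class name ArffWeightStripper, the method stripWeight, and the regular expression are all hypothetical, and the sketch assumes the weight is always the last comma-separated field and contains nothing but a number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: removes a trailing ",{weight}" from an ARFF data line.
public final class ArffWeightStripper {

    // A comma, then "{number}" at the very end of the line.
    // Requiring the comma keeps a sparse-format line such as "{1 X, 3 Y}" untouched.
    private static final Pattern TRAILING_WEIGHT = Pattern.compile(
        ",\\s*\\{\\s*[-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\s*\\}\\s*$");

    private ArffWeightStripper() {
    }

    // Returns the line without its trailing weight, or the line unchanged if it has none.
    public static String stripWeight(String dataLine) {
        Matcher m = TRAILING_WEIGHT.matcher(dataLine);
        return m.find() ? dataLine.substring(0, m.start()) : dataLine;
    }

    public static void main(String[] args) {
        System.out.println(stripWeight("0,False"));     // prints 0,False
        System.out.println(stripWeight("1,True,{2}"));  // prints 1,True
    }
}
```

Stripping the weight before the line is split into attribute values means the number of values matches the number of declared attributes again, so no lookup is made for an attribute index that does not exist.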
                
> MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-985
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-985
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.5
>            Reporter: Dave Kor
>            Priority: Minor
>              Labels: Arff
>         Attachments: MAHOUT-985.patch
>
>
> When parsing an ARFF file that contains instance-specific weights, MapBackedARFFModel throws the following NullPointerException. While I have only tested this in 0.5, I suspect the bug also occurs in 0.6.
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:87)
>         at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:75)
>         at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>         at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
>         at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:159)
>         at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:127)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> The code works properly when all instance weights are set to the default value of 1. However, when any instance has a non-default weight, as in the sample ARFF file below, the NPE occurs when MapBackedARFFModel attempts to parse line 8.
> -----
> @relation 'Test Mahout'
> @attribute Attr0 numeric
> @attribute Label {True,False}
> @data
> 0,False
> 1,True,{2}
> -----
> The reason is that Weka assumes every data instance has a default weight of 1, and this default weight is not saved in the ARFF file. However, when a data instance does not have the default weight of 1, its weight is appended to the end of the line, surrounded by curly braces. When the MapBackedARFFModel.getValue method tries to parse this weight as an attribute, typeMap.get(idx) returns a null ARFFType because there is no such attribute, which results in an NPE.
