Posted to dev@mahout.apache.org by "Joe Prasanna Kumar (Commented) (JIRA)" <ji...@apache.org> on 2012/02/26 03:09:48 UTC
[jira] [Commented] (MAHOUT-985) MapBackedArffModel Unable To Parse
ARFF Files Containing Instance Weights
[ https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216614#comment-13216614 ]
Joe Prasanna Kumar commented on MAHOUT-985:
-------------------------------------------
Dave,
I am trying to understand the implications of weights when we convert an ARFF file to one of the Mahout vectors. Should we normalize the vector values when a weight is specified for an instance? In Weka, when you add a weight to an instance, how does it influence the actual instance values?
Depending on how weights are used on ARFF instances, we can handle the situation in Mahout. Please let me know your thoughts.
Joe.
> MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
> -------------------------------------------------------------------------
>
> Key: MAHOUT-985
> URL: https://issues.apache.org/jira/browse/MAHOUT-985
> Project: Mahout
> Issue Type: Bug
> Components: Integration
> Affects Versions: 0.5
> Reporter: Dave Kor
> Priority: Minor
> Labels: Arff
>
> When parsing an ARFF file that contains instance-specific weights, MapBackedARFFModel throws the following NullPointerException. While I have only tested this in 0.5, I suspect this bug also occurs in 0.6.
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:87)
> at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:75)
> at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
> at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
> at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:159)
> at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:127)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> The code works properly when all instance weights are set to the default value of 1. However, when any instance has a non-default weight, as in the sample ARFF file below, the NPE occurs when MapBackedARFFModel attempts to parse line 8.
> -----
> @relation 'Test Mahout'
> @attribute Attr0 numeric
> @attribute Label {True,False}
> @data
> 0,False
> 1,True,{2}
> -----
> The reason is that in Weka, all data instances are assumed to have a default weight of 1, and this default weight is not saved in the ARFF file. However, when a data instance DOES NOT have the default weight of 1, the non-default instance weight is appended to the end of the line, surrounded by curly braces. When the MapBackedARFFModel.getValue method tries to parse this weight as an attribute, typeMap.get(idx) returns a null ARFFType because there is no attribute at that index, which results in an NPE.
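One possible way to avoid the NPE, sketched here in plain Java, is to strip a trailing weight from each data line before its values are matched against attributes. The class ArffWeightSketch, its WeightedLine holder, and the parse method are hypothetical names used only for illustration and are not part of Mahout. The sketch assumes dense data lines with unquoted values and a non-negative numeric weight, and it does not answer the open question of what the vector conversion should do with the weight once it has been extracted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Mahout. */
final class ArffWeightSketch {

  // A trailing ",{<number>}" at the end of a data line is treated as the instance weight.
  private static final Pattern TRAILING_WEIGHT =
      Pattern.compile("^(.*?),\\s*\\{\\s*(\\d+(?:\\.\\d+)?)\\s*\\}\\s*$");

  /** Attribute values of one data line plus its instance weight. */
  static final class WeightedLine {
    final String[] values;
    final double weight;

    WeightedLine(String[] values, double weight) {
      this.values = values;
      this.weight = weight;
    }
  }

  /** Splits a dense ARFF data line into attribute values and a weight (1.0 when absent). */
  static WeightedLine parse(String line) {
    String body = line.trim();
    double weight = 1.0; // default weight, which is not written to the file
    Matcher m = TRAILING_WEIGHT.matcher(body);
    if (m.matches()) {
      body = m.group(1);                        // everything before the weight suffix
      weight = Double.parseDouble(m.group(2));  // the number inside the braces
    }
    // Naive comma split: quoted values containing commas are not handled here.
    String[] values = body.split(",");
    for (int i = 0; i < values.length; i++) {
      values[i] = values[i].trim();
    }
    return new WeightedLine(values, weight);
  }

  public static void main(String[] args) {
    WeightedLine w = parse("1,True,{2}");
    System.out.println(w.values.length + " values, weight " + w.weight);
  }
}
```

With the sample file above, the line "0,False" would keep the default weight, while "1,True,{2}" would yield two attribute values and a separate weight, so no lookup is attempted for a third, nonexistent attribute.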