Posted to dev@samoa.apache.org by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org> on 2015/09/09 14:14:45 UTC

[jira] [Created] (SAMOA-44) NPE when running VHT on KDD cup data

Gianmarco De Francisci Morales created SAMOA-44:
---------------------------------------------------

             Summary: NPE when running VHT on KDD cup data
                 Key: SAMOA-44
                 URL: https://issues.apache.org/jira/browse/SAMOA-44
             Project: SAMOA
          Issue Type: Bug
          Components: SAMOA-API
            Reporter: Gianmarco De Francisci Morales


From the mailing list:

We were able to run the HoeffdingTree algorithm on the KDD Cup 99 data set (both kddcup_full.arff and kddcup_10_percent.arff). The VerticalHoeffdingTree classifier also works fine on kddcup_10_percent.arff. However, when we try to run the VerticalHoeffdingTree classifier on kddcup_full.arff, we get the NullPointerException shown in the console output below.

The command we use to run SAMOA Local:

bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -i -1 -f 41920 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"

The console output of samoa:

bin/samoa
Deploying to LOCAL
Command line string =  PrequentialEvaluation -i -1 -f 41920 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
2015-09-01 22:22:16,160 [main] INFO  com.yahoo.labs.samoa.LocalDoTask (LocalDoTask.java:79) - Successfully instantiating com.yahoo.labs.samoa.tasks.PrequentialEvaluation
2015-09-01 22:22:17,741 [main] INFO  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:86) - 1 seconds for 41920 instances
2015-09-01 22:22:17,760 [main] INFO  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:172) - evaluation instances = 41,920
classified instances = 41,920
classifications correct (percent) = 99.988
Kappa Statistic (percent) = -0.002
Kappa Temporal Statistic (percent) = 28.571
Exception in thread "main" java.lang.NullPointerException
	at com.yahoo.labs.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:145)
	at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
	at com.yahoo.labs.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:95)
	at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
	at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:46)
	at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:66)
	at com.yahoo.labs.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:42)
	at com.yahoo.labs.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
	at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)


We tracked the problem down to the first instance that triggers it, which is on line 76426 of kddcup_full.arff. The instance is as follows:

1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal
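One quick sanity check on an instance like this is to count its comma-separated fields and compare the result with the number of attributes declared in the ARFF header. The sketch below does only that; the class and method names are hypothetical and are not part of SAMOA, and it assumes the line contains no quoted commas.

```java
// Hypothetical helper, not part of SAMOA: counts the comma-separated
// fields of one ARFF data line (assumes no quoted commas in the line).
class InstanceFieldCheck {

    // The instance from line 76426 of kddcup_full.arff, as quoted above.
    static final String INSTANCE =
        "1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,"
        + "1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal";

    // The limit of -1 keeps trailing empty fields, so a line ending in a
    // comma is not silently shortened.
    static int countFields(String dataLine) {
        return dataLine.split(",", -1).length;
    }

    public static void main(String[] args) {
        System.out.println("fields = " + countFields(INSTANCE));
    }
}
```

If the count matches the number of @attribute declarations in the header, the instance is at least structurally well formed, which would be consistent with the problem lying in the tree state rather than in the data line itself.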

We haven’t noticed any differences between the problematic instance and the other instances. Could you point us to the root cause of the problem and help us work out how to overcome it?

As a workaround, we’ve made the following addition to ModelAggregatorProcessor.java:

if (leafNode == null)
    return false;

immediately after the line

ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();
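
The shape of this guard can be illustrated in isolation. The sketch below is not the SAMOA source: ActiveLearningNode and FoundNode here are minimal hypothetical stand-ins that share only their names with the real classes, and the process method is an assumed simplification of the real one. It shows only that returning early on a null leaf avoids dereferencing it.

```java
// Minimal sketch with hypothetical stand-in classes; NOT the real SAMOA types.
class NullLeafGuardSketch {

    /** Stand-in for a leaf node that can still learn from events. */
    static class ActiveLearningNode {
        int eventsSeen = 0;

        void learn() {
            eventsSeen++;
        }
    }

    /** Stand-in for the lookup result; its node may be null. */
    static class FoundNode {
        private final ActiveLearningNode node;

        FoundNode(ActiveLearningNode node) {
            this.node = node;
        }

        ActiveLearningNode getNode() {
            return node;
        }
    }

    /** Returns true if the event was applied to a leaf, false if it was skipped. */
    static boolean process(FoundNode foundNode) {
        ActiveLearningNode leafNode = foundNode.getNode();
        if (leafNode == null) {
            return false; // the workaround: skip instead of dereferencing null
        }
        leafNode.learn(); // without the guard, a null leaf would throw an NPE here
        return true;
    }

    public static void main(String[] args) {
        System.out.println(process(new FoundNode(new ActiveLearningNode())));
        System.out.println(process(new FoundNode(null)));
    }
}
```

A guard like this avoids the crash, but it silently drops whatever event arrived with a null leaf, so the question of why the leaf is null in the first place remains open.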

With this change, the VerticalHoeffdingTree classifier also works fine on kddcup_full.arff. Is this an acceptable solution to the problem? What do you think?



