Posted to user@flink.apache.org by Kürşat Kurt <ku...@kursatkurt.com> on 2016/09/30 20:52:05 UTC

SVM classification problem.

Hi;

 

I am trying to train and predict with the same data set. I expect the accuracy
should be 100%; am I wrong?

When I predict on the same set, some predictions are wrong, and the model also
produces the label "-1", which does not appear in the training set.

What is wrong with this code?

 

Code:

import org.apache.flink.api.scala._
import org.apache.flink.ml.classification.SVM
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.math.SparseVector

def main(args: Array[String]): Unit = {

    val env = ExecutionEnvironment.getExecutionEnvironment

    // Six samples over 10 features, labeled 0.0 and 1.0
    val training = Seq(
      new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
      new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
      new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
      new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
      new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
      new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))))

    val trainingDS = env.fromCollection(training)
    val testingDS = env.fromCollection(training)

    val svm = new SVM().setBlocks(env.getParallelism)

    svm.fit(trainingDS)

    // evaluate emits (true label, predicted label) pairs
    val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))

    predictions.print()

  }

 

Output:

(1.0,1.0)

(1.0,1.0)

(0.0,1.0)

(0.0,-1.0)

(0.0,1.0)

(0.0,-1.0)


Re: SVM classification problem.

Posted by Simone Robutti <si...@radicalbit.io>.
No, you don't get 100% accuracy in this case. You don't even want that: it
would be a severe case of overfitting. You would only get it if your dataset
were linearly separable, or separable with a finely tuned kernel, but in that
case an SVM would be overkill and more traditional methods would suffice.

FlinkML's SVM is a binary classifier and returns "-1" as the default label for
the "negative" class. It is a rather raw implementation, so it is best used
only when you have a clear idea of the underlying process; treating it as a
black box, as you might with more mature ML libraries, can lead to problems
like this.
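
In practice this means the two classes should be encoded as +1.0 and -1.0
before training. A minimal sketch, reusing the training Seq and env from the
code above (the remapping step is only a suggestion, not something FlinkML
does for you):

// Sketch: remap 0.0 -> -1.0 so the labels match the +1/-1 convention
// used by FlinkML's SVM, then train and evaluate on the remapped data.
val remapped = training.map { lv =>
  val label = if (lv.label == 0.0) -1.0 else 1.0
  LabeledVector(label, lv.vector)
}

val remappedDS = env.fromCollection(remapped)

val svm = new SVM().setBlocks(env.getParallelism)
svm.fit(remappedDS)

// evaluate emits (true label, predicted label); with both sides in
// {-1.0, +1.0} the pairs can be compared directly.
val predictions = svm.evaluate(remappedDS.map(x => (x.vector, x.label)))
predictions.print()

Even with matching labels, whether every training point is predicted correctly
still depends on the data being separable and on the solver's regularization
and iteration settings, so 100% training accuracy is not guaranteed.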
