Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/03/23 13:56:25 UTC

[jira] [Updated] (SPARK-14095) LogisticRegression fails when a DataFrame has only a one-class label

     [ https://issues.apache.org/jira/browse/SPARK-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-14095:
------------------------------
    Issue Type: Improvement  (was: Bug)

Calling this an improvement since it's mostly about a better exception message.

Can you inline your code here for safekeeping, and show more of the error output so we can see where the AIOOBE (ArrayIndexOutOfBoundsException) occurs?
At the least, the exception message should be better. But no, it's ugly to just patch over the problem by catching the exception.

The behavior should be fixed, if possible, for all binary classifiers. Although this is a degenerate case with a trivial solution (a constant classifier), it's not nonsensical to build a model from this input. I could imagine either rejecting it, or changing the code to handle this situation and proceed anyway.
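The two options above (reject the input, or handle it with a constant classifier) can be sketched as a label check that runs before training indexes any per-class arrays. This is a minimal sketch in plain Python, not Spark's Scala code: `handle_degenerate_labels`, `ConstantClassifier`, and the `on_single_class` option are hypothetical names, not Spark API, and labels are assumed to be encoded as 0.0/1.0 as in Spark ML's binary classifiers.

```python
from collections import Counter


class ConstantClassifier:
    """Trivial model for the degenerate one-class case (hypothetical, not Spark API)."""

    def __init__(self, label):
        self.label = label

    def predict(self, features):
        # Features are ignored: the prediction is always the single label seen in training.
        return self.label

    def predict_proba(self, features):
        # Probability of class 1.0: certain if the only label seen was 1.0, zero otherwise.
        return 1.0 if self.label == 1.0 else 0.0


def handle_degenerate_labels(labels, on_single_class="reject"):
    """Hypothetical pre-training check for a binary classifier.

    Returns None when both 0.0 and 1.0 are present (normal training can proceed),
    a ConstantClassifier when only one class is present and on_single_class is
    "constant", and raises a descriptive ValueError otherwise.
    """
    if on_single_class not in ("reject", "constant"):
        raise ValueError(f"Unknown on_single_class option: {on_single_class!r}")
    counts = Counter(labels)
    if not counts:
        raise ValueError("Training data is empty")
    invalid = sorted(y for y in counts if y not in (0.0, 1.0))
    if invalid:
        raise ValueError(f"Binary classifier expects labels 0.0 and 1.0, found {invalid}")
    if len(counts) == 2:
        return None  # Both classes present: hand off to the real optimizer.
    (only_label,) = counts  # Exactly one distinct label remains.
    if on_single_class == "constant":
        return ConstantClassifier(only_label)
    raise ValueError(
        f"Training data contains only label {only_label}; "
        "a binary classifier needs examples of both 0.0 and 1.0"
    )
```

Checking up front, rather than catching the index error later, lets the message name the actual problem (which class is missing) instead of reporting an array index.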

> LogisticRegression fails when a DataFrame has only a one-class label
> --------------------------------------------------------------------
>
>                 Key: SPARK-14095
>                 URL: https://issues.apache.org/jira/browse/SPARK-14095
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 1.6.1
>            Reporter: Grzegorz Chilkiewicz
>            Priority: Minor
>
> The problem might look unimportant, but imagine assigning arbitrary weights to rows, or an unfortunate split into training and test datasets.
> Even if the problem is to be ignored, we should return a more meaningful error than "java.lang.ArrayIndexOutOfBoundsException: 1".
> I can investigate the problem more deeply and prepare a PR to fix it, or just intercept the exception and throw a more explanatory one. What should I do?
> Code to reproduce the bug:
> https://github.com/grzegorz-chilkiewicz/OneNonZeroWeight
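The weighting scenario raised in the report can be illustrated with a small sketch: a dataset can contain both labels and still be one-class from the optimizer's point of view if every row of one class has zero weight. The helper `effective_label_weights` is hypothetical, not Spark API, and it assumes a zero-weight row contributes nothing to the loss, which is the usual meaning of an instance weight.

```python
from collections import defaultdict


def effective_label_weights(labels, weights):
    """Sum instance weights per label, dropping labels whose total weight is zero.

    Hypothetical helper; assumes zero-weight rows contribute nothing to training,
    so a label that appears only on zero-weight rows is effectively absent.
    """
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        if weight < 0:
            raise ValueError(f"Negative instance weight {weight} for label {label}")
        totals[label] += weight
    return {label: total for label, total in totals.items() if total > 0}


# Both labels occur, but every 0.0 row has zero weight,
# so training effectively sees a single class.
labels = [0.0, 1.0, 1.0, 0.0]
weights = [0.0, 2.5, 1.0, 0.0]
print(effective_label_weights(labels, weights))  # {1.0: 3.5}
```

If fewer than two labels survive, the input is in the same degenerate one-class situation as unweighted data with a single label, and an unlucky train/test split can produce it the same way.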



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
