Posted to issues@ignite.apache.org by "Alexey Zinoviev (Jira)" <ji...@apache.org> on 2020/09/29 10:24:00 UTC

[jira] [Updated] (IGNITE-12396) [ML] Random Forest generates NaN for a part of models on small datasets

     [ https://issues.apache.org/jira/browse/IGNITE-12396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Zinoviev updated IGNITE-12396:
-------------------------------------
    Affects Version/s:     (was: 2.8)

> [ML] Random Forest generates NaN for a part of models on small datasets
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-12396
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12396
>             Project: Ignite
>          Issue Type: Bug
>          Components: ml
>            Reporter: Alexey Zinoviev
>            Assignee: Alexey Zinoviev
>            Priority: Major
>             Fix For: 2.10
>
>
> @Override public Double predict(Vector features) {
>  double[] predictions = new double[models.size()];
>  for (int i = 0; i < models.size(); i++)
>      predictions[i] = models.get(i).predict(features);
>  return predictionsAggregator.apply(predictions);
> }
>  
> predictionsAggregator receives predictions from many models, and some of those models return null, which can still end up being aggregated. First of all, handle this in the Aggregator (using a threshold on the number of broken models before aggregation). Also, RandomForest trees should not silently return Double.NaN: training should fail or report a message once it finishes.
>  
> I've tested with 100 and 1000 rows, where it fails; it does not fail on 10 000 rows.
>  
> RF generates a few models that consist of a single LEAF node with an empty val (Double.NaN by default).
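
The threshold idea from the description (tolerate a limited number of broken models before aggregation) could be sketched roughly as below. This is only an illustration under assumptions: the class name NanTolerantMeanAggregator and the maxBrokenFraction parameter are hypothetical and are not part of the Ignite ML API, and a plain mean is assumed as the aggregation.

```java
/**
 * Hypothetical NaN-tolerant mean aggregator (illustration only, not an Ignite class).
 * It skips NaN predictions and fails if too many models are broken.
 */
final class NanTolerantMeanAggregator {
    /** Maximum allowed fraction of NaN predictions, in [0, 1] (assumed parameter). */
    private final double maxBrokenFraction;

    NanTolerantMeanAggregator(double maxBrokenFraction) {
        if (maxBrokenFraction < 0.0 || maxBrokenFraction > 1.0)
            throw new IllegalArgumentException("maxBrokenFraction must be in [0, 1]");

        this.maxBrokenFraction = maxBrokenFraction;
    }

    /** Returns the mean of the non-NaN predictions, or throws if too many are NaN. */
    double apply(double[] predictions) {
        if (predictions.length == 0)
            throw new IllegalArgumentException("No predictions to aggregate");

        double sum = 0.0;
        int valid = 0;

        // Accumulate only predictions that came from healthy models.
        for (double p : predictions) {
            if (!Double.isNaN(p)) {
                sum += p;
                valid++;
            }
        }

        int broken = predictions.length - valid;

        // Fail loudly instead of propagating NaN when the ensemble is mostly broken.
        if (valid == 0 || broken > maxBrokenFraction * predictions.length)
            throw new IllegalStateException(
                "Too many broken models: " + broken + " of " + predictions.length);

        return sum / valid;
    }
}
```

With maxBrokenFraction set to 0.5, an input of {1.0, NaN, 3.0} is averaged over the two valid values, while {NaN, NaN, 1.0} is rejected. Throwing on an excess of broken models rather than returning NaN keeps the failure visible at prediction time; whether that is the right behavior for Ignite is a design decision the issue leaves open.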



--
This message was sent by Atlassian Jira
(v8.3.4#803005)