Posted to dev@sdap.apache.org by "Nga Thien Chung (Jira)" <ji...@apache.org> on 2022/11/01 21:21:00 UTC

[jira] [Updated] (SDAP-406) Time series comparison stats issues

     [ https://issues.apache.org/jira/browse/SDAP-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nga Thien Chung updated SDAP-406:
---------------------------------
    Resolution: Fixed
        Status: Done  (was: To Do)

> Time series comparison stats issues
> -----------------------------------
>
>                 Key: SDAP-406
>                 URL: https://issues.apache.org/jira/browse/SDAP-406
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Bug
>          Components: analysis
>            Reporter: Kevin Marlis
>            Priority: Major
>
> *In short*: the time series comparison stats compute the linear regression only over results whose timestamps match exactly across the two datasets. For example, DS1 and DS2 are both monthly products, but DS1 data falls on the first of the month and DS2 data falls in the middle of the month. With no matching times across the two datasets, none of the algorithm's result data is passed to the regression algorithm.
>  
> *In detail*: The issue is at this line: [https://github.com/apache/incubator-sdap-nexus/blob/22b10f661f02e4b8329e3973234b83b188133d8c/analysis/webservice/algorithms_spark/TimeSeriesSpark.py#L314]
> {{xy}} is appended to only if there are two result dictionaries in {{item}}, which happens only when the two datasets share an identical time value. When the dates never sync up, {{if len(item) == 2}} is never satisfied, so the xs and ys for the regression are never appended to. The linear regression algorithm returns NaNs if the x and y arrays contain only one value, which can be problematic downstream. Empty comparison stats do not appear to affect the charts on the frontend.
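
The pairing behaviour described above can be illustrated with a minimal sketch. It is not the code in TimeSeriesSpark.py: the function name pair_by_time and the sample series are hypothetical, and the grouping only imitates the {{len(item) == 2}} check.

```python
from collections import defaultdict
from datetime import datetime


def pair_by_time(ds1, ds2):
    """Group two (timestamp, value) series by exact timestamp.

    Hypothetical helper: only timestamps that collect exactly two
    values contribute an (x, y) pair, imitating `len(item) == 2`.
    """
    by_time = defaultdict(list)
    for series in (ds1, ds2):
        for when, value in series:
            by_time[when].append(value)
    return [tuple(vals) for _, vals in sorted(by_time.items()) if len(vals) == 2]


# Made-up monthly products: DS1 on the 1st, DS2 on the 15th of each month.
ds1 = [(datetime(2020, m, 1), 10.0 + m) for m in range(1, 7)]
ds2 = [(datetime(2020, m, 15), 20.0 + m) for m in range(1, 7)]

xy = pair_by_time(ds1, ds2)
print(len(xy))  # 0: no timestamp appears in both series
```

With these inputs no timestamp is shared, so nothing reaches the regression even though both series cover the same six months.
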
>  
> *Possible fixes...*
>  * Check whether the linear regression results are NaN; if so, set the stats to an empty dict
>  * Normalize dates so that the time steps are consistent across multiple datasets
>  
> For now we're going with the first option, although the second option could be looked into.
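
Both options can be sketched roughly as follows. This is an illustration under assumptions, not the change that was merged: clean_stats and to_month_start are hypothetical names, and the stats are assumed to be a flat dict of floats.

```python
import math
from datetime import datetime


def clean_stats(stats):
    """Option 1 (hypothetical helper): drop the stats if any value is NaN."""
    if any(isinstance(v, float) and math.isnan(v) for v in stats.values()):
        return {}
    return stats


def to_month_start(when):
    """Option 2 (hypothetical helper): snap a timestamp to the start of its month."""
    return when.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# A regression that produced NaNs yields empty comparison stats.
print(clean_stats({"slope": float("nan"), "intercept": 1.0}))  # {}

# Timestamps from the 1st and the 15th of a month normalize to the same key.
print(to_month_start(datetime(2020, 3, 15)) == to_month_start(datetime(2020, 3, 1)))  # True
```

Month-start normalization only suits monthly products; other time steps would need a different rounding rule.
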



--
This message was sent by Atlassian Jira
(v8.20.10#820010)