Posted to dev@climate.apache.org by "Ross Laidlaw (JIRA)" <ji...@apache.org> on 2014/10/28 00:52:35 UTC
[jira] [Comment Edited] (CLIMATE-341) Refactor "calcAnnualCycleMeans" metric from metrics_kyo.py

    [ https://issues.apache.org/jira/browse/CLIMATE-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185777#comment-14185777 ] 

Ross Laidlaw edited comment on CLIMATE-341 at 10/27/14 11:52 PM:
-----------------------------------------------------------------

I've spoken with [~mjoyce] and [~ploikith] to learn about this method.  This is my understanding of it (really I'm just repeating below what Mazi already said in his comment above):

Given a dataset with values at monthly increments spread over one or more complete years, this method calculates the mean for each calendar month across all of the years.  It therefore assumes the input has a specific structure, with the number of timesteps being a multiple of 12.  Within the method, a temporary 4D numpy array (number of years x 12 x number of latitudes x number of longitudes) is created from the 3D numpy input (number of months x number of latitudes x number of longitudes).  The numpy 'mean' function is then called on the 4D array along the years axis to produce a 3D (12 x number of latitudes x number of longitudes) result.

For example, if the dataset has four years of monthly data on a 100 x 100 grid (48 monthly timesteps, 100 latitudes and 100 longitudes), the dataset's 3D values array will contain 48 x 100 x 100 = 480,000 elements.  Within the original metric, this values array is copied and the copy is reshaped to a 4-dimensional array (4 x 12 x 100 x 100).  The numpy 'mean' function (using axis = 0) is then used to calculate the monthly means, returning a (12 x 100 x 100) numpy array.
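
The reshape-and-mean step described above can be sketched roughly as follows.  This is only an illustration, not the code from the original metric: the function name calc_annual_cycle_means and the error message are made up here, and it assumes the timesteps are contiguous, start in the same month each year, and run (time x lat x lon).

```python
import numpy as np


def calc_annual_cycle_means(values):
    """Return the mean for each calendar month across all years.

    `values` is assumed to be a 3D array (months x lats x lons) whose
    first dimension is a multiple of 12. (Hypothetical helper name.)
    """
    n_times = values.shape[0]
    if n_times % 12 != 0:
        raise ValueError("number of timesteps must be a multiple of 12")
    n_years = n_times // 12
    # Split the time axis into (years, months); the input is not modified.
    by_year = values.reshape((n_years, 12) + values.shape[1:])
    # Averaging over axis 0 (years) leaves one slice per calendar month.
    return by_year.mean(axis=0)


# Four years of monthly data on a 100 x 100 grid.
values = np.random.rand(48, 100, 100)
means = calc_annual_cycle_means(values)
print(values.size, means.shape)
```

The multiple-of-12 check makes the structural assumption explicit instead of letting the reshape fail with a less helpful error.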

The discussion also drew out the following points:

* This is more of a dataset manipulation than a metric.  It produces an intermediate product that can then be used with metrics, for example an anomaly calculation metric (by subtracting the means from another set of values)
* This method could therefore be moved to dataset_processor.py (or utils.py as a temporary home)
* If the output of the method is a Dataset object containing the means, this could then be used with the metrics in the new design/architecture (e.g. 'Bias' or similar to calculate anomalies).
* In addition to monthly means, it might be useful to have a daily means calculation/option.
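
As a rough sketch of the anomaly use case mentioned in the first point, the monthly means could be subtracted from a set of monthly values like this.  The function name monthly_anomalies is hypothetical and this is not an existing OCW metric; it assumes the same (time x lat x lon) layout and a whole number of years.

```python
import numpy as np


def monthly_anomalies(values, monthly_means):
    """Subtract each calendar month's mean from the matching timesteps.

    `values` is (months x lats x lons) with a multiple of 12 timesteps;
    `monthly_means` is (12 x lats x lons). (Hypothetical helper name.)
    """
    n_times = values.shape[0]
    if n_times % 12 != 0:
        raise ValueError("number of timesteps must be a multiple of 12")
    n_years = n_times // 12
    by_year = values.reshape((n_years, 12) + values.shape[1:])
    # Broadcasting aligns the 12-month means against every year.
    return (by_year - monthly_means).reshape(values.shape)
```

If the means were computed from the same values, the anomalies for each calendar month average to zero across the years, which is a quick sanity check on the indexing.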


Here are some questions based on the above points:
* If we return a Dataset object from the method, how do we populate the 'times' field?  This should be a one dimensional array of Python datetime objects.  There will be 12 values (one for each month), but I think year and day of the month are required when creating datetime objects.  Should we set them to the 1st Jan, 1st Feb, etc for an arbitrary year?
* For calculating daily means, how should we deal with leap years?  Perhaps we should have a separate method for daily means that can handle Feb 29th / March 1st indexing of timesteps so it doesn't accidentally mix these together.
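
To make the two questions more concrete, here is one possible approach, offered as a sketch rather than a decision.  The names placeholder_month_times and daily_means_by_calendar_day are hypothetical, the year 2000 is an arbitrary placeholder, and the daily version assumes 'times' holds one Python datetime per timestep.  Grouping on (month, day) rather than on day-of-year position keeps Feb 29th separate from March 1st.

```python
import datetime as dt
from collections import defaultdict

import numpy as np


def placeholder_month_times(year=2000):
    """Twelve datetimes on the 1st of each month in an arbitrary year."""
    return np.array([dt.datetime(year, month, 1) for month in range(1, 13)])


def daily_means_by_calendar_day(values, times):
    """Average daily values per (month, day) across all years.

    Feb 29th gets its own group, so it is only averaged over leap years
    and never shifts the March 1st values onto a different index.
    """
    groups = defaultdict(list)
    for index, time in enumerate(times):
        groups[(time.month, time.day)].append(index)
    keys = sorted(groups)
    # One averaged 2D slice per calendar day, in calendar order.
    means = np.stack([values[groups[key]].mean(axis=0) for key in keys])
    return keys, means
```

With this grouping the result has 366 entries whenever the input includes a leap year, so whatever consumes it would need to cope with that extra slot.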


Given the above questions, perhaps as an intermediate step we could transfer the method over to the utils.py module and output the means array (12 x number of latitudes x number of longitudes).


was (Author: rlaidlaw):
I've spoken with [~mjoyce] and [~ploikith] to learn about this method.  This is my understanding of it:

Given a dataset with values at monthly increments spread over one or more complete years, this method calculates the mean for each calendar month across all of the years.  It therefore assumes the input has a specific structure, with the number of timesteps being a multiple of 12.  Within the method, a temporary 4D numpy array (number of years x 12 x number of latitudes x number of longitudes) is created from the 3D numpy input (number of months x number of latitudes x number of longitudes).  The numpy 'mean' function is then called on the 4D array along the years axis to produce a 3D (12 x number of latitudes x number of longitudes) result.

For example, if the dataset has four years of monthly data on a 100 x 100 grid (48 monthly timesteps, 100 latitudes and 100 longitudes), the dataset's 3D values array will contain 48 x 100 x 100 = 480,000 elements.  Within the original metric, this values array is copied and the copy is reshaped to a 4-dimensional array (4 x 12 x 100 x 100).  The numpy 'mean' function (using axis = 0) is then used to calculate the monthly means, returning a (12 x 100 x 100) numpy array.

The discussion also drew out the following points:

* This is more of a dataset manipulation than a metric.  It produces an intermediate product that can then be used with metrics, for example an anomaly calculation metric (by subtracting the means from another set of values)
* This method could therefore be moved to dataset_processor.py (or utils.py as a temporary home)
* If the output of the method is a Dataset object containing the means, this could then be used with the metrics in the new design/architecture (e.g. 'Bias' or similar to calculate anomalies).
* In addition to monthly means, it might be useful to have a daily means calculation/option.


Here are some questions based on the above points:
* If we return a Dataset object from the method, how do we populate the 'times' field?  This should be a one dimensional array of Python datetime objects.  There will be 12 values (one for each month), but I think year and day of the month are required when creating datetime objects.  Should we set them to the 1st Jan, 1st Feb, etc for an arbitrary year?
* For calculating daily means, how should we deal with leap years?  Perhaps we should have a separate method for daily means that can handle Feb 29th / March 1st indexing of timesteps so it doesn't accidentally mix these together.


Given the above questions, perhaps as an intermediate step we could transfer the method over to the utils.py module and output the means array (12 x number of latitudes x number of longitudes).

> Refactor "calcAnnualCycleMeans" metric from metrics_kyo.py
> ----------------------------------------------------------
>
>                 Key: CLIMATE-341
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-341
>             Project: Apache Open Climate Workbench
>          Issue Type: Sub-task
>          Components: metrics
>    Affects Versions: 0.3-incubating
>            Reporter: Maziyar Boustani
>            Assignee: Ross Laidlaw
>             Fix For: 0.5
>
>
> Reimplement metric "calcAnnualCycleMeans" from [1], possibly in utils.py [2].
> [1]: https://svn.apache.org/repos/asf/incubator/climate/trunk/rcmet/src/main/python/rcmes/toolkit/metrics.py
> [2]: https://svn.apache.org/repos/asf/incubator/climate/trunk/ocw/utils.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)