Posted to dev@sdap.apache.org by "Joseph C. Jacob (Jira)" <ji...@apache.org> on 2021/08/18 17:23:01 UTC

[jira] [Updated] (SDAP-326) Make ingest processors optional in incubator-sdap-ingestor

     [ https://issues.apache.org/jira/browse/SDAP-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph C. Jacob updated SDAP-326:
---------------------------------
    Description: 
h3. The Problem:

The old *incubator-sdap-ningesterpy* / *incubator-sdap-ningester* required that the processors to be applied to each dataset at ingest time be listed in that dataset's configuration file.  The new *incubator-sdap-ingester* applies these processors automatically and has no mechanism to change that behavior through a data collection config setting.  This is a problem for the processor that converts any variable with units "kelvin" to units "celsius": some variables have units of "kelvin" but represent a difference from a norm, and should not be transformed.

Currently, "*kelvintocelsius*" is the only processor that has been identified as one that we need to be able to turn off.  However, this may apply to any units conversion or to other processors added in the future.
h3. The Details:

In particular, for the *{{MUR25-JPL-L4-GLOB-v4.2}}* dataset, we commonly ingest both the *{{analysed_sst}}* and the *{{sst_anomaly}}* variables, both of which natively have units of kelvin.  However, *{{sst_anomaly}}* represents a difference from some norm and should not be subject to the “subtract 273.15” operation.  An *{{sst_anomaly}}* of 0 kelvin is still a 0 degree “anomaly” or “difference” in degrees Celsius.  So, we need to restrict which variables get this operation applied to them.
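
The arithmetic behind this can be checked with a small sketch. The function name and the sample temperatures below are made up for illustration and are not taken from the ingester code; the point is only that the 273.15 offset cancels when two temperatures are subtracted, so an anomaly has the same numeric value in kelvin and in degrees Celsius.

```python
import math

KELVIN_OFFSET = 273.15


def kelvin_to_celsius(value_k: float) -> float:
    """Convert an absolute temperature from kelvin to degrees Celsius."""
    return value_k - KELVIN_OFFSET


# Illustrative values only (not real MUR25 data).
sst_k = 300.0    # absolute sea surface temperature, in kelvin
norm_k = 298.5   # reference ("norm") temperature, in kelvin

# The anomaly is a difference of two absolute temperatures.
anomaly_k = sst_k - norm_k

# Converting both absolute temperatures first gives the same difference,
# because the offset is subtracted from both terms and cancels.
anomaly_c = kelvin_to_celsius(sst_k) - kelvin_to_celsius(norm_k)
same_anomaly = math.isclose(anomaly_k, anomaly_c)

# Applying the conversion to the anomaly itself shifts it by 273.15,
# which is the incorrect result described above.
wrongly_converted = kelvin_to_celsius(anomaly_k)
```

Here `same_anomaly` is true, while `wrongly_converted` is far below zero even though the sea surface is warmer than the norm.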
h3. Proposed Solution:

I propose to solve this in a way that is not specific to the *kelvintocelsius* processor.  Currently that processor is the only one that has been identified as one that we need to be able to turn off, but there may be others in the future.  The proposed solution is to add a keyword to the *collections-config* under which we can list any processors to be turned OFF for a dataset.  The ingester would then only need to check that a processor is not in this list before applying it.  This approach would work for the *kelvintocelsius* processor and for any other processor that is already supported or is added in the future.
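
A minimal sketch of that check, under stated assumptions: the keyword name {{disabled-processors}}, the registry {{DEFAULT_PROCESSORS}}, the helper names, and the collection ids are all hypothetical and do not reflect the actual ingester code or config schema. Collections are modeled as plain dictionaries, as they might look after the config file is parsed.

```python
from typing import Callable, Dict, List

Processor = Callable[[float], float]


def kelvin_to_celsius(value: float) -> float:
    """Stand-in for the real processor: subtract 273.15."""
    return value - 273.15


# Hypothetical registry of processors applied by default, keyed by name.
DEFAULT_PROCESSORS: Dict[str, Processor] = {
    "kelvintocelsius": kelvin_to_celsius,
}


def active_processors(collection: dict) -> List[Processor]:
    """Return the default processors not listed as disabled for this collection."""
    # "disabled-processors" is an assumed keyword name, not an existing setting.
    disabled = set(collection.get("disabled-processors", []))
    return [fn for name, fn in DEFAULT_PROCESSORS.items() if name not in disabled]


def process(collection: dict, value: float) -> float:
    """Apply every active processor, in registry order, to a single value."""
    for fn in active_processors(collection):
        value = fn(value)
    return value


# Two hypothetical collection entries for the same dataset.
sst_collection = {"id": "example-sst", "variable": "analysed_sst"}
anomaly_collection = {
    "id": "example-sst-anomaly",
    "variable": "sst_anomaly",
    "disabled-processors": ["kelvintocelsius"],
}

converted_sst = process(sst_collection, 273.15)   # conversion applied
kept_anomaly = process(anomaly_collection, 0.0)   # conversion skipped
```

A collection that omits the keyword keeps the current behavior, so existing configs would not need to change under this design.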



> Make ingest processors optional in incubator-sdap-ingestor
> ----------------------------------------------------------
>
>                 Key: SDAP-326
>                 URL: https://issues.apache.org/jira/browse/SDAP-326
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Task
>          Components: granule-ingester
>            Reporter: Joseph C. Jacob
>            Priority: Major


