Posted to dev@sdap.apache.org by "Antoine Queric (Jira)" <ji...@apache.org> on 2021/07/02 07:33:00 UTC

[jira] [Commented] (SDAP-317) Open multiple netcdf files in order to generate granules with multiple time steps

    [ https://issues.apache.org/jira/browse/SDAP-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373294#comment-17373294 ] 

Antoine Queric commented on SDAP-317:
-------------------------------------

Dear [~tloubrieu] , [~skperez] , [~thuang], [~nchung] (others ?)

I think this may be an interesting conversation before deciding whether we should or should not try to implement such a feature.

 

Our aim is to be able to concatenate multiple time ranges into one tile (also geo-spatially sliced) from daily netcdf files (each containing only one TIME step). We are interested in trying that because we think it may speed up long time-series queries on the nexus webapp (yet to be proven).

Below is the directory structure of the dataset we are working with:

```

2015
├── 001
│   └── 20150101-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 002
│   └── 20150102-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 003
│   └── 20150103-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 004
│   └── 20150104-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 005
│   └── 20150105-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 006
│   └── 20150106-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 007
│   └── 20150107-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 008
│   └── 20150108-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 009
│   └── 20150109-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc

```

Header excerpt showing the dimension sizes:

```

netcdf \20150101-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0 {
dimensions:
 lat = 1600 ;
 lon = 3600 ;
 time = 1 ;

```
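To make the idea concrete, here is a minimal NumPy sketch of the cube we have in mind. The `read_daily_sst` function is hypothetical and stands in for reading one daily netcdf file (in practice that would go through netCDF4 or xarray), and the grid is shrunk from the real 1600x3600 for readability:

```python
import numpy as np

# Hypothetical stand-in for reading the single time step of one daily file.
# Each daily file contributes a (time=1, lat, lon) array; here the values
# are just the day index so the result is easy to inspect.
def read_daily_sst(day_index, lat=16, lon=36):
    return np.full((1, lat, lon), float(day_index), dtype=np.float32)

# Stack the daily (1, lat, lon) arrays into one (time=N, lat, lon) cube,
# mirroring days 001..009 from the directory listing above.
days = range(1, 10)
cube = np.concatenate([read_daily_sst(d) for d in days], axis=0)

# A "tile" is then a slice along all three axes: time, lat, lon.
tile = cube[0:5, 0:8, 0:12]  # 5 time steps, 8x12 spatial window
```

This is only an illustration of the target data layout, not SDAP code: the point is that once several single-time-step files are concatenated along `time`, a tile becomes a plain 3-axis slice.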

First of all, is this something we really want to add to the SDAP ingester? I'm not sure re-processing the netcdf files to re-chunk along the time dimension is something we would want to do ourselves in every case, so my first guess is that we would want granule_ingester to do it for us.

 

If we wanted this, how should the collection_manager behave so that it properly queues a specific number of contiguous files for a single granule_ingester instance to take care of?

 

Maybe the simplest way to handle this specific need would be to bypass the collection_manager and launch a granule_ingester instance with a list of files and a configuration to process?
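One possible shape for that queueing logic, sketched with illustrative names only (`batch_granules` and `steps_per_tile` are not an existing SDAP API): sort the daily granule paths and cut them into contiguous batches, each batch being the work unit handed to one granule_ingester instance.

```python
# Hypothetical sketch: split a sorted list of daily granule paths into
# contiguous batches of `steps_per_tile` files, one batch per
# granule_ingester instance.
def batch_granules(paths, steps_per_tile):
    paths = sorted(paths)  # day-of-year directories sort lexicographically
    return [paths[i:i + steps_per_tile]
            for i in range(0, len(paths), steps_per_tile)]

# Illustrative file names patterned on the listing above (truncated).
files = ["2015/{:03d}/201501{:02d}-IFR-...-fv1.0.nc".format(d, d)
         for d in range(1, 10)]
batches = batch_granules(files, steps_per_tile=4)
# batches[0] covers days 001-004, batches[1] days 005-008,
# batches[2] the leftover day 009
```

The open question would then be where this grouping lives: in the collection_manager before publishing to the queue, or in a standalone launcher that bypasses it, as suggested above.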

 

Maybe I'm missing something; we started evaluating the new SDAP ingester module this week (and are already committing code for elasticsearch support), so perhaps what we want to do is already possible and I missed it :)

 

Best regards,

Antoine

> Open multiple netcdf files in order to generate granules with multiple time steps
> ---------------------------------------------------------------------------------
>
>                 Key: SDAP-317
>                 URL: https://issues.apache.org/jira/browse/SDAP-317
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Improvement
>          Components: granule-ingester
>            Reporter: Antoine Queric
>            Priority: Major
>
> When netcdf files only include one single time step, it may be interesting to open multiple files & generate a data cube which contains :
>  * longitude slice
>  * latitude slice
>  * time slice
> We will develop & test such a feature in order to compare performance when querying long timeseries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)