You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sdap.apache.org by GitBox <gi...@apache.org> on 2021/10/29 17:40:45 UTC
[GitHub] [incubator-sdap-ingester] tloubrieu-jpl commented on a change in pull request #48: Read time from filename

tloubrieu-jpl commented on a change in pull request #48:
URL: https://github.com/apache/incubator-sdap-ingester/pull/48#discussion_r739423815



##########
File path: granule_ingester/granule_ingester/processors/TileSummarizingProcessor.py
##########
@@ -27,17 +30,45 @@ class NoTimeException(Exception):
     pass
 
 
-def find_time_min_max(tile_data):
+def find_time_min_max(tile_data, time_from_granule=None):
     if tile_data.time:
         if isinstance(tile_data.time, nexusproto.ShapedArray):
             time_data = from_shaped_array(tile_data.time)
             return int(numpy.nanmin(time_data).item()), int(numpy.nanmax(time_data).item())
-        elif isinstance(tile_data.time, int):
+        elif isinstance(tile_data.time, int) and \
+             tile_data.time > datetime.datetime(1970, 1, 1).timestamp():  # time should be at least greater than Epoch, right?
             return tile_data.time, tile_data.time
 
+    if time_from_granule:
+        return time_from_granule, time_from_granule
+
     raise NoTimeException
 
 
+def get_time_from_granule(granule: str) -> int:
+    """
+    Get time from granule name. It makes the assumption that a datetime is
+    specificed in the granule name, and it has the following format "YYYYddd",
+    where YYYY is 4 digit year and ddd is day of year (i.e. 1 to 365).
+    
+    Note: This is a very narrow implmentation for a specific need. If you found
+    yourself needing to modify this function to accommodate more use cases, then
+    perhaps it is time to refactor this function to a more dynamic module.
+    """
+
+    # rs search for a sub str which starts with 19 or 20, and has 7 digits
+    search_res = re.search('((?:19|20)[0-9]{2})([0-9]{3})', os.path.basename(granule))
+    if not search_res:
+        return None
+    
+    year, days = search_res.groups()
+    year = int(year)
+    days = int(days)
+
+    # since datetime is set to 1/1 (+1 day), timedelta needs to -1 to cancel it out
+    return int((datetime.datetime(year, 1, 1) + datetime.timedelta(days-1)).timestamp())

Review comment:
       I think we want to be careful about the UTC timezone.
   
   For example with code
   
       import pytz
       aware = datetime.datetime(2011,8,15,8,15,12,0,pytz.UTC)
   
   
   Or with the example in my comment above:
   
       import pytz
       datetime.strptime(filename, 'A%Y%j.L3m_CHL_chlor_a_4km.nc').replace(tzinfo=pytz.utc)
   
   That would be more simple than the current version of the code and spare some lines of code.
   
   I am not sure if the hardcoded -1 days is also something we would like to have for all the datasets. Maybe an additional optional configuration variable for the collection should give an offset for the min and offset for the max time. 
   In this case that would be -1 and -1 for both.
   
   I guess the  most difficult change in my request is to read from the collection configmap. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sdap.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org