Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2021/09/22 19:39:07 UTC

[GitHub] [superset] betodealmeida opened a new pull request #16795: feat: handle temporal columns in group bys

betodealmeida opened a new pull request #16795:
URL: https://github.com/apache/superset/pull/16795


   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   Before this change, the second generated query was invalid:
   
   ```sql
   SELECT FLOOR("__time" TO DAY) AS "__time",
          COUNT(*) AS "count"
   FROM "druid"."wikipedia"
   GROUP BY FLOOR("__time" TO DAY)
   ORDER BY "count" DESC
   LIMIT 100;
   
   SELECT FLOOR("__time" TO DAY) AS "__timestamp",
          FLOOR("__time" TO DAY) AS "__time",
          COUNT(*) AS "count"
   FROM "druid"."wikipedia"
   WHERE FLOOR("__time" TO DAY) = '2016-06-27T00:00:00.000Z'  -- THIS FAILS
   GROUP BY FLOOR("__time" TO DAY),
            FLOOR("__time" TO DAY)
   ORDER BY "count" DESC
   LIMIT 10000;
   ```
   
   Query after:
   
   ```sql
   SELECT FLOOR("__time" TO DAY) AS "__time",
          COUNT(*) AS "count"
   FROM "druid"."wikipedia"
   GROUP BY FLOOR("__time" TO DAY)
   ORDER BY "count" DESC
   LIMIT 100;
   
   SELECT FLOOR("__time" TO DAY) AS "__timestamp",
          FLOOR("__time" TO DAY) AS "__time",
          COUNT(*) AS "count"
   FROM "druid"."wikipedia"
   WHERE FLOOR("__time" TO DAY) = TIME_PARSE('2016-06-27T00:00:00+00:00') -- THIS IS CORRECT
   GROUP BY FLOOR("__time" TO DAY),
            FLOOR("__time" TO DAY)
   ORDER BY "count" DESC
   LIMIT 10000;
   ```
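
The corrected literal in the second query can be reproduced with a small standalone sketch. It copies the `TIME_PARSE` f-string from the Druid `convert_dttm` shown in this PR's diff; the free function `druid_time_literal` is a hypothetical stand-in for illustration, not Superset's API.

```python
from datetime import datetime, timezone


def druid_time_literal(dttm: datetime) -> str:
    """Render a datetime as a Druid SQL TIME_PARSE(...) expression.

    Hypothetical helper: it reuses the f-string from the Druid engine
    spec's convert_dttm, outside of any Superset class.
    """
    return f"""TIME_PARSE('{dttm.isoformat(timespec="seconds")}')"""


# The day bucket that the failing query filtered on, as an aware datetime.
bucket = datetime(2016, 6, 27, tzinfo=timezone.utc)
literal = druid_time_literal(bucket)

# Build the WHERE clause that replaces the bare string comparison.
where_clause = f'WHERE FLOOR("__time" TO DAY) = {literal}'
print(where_clause)
```

Comparing against a `TIME_PARSE(...)` expression instead of a bare string gives Druid a timestamp on both sides of the `=`.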
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley edited a comment on pull request #16795: feat: handle temporal columns in group bys

Posted by GitBox <gi...@apache.org>.
john-bodley edited a comment on pull request #16795:
URL: https://github.com/apache/superset/pull/16795#issuecomment-955122235


   @betodealmeida and @villebro I don't believe this logic works for all cases. A temporal field can be encoded as a string (per its Python date format), and such a field shouldn't be converted to a `TIMESTAMP`.
   
   Isn't the correct logic to use the type of the column rather than assuming it's a `TIMESTAMP`? Furthermore, `self.db_engine_spec.convert_dttm("TIMESTAMP", dttm)` could return `None` (even for the `TIMESTAMP` target type). I was thinking the logic should be of the form:
   
   ```python
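   # Only convert string values of temporal columns, using the column's own type.
   if column_map[dimension].is_temporal and isinstance(value, str):
       # convert_dttm may return None; in that case value is left unchanged.
       if result := self.db_engine_spec.convert_dttm(
           column_map[dimension].type,
           dateutil.parser.parse(value),
       ):
           value = text(result)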
   ```
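
The guard proposed above can be exercised in isolation with stand-ins. In this sketch, `FakeColumn`, `FakeDruidSpec`, and `coerce_filter_value` are hypothetical names. To keep it standard-library only, `datetime.fromisoformat` is swapped in for `dateutil.parser.parse`, and the SQL fragment is returned as a plain string rather than wrapped in SQLAlchemy's `text`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class FakeColumn:
    """Hypothetical stand-in for a dataset column."""

    type: str
    is_temporal: bool


class FakeDruidSpec:
    """Hypothetical engine spec that only converts DATETIME and TIMESTAMP."""

    @classmethod
    def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
        if target_type.upper() in ("DATETIME", "TIMESTAMP"):
            iso = dttm.isoformat(timespec="seconds")
            return f"TIME_PARSE('{iso}')"
        return None  # unsupported target type: the caller keeps the raw value


def coerce_filter_value(
    column_map: Dict[str, FakeColumn], dimension: str, value: Any
) -> Any:
    """Convert a string filter value using the column's own type."""
    column = column_map[dimension]
    if column.is_temporal and isinstance(value, str):
        # Assumption: values are ISO 8601 strings, so fromisoformat suffices.
        result = FakeDruidSpec.convert_dttm(
            column.type, datetime.fromisoformat(value)
        )
        if result:
            return result
    return value


columns = {
    "__time": FakeColumn(type="TIMESTAMP", is_temporal=True),
    "ds": FakeColumn(type="VARCHAR", is_temporal=True),  # date kept as a string
}
converted = coerce_filter_value(columns, "__time", "2016-06-27T00:00:00+00:00")
unchanged = coerce_filter_value(columns, "ds", "2016-06-27")
print(converted, unchanged)
```

The `ds` column shows the case raised in the comment: it is temporal but stored as a string, so `convert_dttm` returns `None` and the original value passes through untouched.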
   




[GitHub] [superset] betodealmeida merged pull request #16795: feat: handle temporal columns in group bys

Posted by GitBox <gi...@apache.org>.
betodealmeida merged pull request #16795:
URL: https://github.com/apache/superset/pull/16795


   




[GitHub] [superset] villebro commented on a change in pull request #16795: feat: handle temporal columns in group bys

Posted by GitBox <gi...@apache.org>.
villebro commented on a change in pull request #16795:
URL: https://github.com/apache/superset/pull/16795#discussion_r714454822



##########
File path: superset/db_engine_specs/druid.py
##########
@@ -100,3 +100,17 @@ def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
         if tt in (utils.TemporalType.DATETIME, utils.TemporalType.TIMESTAMP):
             return f"""TIME_PARSE('{dttm.isoformat(timespec="seconds")}')"""
         return None
+
+    @classmethod
+    def epoch_to_dttm(cls) -> str:
+        """
+        Convert from number of seconds since the epoch to a timestamp.
+        """
+        return "MILLIS_TO_TIMESTAMP({col} * 1000)"
+
+    @classmethod
+    def epoch_ms_to_dttm(cls) -> str:
+        """
+        Convert from number of milliseconds since the epoch to a timestamp.
+        """
+        return "MILLIS_TO_TIMESTAMP({col})"

Review comment:
       Oh wow, I didn't know these weren't defined yet for Druid!
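
For context, both methods return templates with a `{col}` placeholder rather than finished SQL. A minimal sketch of how such a template could be filled in, assuming plain `str.format` and a hypothetical epoch column named `created_at` (the actual call site in Superset is not shown in this thread):

```python
# Templates copied from the diff above.
EPOCH_S_TEMPLATE = "MILLIS_TO_TIMESTAMP({col} * 1000)"
EPOCH_MS_TEMPLATE = "MILLIS_TO_TIMESTAMP({col})"


def render(template: str, column: str) -> str:
    """Substitute a double-quoted column identifier into a {col} template."""
    return template.format(col=f'"{column}"')


# "created_at" is a hypothetical column holding epoch values.
seconds_expr = render(EPOCH_S_TEMPLATE, "created_at")
millis_expr = render(EPOCH_MS_TEMPLATE, "created_at")
print(seconds_expr)
print(millis_expr)
```

The seconds variant multiplies by 1000 because `MILLIS_TO_TIMESTAMP` expects milliseconds since the epoch.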






[GitHub] [superset] john-bodley commented on pull request #16795: feat: handle temporal columns in group bys

Posted by GitBox <gi...@apache.org>.
john-bodley commented on pull request #16795:
URL: https://github.com/apache/superset/pull/16795#issuecomment-956479778


   @betodealmeida I have proposed a fix in https://github.com/apache/superset/pull/17312.




[GitHub] [superset] betodealmeida commented on a change in pull request #16795: feat: handle temporal columns in group bys

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on a change in pull request #16795:
URL: https://github.com/apache/superset/pull/16795#discussion_r715138730



##########
File path: superset/db_engine_specs/druid.py
##########
@@ -100,3 +100,17 @@ def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
         if tt in (utils.TemporalType.DATETIME, utils.TemporalType.TIMESTAMP):
             return f"""TIME_PARSE('{dttm.isoformat(timespec="seconds")}')"""
         return None
+
+    @classmethod
+    def epoch_to_dttm(cls) -> str:
+        """
+        Convert from number of seconds since the epoch to a timestamp.
+        """
+        return "MILLIS_TO_TIMESTAMP({col} * 1000)"
+
+    @classmethod
+    def epoch_ms_to_dttm(cls) -> str:
+        """
+        Convert from number of milliseconds since the epoch to a timestamp.
+        """
+        return "MILLIS_TO_TIMESTAMP({col})"

Review comment:
       Yeah, I implemented `epoch_to_dttm` while working on a solution and decided to leave it in even though I wasn't using it; I added `epoch_ms_to_dttm` for completeness.



