You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "cyrilou242 (via GitHub)" <gi...@apache.org> on 2023/12/04 17:56:11 UTC

[I] DATETIMECONVERT output type is incorrect when grouping by the column on which the function is applied [pinot]

cyrilou242 opened a new issue, #12091:
URL: https://github.com/apache/pinot/issues/12091

   ## Problem observed
   `DATETIMECONVERT(date_col, config...)` does not return the correct type when `date_col` is used in the `GROUP BY`clause.
   
   **The issue may not be specific to the DATETIMECONVERT transform, but I have not tested with other transforms.**
   
   ## Example
   Using DATETIMECONVERT to transform a string column in an EPOCH format, the expected output type is LONG
   ```
   SELECT
   DATETIMECONVERT("date", '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:MILLISECONDS:EPOCH', '1:DAYS') AS ts,
   FROM pageviews
   LIMIT 10
   ```
   returns
   ```
   "dataSchema": {
         "columnNames": [
           "ts"
         ],
         "columnDataTypes": [
           "LONG"
         ]
       },
       "rows": [
         [
           1580601600000
         ],
         ...
   ```
   The `ts` column datatype is `LONG` ✅ 
   
   When we use the ts column to group by, the column datatype is still a long
   
   ```
   SELECT
   DATETIMECONVERT("date", '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:MILLISECONDS:EPOCH', '1:DAYS') AS ts,
   count(*) AS met
   FROM pageviews
   GROUP BY ts
   LIMIT 10
   ```
   
   Returns 
   ```
   "dataSchema": {
         "columnNames": [
           "ts",
           "met"
         ],
         "columnDataTypes": [
           "LONG",
           "LONG"
         ]
       },
       "rows": [
         [
           1581638400000,
           1800
         ],
         ...
   ```
   
   The `ts` column datatype is `LONG` ✅ 
   
   But when grouping by the transformed column directly, the type is different: 
   ```
   SELECT
   DATETIMECONVERT("date", '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:MILLISECONDS:EPOCH', '1:DAYS') AS ts,
   count(*) AS met
   FROM pageviews
   GROUP BY "date"  -- change is here
   LIMIT 10
   ```
   (notice: `GROUP by "date"`)
   returns 
   ```
   "dataSchema": {
         "columnNames": [
           "ts",
           "met"
         ],
         "columnDataTypes": [
           "STRING",
           "LONG"
         ]
       },
       "rows": [
         [
           "1581206400000",
           1800
         ],
         ...
   ```
   --> `ts` column is of type `STRING` ⚠️ 
   
   **Bug**: It looks like the returned type is the type of of the input time column instead of the type of the output.
   
   The data to reproduce is available here: 
   https://github.com/startreedata/thirdeye/tree/master/examples/pageviews 
   
   If this helps: in most recent versions - or maybe it depends on the type of the original column -  the incorrect type column can also be of type `OBJECT`.
   I can provide an internal online env to reproduce if this helps.
   
   ## Versions: 
   The issue can be reproduced in `0.12.1`, `1.1.0-ST.19.3` and `1.1.0-ST.19.5`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] DATETIMECONVERT output type is incorrect when grouping by the column on which the function is applied [pinot]

Posted by "cyrilou242 (via GitHub)" <gi...@apache.org>.
cyrilou242 commented on issue #12091:
URL: https://github.com/apache/pinot/issues/12091#issuecomment-1839272926

   from @Jackie-Jiang : 
   > the real problem is that our scalar function (not implementing TransformFunction) is not able to return different type based on the argument, and the post-aggregation (e.g. group by date and return the result of a function on date) is using the scalar function instead of TransformFunction. This is a known issue, and we are exploring a fix
   
   closing as duplicate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] DATETIMECONVERT output type is incorrect when grouping by the column on which the function is applied [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on issue #12091:
URL: https://github.com/apache/pinot/issues/12091#issuecomment-1839375710

   related to https://github.com/apache/pinot/issues/11734


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] DATETIMECONVERT output type is incorrect when grouping by the column on which the function is applied [pinot]

Posted by "cyrilou242 (via GitHub)" <gi...@apache.org>.
cyrilou242 closed issue #12091: DATETIMECONVERT output type is incorrect when grouping by the column on which the function is applied
URL: https://github.com/apache/pinot/issues/12091


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org