Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/26 23:59:39 UTC

[GitHub] [pinot] walterddr edited a comment on issue #8045: Add "DateTime" DataType

walterddr edited a comment on issue #8045:
URL: https://github.com/apache/pinot/issues/8045#issuecomment-1022709088


   Context
   ===
   
   For more context, let's say a user wants to configure this during ingestion:
   ```
       "dateTimeFieldSpecs": [{
         "name": "Date",
         "dataType": "STRING",
         "format" : "1:SECONDS:SIMPLE_DATE_FORMAT:MM/dd/yyyy HH:mm:ss a",
         "granularity": "1:HOURS"
       }]
   ```
   
   They have to set `"dataType"` to `STRING` because they want the result of
   ```
   Select Date From myTable 
   ```
   to be a string that conforms to the specified SDF (SimpleDateFormat) pattern, for example because the string is fed directly into some downstream program.
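To make the config above easier to read: the `format` value packs four colon-separated parts (size, time unit, format type, and the SDF pattern). The sketch below only illustrates that layout in Python; `DateTimeFormat` and `parse_format` are hypothetical names for this example, not Pinot classes.

```python
from typing import NamedTuple


class DateTimeFormat(NamedTuple):
    """Hypothetical holder for the four parts of the `format` string."""
    size: int
    unit: str
    kind: str
    pattern: str


def parse_format(spec: str) -> DateTimeFormat:
    # Split on the first three colons only: the SDF pattern itself
    # contains colons (HH:mm:ss) and must stay in one piece.
    size, unit, kind, pattern = spec.split(":", 3)
    return DateTimeFormat(int(size), unit, kind, pattern)


fmt = parse_format("1:SECONDS:SIMPLE_DATE_FORMAT:MM/dd/yyyy HH:mm:ss a")
print(fmt)
```

The pattern is the only part the user cares about at query time; the rest describes how the value is stored and bucketed.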
   
   Challenge
   ===
   However,
   1. In a SQL database, setting a column to STRING type means we need to support `>=` and `<=` on the raw data format.
   2. In Pinot, we can't support this SDF as a time column format because its lexical order and its time order are not consistent (e.g. `02/01/2021` comes after `01/29/2022` in string ordering but before it in timestamp ordering). If we use this field as the time field for partitioning real-time and offline tables, we will get wrong results because the underlying ordering is STRING-based.
   3. One can also configure the `dataType` as `TIMESTAMP` and implicitly convert it to a String in the query, but the result has to be in the ISO SQL standard `yyyy-MM-ddTHH:mm:ss` format, which might not be what the user wanted.
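The ordering mismatch in point 2 can be reproduced in a few lines. This is a minimal Python illustration, not Pinot code; the `%m/%d/%Y` pattern is assumed here as the Python counterpart of the SDF `MM/dd/yyyy`.

```python
from datetime import datetime

# Assumed Python equivalent of the SDF pattern MM/dd/yyyy.
PATTERN = "%m/%d/%Y"

dates = ["02/01/2021", "01/29/2022"]

# Plain string comparison: "01/..." sorts before "02/...".
by_string = sorted(dates)

# Comparison on the parsed timestamp: 2021 sorts before 2022.
by_time = sorted(dates, key=lambda s: datetime.strptime(s, PATTERN))

print(by_string)
print(by_time)
```

The two sorts disagree, which is why a range filter or time boundary evaluated on the raw strings would select the wrong rows.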
   
   
   Problem Statement
   ===
   We want to create some kind of ingestion-configurable DataType (let's name it `DateTime`) that (1) returns a String that conforms to the SDF configured at ingestion; and (2) is ordered by epoch value.
   
   So that
   ```
   SELECT myDateTimeType FROM myTable ORDER BY myDateTimeType
   ```
   returns 
   ```
   02/01/2021 00:00:00
   01/01/2022 00:00:00
   ```
   
   Proposal
   ===
   We can store the actual data as either STRING or LONG, but
   1. If we store it in the raw string format and force it to be ordered by the converted epoch value, we have to convert it every time we make a comparison, which is very costly.
   2. If we store it as LONG, which is natively sorted in epoch order, and only do the conversion at query time, we need to store the original SDF configured by the user during ingestion somewhere, so we need to find a way to let Pinot know it at query time.
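Option 2 could look roughly like the sketch below. It is a Python illustration under stated assumptions, not a proposed implementation: `SDF_PATTERN` stands in for the pattern that would have to be persisted with the column, values are assumed to be UTC, and `to_epoch_millis` and `to_display` are hypothetical helper names.

```python
from datetime import datetime, timezone

# Stands in for the ingestion-configured SDF (MM/dd/yyyy HH:mm:ss),
# which would need to be stored somewhere Pinot can read at query time.
SDF_PATTERN = "%m/%d/%Y %H:%M:%S"


def to_epoch_millis(raw: str) -> int:
    """Ingestion side: parse the raw string once and keep a LONG."""
    dt = datetime.strptime(raw, SDF_PATTERN).replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def to_display(millis: int) -> str:
    """Query side: render the stored LONG back in the user's format."""
    dt = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return dt.strftime(SDF_PATTERN)


raw_rows = ["01/01/2022 00:00:00", "02/01/2021 00:00:00"]

# Sorting happens on integers, so no per-comparison parsing is needed.
stored = sorted(to_epoch_millis(r) for r in raw_rows)
result = [to_display(m) for m in stored]

print(result)
```

Each value is parsed once at ingestion and formatted once per returned row, whereas option 1 would parse on every comparison.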


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


