You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/10/16 14:57:36 UTC

[GitHub] [druid] hnakamor opened a new issue, #13230: SQL-based ingestion cannot generate daily segments taking time zone into account.

hnakamor opened a new issue, #13230:
URL: https://github.com/apache/druid/issues/13230

   Hi all,
   
   The SQL-based ingest by MSQ is a great feature. I have tried it and in my environment it has roughly doubled the data ingestion speed.
   
   However, it seems that all segment granularity and specific intervals, which are possible with Native Batch Ingestion, are not possible with MSQ.
   Thus, it is not possible to generate daily segments that take time zones into account.
   
   The following is an example of specifying with Navite Batch Ingestion.
   ```
   {
     "spec": {
       "dataSchema": {
         "granularitySpec": {
           "segmentGranularity": "all",
           "intervals": [
             "2000-01-01T15:00/2000-01-02T15:00"
           ]
         }
       }
     }
   }
   ```
   
   Confirmed on 25.0.0-SNAPSHOT and 24.0.0。
   
   It would be useful if MSQ could generate daily segments that take time zones into account.
   Considering the overwrite, in my timezone, the only partition option in MSQ is HOUR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] LakshSingla commented on issue #13230: SQL-based ingestion cannot generate daily segments taking time zone into account.

Posted by GitBox <gi...@apache.org>.
LakshSingla commented on issue #13230:
URL: https://github.com/apache/druid/issues/13230#issuecomment-1280438861

   It is great to hear that you are having positive results with the new MSQ based ingestion 😃 
   
   Based on the `granularitySpec` that you mentioned in the issue, does filtering the data and then ingesting it like the following query work? 
   
   ```sql
   INSERT INTO
   ...
   WHERE WHERE __time >= TIMESTAMP '2000-01-01 15:00:00' AND __time < TIMESTAMP '2000-01-02 15:00:00'
   PARTITIONED BY ALL
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [I] SQL-based ingestion cannot generate daily segments taking time zone into account. (druid)

Posted by "darkbaby (via GitHub)" <gi...@apache.org>.
darkbaby commented on issue #13230:
URL: https://github.com/apache/druid/issues/13230#issuecomment-1752477267

   Any updates on this issue?
   
   I face this problem too and want this use case to be real.
   
   Many thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] hnakamor commented on issue #13230: SQL-based ingestion cannot generate daily segments taking time zone into account.

Posted by GitBox <gi...@apache.org>.
hnakamor commented on issue #13230:
URL: https://github.com/apache/druid/issues/13230#issuecomment-1280926592

   @LakshSingla Thank you for your response.
   
   I have tried the matter you commented on.
   ```
   INSERT INTO inline_data
   WITH ext AS (SELECT *
   FROM TABLE(
     EXTERN(
       '{"type":"inline","data":"{\"time\":\"2000-01-01T15:00:00\"}"}',
       '{"type":"json"}',
       '[{"name":"time","type":"string"}]'
     )
   ))
   SELECT TIME_PARSE("time") AS __time
   FROM ext
   WHERE "time" >= TIMESTAMP '2000-01-01 15:00:00' AND "time" < TIMESTAMP '2000-01-02 15:00:00'
   PARTITIONED BY ALL
   ```
   In this case, the segment interval generated is `-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z`.
   ex) `inline_data_-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z_2022-10-17T12:53:06.938Z`
   `REPLACE INTO inline_data OVERWRITE ALL` gave the same result.
   
   `REPLACE INTO inline_data OVERWRITE WHERE __time >= TIMESTAMP '2000-01-01 15:00:00' AND __time < TIMESTAMP '2000-01-02 15:00:00'` is the following error occurs.
   ```
   Error: Plan validation failed
   
   OVERWRITE WHERE clause contains an interval [2000-01-01T15:00:00.000Z/2000-01-02T15:00:00.000Z] which is not aligned with PARTITIONED BY granularity AllGranularity
   
   org.apache.calcite.tools.ValidationException
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] hnakamor commented on issue #13230: SQL-based ingestion cannot generate daily segments taking time zone into account.

Posted by GitBox <gi...@apache.org>.
hnakamor commented on issue #13230:
URL: https://github.com/apache/druid/issues/13230#issuecomment-1280990301

   @gianm Thank you for your response.
   Your idea is great.
   And it would be even more useful if we could apply the "sqlTimeZone" context.
   Here is the first query and context I tried.
   ````
   INSERT INTO inline_data
   WITH ext AS (SELECT *
   FROM TABLE(
     EXTERN(
       '{"type": "inline", "data":"{\"time\":\"2000-01-01T15:00:00\"}"}',
       '{"type": "json"}',
       '[{"name": "time", "type": "string"}]'
     )
   ))
   SELECT TIME_PARSE("time") AS __time
   FROM ext
   PARTITIONED BY DAY
   ````
   ```
   {
     "sqlTimeZone":"+09:00"
   }
   ```
   The above result was `2000-01-02T00:00:00.000Z/2000-01-03T00:00:00.000Z`, but I expected `2000-01-01T15:00:00.000Z/2000-01-02T15:00:00.000Z` or `2000-01-02T00:00:00.000+09:00/2000-01-03T00:00:00.000+09:00`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org