Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/05 11:44:26 UTC

[GitHub] [druid] enenuki opened a new issue #9991: Loss of sub-second time info when parsing posix-formatted time field

enenuki opened a new issue #9991:
URL: https://github.com/apache/druid/issues/9991


   Hi, 
   we are using the Kafka supervisor to ingest data, and we seem to lose precision when parsing a posix-formatted time field.
   The input data is formatted like this:
   `{"host":"somehost.hr","ident":"dpdk","pid":"182","msgid":"-","extradata":"-","message":"Some msgs","severity":"err","severity_val":3,"facility":"daemon","facility_val":3,"priority_val":27,"tag":null,"time":1591349534.205346}`
   Note that the time is in posix format, but as a DOUBLE rather than an integer, so it carries sub-second information (here down to microseconds).
   `"timestampSpec":{ "column":"time", "format":"posix" }`
   
   However, after ingestion the timestamps are rounded to the second.
   
   Maybe we are doing something wrong? 
   Is there any way to preserve this sub-second information? We found that transformation expressions can refer to the "original" time field (the one used to populate the __time field), but this seems like a hacky solution.
   
   The Druid version is 0.18.1.
   
   Thanks!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #9991: Loss of sub-second time info when parsing posix-formatted time field

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9991:
URL: https://github.com/apache/druid/issues/9991#issuecomment-640708047


   Hi @enenuki, the `posix` timestamp format rounds timestamps to the second. Does the `millis` timestamp format work for you?



[GitHub] [druid] enenuki edited a comment on issue #9991: Loss of sub-second time info when parsing posix-formatted time field

Posted by GitBox <gi...@apache.org>.
enenuki edited a comment on issue #9991:
URL: https://github.com/apache/druid/issues/9991#issuecomment-641113015


   @jihoonson Unfortunately, we receive the input time as a posix double, not a long.
   
   And FrankChen021 is right, the posix standard does define the value as an integer...
   
   I am not sure which options we have without changing the input format... All transformations take place only after time parsing, so we cannot use them to fix the __time field? If that is right, we do not have many options...
   
   The only option, which we currently use, is a transformation expression that calculates an ID from the original field used to populate __time. This field is available in transformation expressions, which lets us generate a new field with milliseconds preserved...
   
   We discovered by accident that the original field used to populate __time can be used this way, so this feature should probably be documented somewhere... (if it already is, I did not manage to find it)
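
The workaround described above might look roughly like the fragment built below. This is a sketch only and has not been tested against Druid 0.18.1. The derived column name `time_millis` is hypothetical. The `cast(... , 'LONG')` expression is an assumption about how the sub-second value could be kept, not the expression actually used by the poster. Note that `__time` itself would still be at second precision; only the derived column keeps the milliseconds.

```python
import json

# Hypothetical name for the derived column; it does not appear in the thread.
DERIVED_COLUMN = "time_millis"


def build_spec_fragment(time_column: str = "time") -> dict:
    """Build a timestampSpec plus an expression transform that reads the
    raw time column and stores it as whole epoch milliseconds."""
    return {
        # Same timestampSpec as in the original report.
        "timestampSpec": {"column": time_column, "format": "posix"},
        "transformSpec": {
            "transforms": [
                {
                    "type": "expression",
                    "name": DERIVED_COLUMN,
                    # Assumed expression: seconds (double) -> milliseconds (long).
                    "expression": f"cast(\"{time_column}\" * 1000, 'LONG')",
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_spec_fragment(), indent=2))
```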



[GitHub] [druid] FrankChen021 commented on issue #9991: Loss of sub-second time info when parsing posix-formatted time field

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on issue #9991:
URL: https://github.com/apache/druid/issues/9991#issuecomment-639989256


   A posix timestamp is a value in SECONDS since the epoch. Druid reads the input as a LONG, so the fraction is ignored. To keep the sub-second part, you could convert the value into a MILLISECONDS timestamp, which is 1591349534205 in your example. 
   
   In practice, query granularity is widely used to take advantage of roll-up, which means there is no need to care about the exact timestamp of every record. Losing the sub-second part is equivalent to setting the query granularity to SECOND, so I think the loss is acceptable.
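
The conversion to a milliseconds timestamp suggested above could be done before the events reach Kafka. The following is a minimal Python sketch, assuming the producer side can rewrite each JSON event; the function names are hypothetical and nothing here is part of Druid. The time field is parsed as a `Decimal` so that the fractional part is not disturbed by binary floating-point rounding, and anything below a millisecond is truncated.

```python
import json
from decimal import Decimal


def posix_seconds_to_millis(seconds: Decimal) -> int:
    """Convert fractional POSIX seconds to whole epoch milliseconds.

    Digits below one millisecond are truncated, not rounded.
    """
    return int(seconds * 1000)


def rewrite_event(raw: str, field: str = "time") -> str:
    """Return the JSON event with `field` replaced by epoch milliseconds."""
    # parse_float=Decimal keeps the decimal digits exactly as written.
    event = json.loads(raw, parse_float=Decimal)
    event[field] = posix_seconds_to_millis(Decimal(event[field]))
    # default=float turns any other Decimal values back into JSON numbers.
    return json.dumps(event, default=float)


if __name__ == "__main__":
    sample = '{"host":"somehost.hr","severity_val":3,"time":1591349534.205346}'
    print(rewrite_event(sample))
```

With events rewritten this way, the timestampSpec would use the `millis` format mentioned in this thread instead of `posix`.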



[GitHub] [druid] enenuki commented on issue #9991: Loss of sub-second time info when parsing posix-formatted time field

Posted by GitBox <gi...@apache.org>.
enenuki commented on issue #9991:
URL: https://github.com/apache/druid/issues/9991#issuecomment-640446250


   Thanks for your quick response.
   I did not quite understand you... are you suggesting that we change our input format, or can we apply some transformation before time parsing to achieve this?
   We plan to use rollup, but with the minimal none/millisecond query granularity. Second query granularity is not an option.
   Still, accepting the input as a double and then ignoring the sub-second information seems somewhat unreasonable, especially when millisecond granularity is available. Maybe this affects ingestion time?
   



[GitHub] [druid] FrankChen021 commented on issue #9991: Loss of sub-second time info when parsing posix-formatted time field

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on issue #9991:
URL: https://github.com/apache/druid/issues/9991#issuecomment-640659131






