You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@nifi.apache.org by Faisal Durrani <te...@gmail.com> on 2018/12/03 02:58:44 UTC

Expression language - Convert ISO 8601 to unixtimestamp

Hi Guys,

I am having trouble converting the ISO 8601 to a unixtimestamp . Here is
what i have tried

current_ts: 2018-11-11T00:17:27.937000

Using updateAttribute, I have configured the below property

${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS"):toNumber()}
This gives the Output value : 1541863984000
if converted back from epochcoverter, its giving

Assuming that this timestamp is in milliseconds:
GMT: Saturday, 10 November 2018 15:33:04
Your time zone: Sunday, 11 November 2018 00:33:04 GMT+09:00
<https://www.epochconverter.com/timezones?q=1541863984>
Relative: 22 days ago

I have also tried using ${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS",
"UTC"):toNumber()} but the result is the same and its completely wrong.
Kindly help finding out correct expression.

Regards,
Faisal

Re: Expression language - Convert ISO 8601 to unixtimestamp

Posted by Kevin Doran <kd...@gmail.com>.
Hi Faisal,

Out of curiosity, does the upstream system(s) guarantee that these timestamps will be unique, even at the microsecond level? (I’ve seen some systems that write a “microsecond precision” which is really just a millisecond value + a sub-millisecond value that increments every time it is queried within a certain time window and thus guaranteeing uniqueness within a set of values taken from the same clock).

Disclaimer on all of the below: I’m not an HBase expert, so I’m just applying my knowledge of using other databases to try to help you troubleshoot this situation.

Do you have control over the HBase Schema, and if so, is there another field/value you could use for transaction id other than timestamp?  According to the little I have read, one should avoid using timestamps for row keys anyway [1]. That aside, it seems like you have a uniqueness challenge truncating your microseconds, so you’ll have to consider one of the following:


  1.  Change the HBase schema to use some other key. If you can,  just derive a primary key to use as a transaction id for the Hbase record. You could just use the string representation of the microsecond timestamp (assuming you have guarantees that is going to be unique in your input data) or run it through some hash algorithm and store the digest, which might also improve the HBase region distribution when bulk writing. You can still generate a millisecond timestamp to write to Hbase, but you can drop the unique requirement for the data as you will have some other key. If the upstream system is not guaranteeing that the timestamps will be unique, it sounds to me like you should probably not be relying on the timestamp as a transaction id at all.
  2.  If you don’t control the target Hbase system and you’re stuck using millisecond timestamps as transactions ids, then you could consider modifying the data you are inserting so that the timestamps are always unique. This may/may not be an option depending on how many transactions per second you are dealing with in the input data. If the rate of transactions is low (e.g., much less than 1000 per second), then if you get a millisecond collision when truncating microseconds you can deal with that by modifying the duplicate timestamp (i.e., the second one to get truncated to the same value) to the closest unused millisecond value. This adds overhead (you have to keep track of milliseconds used within a time window / batch, or have a failure path for data flow that tries to insert into Hbase until successful, modifyingthe key each time), and again, doesn’t work if you have more than 1000 transactions per second as then you will inevitably start skewing the timestamps by quite a bit.

If it were me, I would try hard to use something other than timestamp as your row key / transaction id to avoid this altogether… even if the timestamps is part of whatever you end up using as that id.

Hope this helps and good luck,
Kevin

[1] http://hbase.apache.org/book.html#rowkey.design

From: Faisal Durrani <te...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Monday, December 3, 2018 at 22:03
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: Expression language - Convert ISO 8601 to unixtimestamp

Hi Kevin,

Thank you for your mail and suggestions. I was also able to achieve the same by using the below expression

${current_ts:substring(0,19):toDate("yyyy-MM-dd'T'HH:mm:ss","UTC"):toNumber():plus(${current_ts:substring(20,23)})}

So I am taking out the millisecond part of the microsecond time and adding into the unix time generated from yyyy-MM-dd'T'HH:mm:ss. While this worked for the one of the case i was working with, sadly it failed for another one. As it turns out I need to have a microsecond precision so e.g. I have one transaction that is coming at current_ts: 2018-11-11T00:17:27.937000 and then I have another one coming at current_ts: 2018-11-11T00:17:27.937001 . The Unix timestamp converted from the above expression is in millisecond (13 digit epoch number) so as i understand it cannot be more precise then this. If it take out the microsecond precision the target system which is Hbase in my case considers it as the same transaction and over writes the record received at the earlier time. Hbase timestamp too can only accept unix time in millisecond (13 digit epoch number) precision. So i am lost at this point how to order these transactions :(. Please do me know if you have any suggestion on this.

Regards,
Faisal

On Tue, Dec 4, 2018 at 1:49 AM Kevin Doran <kd...@gmail.com>> wrote:
Hi Faisal,

It appears whatever is writing these date strings that you have for inputs is writing microseconds, not milliseconds,

So when you are using `.SSSSSS`, that is, in this case `937000` microseconds being interpreted by the expression language java parser as 937000 milliseconds (15 minutes 37 seconds), hence the “slightly off” factor you are experiencing for the minutes/seconds of the result.

The difference in hour/day can be explained by timezone. Your input timestring must be localtime, but you are treating it as UTC. Specify the correct timezone when calling toDate() and that will shift the hours.

So, my suggestion is to run these date strings through and expression that first “converts” microseconds to milliseconds via truncation (using substring to drop off the trailing “000” is equivalent to dividing by 1000 and dropping the remainder/fraction), and then using your expression with the timezone specified. Given that your example was off by 9 hours (in addition to the ~15 minutes), I used UTC+9 time zone and got this expression, which worked for me:

${current_ts:substring(0,${current_ts:length():minus(3)}):toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS", "AWDT"):toNumber()}

Lastly, depending on how you are using the result of toNumber(), keep in mind that some systems expect seconds since epoch (not milliseconds, which toNumber() outputs) for a Unix timestamp.

Cheers,
Kevin

From: Faisal Durrani <te...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Sunday, December 2, 2018 at 21:59
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Expression language - Convert ISO 8601 to unixtimestamp

Hi Guys,

I am having trouble converting the ISO 8601 to a unixtimestamp . Here is what i have tried

current_ts: 2018-11-11T00:17:27.937000

Using updateAttribute, I have configured the below property

${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS"):toNumber()}
This gives the Output value : 1541863984000
if converted back from epochcoverter, its giving

Assuming that this timestamp is in milliseconds:

GMT: Saturday, 10 November 2018 15:33:04

Your time zone: Sunday, 11 November 2018 00:33:04 GMT+09:00<https://www.epochconverter.com/timezones?q=1541863984>

Relative: 22 days ago



I have also tried using ${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS", "UTC"):toNumber()} but the result is the same and its completely wrong. Kindly help finding out correct expression.

Regards,
Faisal


Re: Expression language - Convert ISO 8601 to unixtimestamp

Posted by Faisal Durrani <te...@gmail.com>.
Hi Kevin,

Thank you for your mail and suggestions. I was also able to achieve the
same by using the below expression

${current_ts:substring(0,19):toDate("yyyy-MM-dd'T'HH:mm:ss","UTC"):toNumber():plus(${current_ts:substring(20,23)})}

So I am taking out the millisecond part of the microsecond time and adding
into the unix time generated from yyyy-MM-dd'T'HH:mm:ss. While this worked
for the one of the case i was working with, sadly it failed for another
one. As it turns out I need to have a microsecond precision so e.g. I have
one transaction that is coming at current_ts: 2018-11-11T00:17:27.937000
and then I have another one coming at
current_ts: 2018-11-11T00:17:27.937001 . The Unix timestamp converted from
the above expression is in millisecond (13 digit epoch number) so as i
understand it cannot be more precise then this. If it take out the
microsecond precision the target system which is Hbase in my case considers
it as the same transaction and over writes the record received at the
earlier time. Hbase timestamp too can only accept unix time in millisecond
(13 digit epoch number) precision. So i am lost at this point how to order
these transactions :(. Please do me know if you have any suggestion on this.

Regards,
Faisal

On Tue, Dec 4, 2018 at 1:49 AM Kevin Doran <kd...@gmail.com> wrote:

> Hi Faisal,
>
>
>
> It appears whatever is writing these date strings that you have for inputs
> is writing microseconds, not milliseconds,
>
>
>
> So when you are using `.SSSSSS`, that is, in this case `937000`
> microseconds being interpreted by the expression language java parser as
> 937000 milliseconds (15 minutes 37 seconds), hence the “slightly off”
> factor you are experiencing for the minutes/seconds of the result.
>
>
>
> The difference in hour/day can be explained by timezone. Your input
> timestring must be localtime, but you are treating it as UTC. Specify the
> correct timezone when calling toDate() and that will shift the hours.
>
>
>
> So, my suggestion is to run these date strings through and expression that
> first “converts” microseconds to milliseconds via truncation (using
> substring to drop off the trailing “000” is equivalent to dividing by 1000
> and dropping the remainder/fraction), and then using your expression with
> the timezone specified. Given that your example was off by 9 hours (in
> addition to the ~15 minutes), I used UTC+9 time zone and got this
> expression, which worked for me:
>
>
>
> ${current_ts:substring(0,${current_ts:length():minus(3)}):toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS",
> "AWDT"):toNumber()}
>
>
>
> Lastly, depending on how you are using the result of toNumber(), keep in
> mind that some systems expect *seconds since epoch* (not *milliseconds*,
> which toNumber() outputs) for a Unix timestamp.
>
>
>
> Cheers,
> Kevin
>
>
>
> *From: *Faisal Durrani <te...@gmail.com>
> *Reply-To: *"users@nifi.apache.org" <us...@nifi.apache.org>
> *Date: *Sunday, December 2, 2018 at 21:59
> *To: *"users@nifi.apache.org" <us...@nifi.apache.org>
> *Subject: *Expression language - Convert ISO 8601 to unixtimestamp
>
>
>
> Hi Guys,
>
>
>
> I am having trouble converting the ISO 8601 to a unixtimestamp . Here is
> what i have tried
>
>
>
> current_ts: 2018-11-11T00:17:27.937000
>
>
>
> Using updateAttribute, I have configured the below property
>
> ${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS"):toNumber()}
>
> This gives the Output value : 1541863984000
>
> if converted back from epochcoverter, its giving
>
>
>
> *Assuming that this timestamp is in milliseconds:*
>
> *GMT*: Saturday, 10 November 2018 15:33:04
>
> Your time zone: Sunday, 11 November 2018 00:33:04 GMT+09:00
> <https://www.epochconverter.com/timezones?q=1541863984>
>
> *Relative*: 22 days ago
>
>
>
> I have also tried using
> ${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS", "UTC"):toNumber()} but
> the result is the same and its completely wrong. Kindly help finding out
> correct expression.
>
>
>
> Regards,
>
> Faisal
>
>
>

Re: Expression language - Convert ISO 8601 to unixtimestamp

Posted by Kevin Doran <kd...@gmail.com>.
Hi Faisal,

It appears whatever is writing these date strings that you have for inputs is writing microseconds, not milliseconds,

So when you are using `.SSSSSS`, that is, in this case `937000` microseconds being interpreted by the expression language java parser as 937000 milliseconds (15 minutes 37 seconds), hence the “slightly off” factor you are experiencing for the minutes/seconds of the result.

The difference in hour/day can be explained by timezone. Your input timestring must be localtime, but you are treating it as UTC. Specify the correct timezone when calling toDate() and that will shift the hours.

So, my suggestion is to run these date strings through and expression that first “converts” microseconds to milliseconds via truncation (using substring to drop off the trailing “000” is equivalent to dividing by 1000 and dropping the remainder/fraction), and then using your expression with the timezone specified. Given that your example was off by 9 hours (in addition to the ~15 minutes), I used UTC+9 time zone and got this expression, which worked for me:

${current_ts:substring(0,${current_ts:length():minus(3)}):toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS", "AWDT"):toNumber()}

Lastly, depending on how you are using the result of toNumber(), keep in mind that some systems expect seconds since epoch (not milliseconds, which toNumber() outputs) for a Unix timestamp.

Cheers,
Kevin

From: Faisal Durrani <te...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Sunday, December 2, 2018 at 21:59
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Expression language - Convert ISO 8601 to unixtimestamp

Hi Guys,

I am having trouble converting the ISO 8601 to a unixtimestamp . Here is what i have tried

current_ts: 2018-11-11T00:17:27.937000

Using updateAttribute, I have configured the below property

${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS"):toNumber()}
This gives the Output value : 1541863984000
if converted back from epochcoverter, its giving

Assuming that this timestamp is in milliseconds:

GMT: Saturday, 10 November 2018 15:33:04

Your time zone: Sunday, 11 November 2018 00:33:04 GMT+09:00<https://www.epochconverter.com/timezones?q=1541863984>

Relative: 22 days ago



I have also tried using ${current_ts:toDate("yyyy-MM-dd'T'HH:mm:ss.SSSSSS", "UTC"):toNumber()} but the result is the same and its completely wrong. Kindly help finding out correct expression.

Regards,
Faisal