You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Istvan Toth (Jira)" <ji...@apache.org> on 2022/10/11 14:44:00 UTC

[jira] [Comment Edited] (PHOENIX-5066) The TimeZone is incorrectly used during writing or reading data

    [ https://issues.apache.org/jira/browse/PHOENIX-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615876#comment-17615876 ] 

Istvan Toth edited comment on PHOENIX-5066 at 10/11/22 2:43 PM:
----------------------------------------------------------------

I made quite a bit of progress, but I'm a bit stuck now.

On the server side a ThreadLocal seems to work well, as each scan has a dedicated thread.

However, on the client side there is no dedicated thread for the query, some expressions (at least standard upserts) are evaluated in the application thread, so a ThreadLocal set from the Connection cannot solve the problem there. (It would break as soon as operate on more than one Connection in one thread).

My current WIP patch at [https://github.com/apache/phoenix/pull/1504/commits/4a0994078933b7b6b80b7022aed48c522c747966] sets the ThreadLocal when the connection is created, but this is not a workable solution.

The main problem is that the Expression and PType system is designed to be a proper functional system, without access to globals like Connection, or Statement. I was able to push the necessary context to some expression nodes, but a lot of the conversions happen in the PDate.toObject() ... methods. PDate.INSTANCE is designed to be a singleton.

I am looking at the following solutions:
 - Setting and unsetting a ThreadLocal with the context at the JDBC entry/exit points as needed. (Perhaps using the CallRunner system)

 - Adding an extra StatementContext parameter to most methods in the Expression objects, so that we eventually have the context when PDate.toObject() is called. – huge volume of changes, and very ugly

 - Pushing the StatementContext to every Expression object created, and using that when calling the PDate.toObject()  methods

 - Breaking the rule that the PDataTypes are singletons, and injecting the context to PDataType objects as needed.

I am looking for input on what would be the best of the above, or even better, if you have a more elegant solution.

Fore reference, a client side stack trace of where we need to have ExpressionContext available:
{noformat}
java.lang.NullPointerException
    at org.apache.phoenix.util.ThreadExpressionCtx.parseTimestamp(ThreadExpressionCtx.java:46)
    at org.apache.phoenix.schema.types.PTimestamp.toObject(PTimestamp.java:135)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:187)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:172)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:159)
    at org.apache.phoenix.compile.UpsertCompiler$UpdateColumnCompiler.visit(UpsertCompiler.java:970)
    at org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:1)
    at org.apache.phoenix.parse.LiteralParseNode.accept(LiteralParseNode.java:80)
    at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:840)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:1008)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:1)
    at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:539)
    at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:1)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:519)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:507)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:2151)
    at org.apache.phoenix.end2end.CompliantDateTimeIT.testUpsertTZ(CompliantDateTimeIT.java:233)
    <JUnit setup deleted for brevity>{noformat}


was (Author: stoty):
I made quite a bit of progress, but I'm a bit stuck now.

On the server side a ThreadLocal seems to work well, as each scan has a dedicated thread.

However, on the client side there is no dedicated thread for the query, some expressions (at least standard upserts) are evaluated in the application thread, so a ThreadLocal set from the Connection cannot solve the problem there.

My current WIP patch at [https://github.com/apache/phoenix/pull/1504/commits/4a0994078933b7b6b80b7022aed48c522c747966] sets the ThreadLocal when the connection is created, but this is not a workable solution.

The main problem is that the Expression and PType system is designed to be a proper functional system, without access to globals like Connection, or Statement. I was able to push the necessary context to some expression nodes, but a lot of the conversions happen in the PDate.toObject() ... methods. PDate.INSTANCE is designed to be a singleton.

I am looking at the following solutions:

- Setting and unsetting a ThreadLocal with the context at the JDBC entry/exit points as needed.

- Adding an extra StatementContext parameter to most methods in the Expression objects, so that we eventually have the context when PDate.toObject() is called. – huge volume of changes, and very ugly

- Pushing the StatementContext to every Expression object created, and using that when calling the PDate.toObject()  methods

- Breaking the rule that the PDataTypes are singletons, and injecting the context to PDataType objects as needed.

I am looking for input on what would be the best of the above, or even better, if you have a more elegant solution.


Fore reference, a client side stack trace of where we need to have ExpressionContext available:


{noformat}
java.lang.NullPointerException
    at org.apache.phoenix.util.ThreadExpressionCtx.parseTimestamp(ThreadExpressionCtx.java:46)
    at org.apache.phoenix.schema.types.PTimestamp.toObject(PTimestamp.java:135)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:187)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:172)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:159)
    at org.apache.phoenix.compile.UpsertCompiler$UpdateColumnCompiler.visit(UpsertCompiler.java:970)
    at org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:1)
    at org.apache.phoenix.parse.LiteralParseNode.accept(LiteralParseNode.java:80)
    at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:840)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:1008)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:1)
    at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:539)
    at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:1)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:519)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:507)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:2151)
    at org.apache.phoenix.end2end.CompliantDateTimeIT.testUpsertTZ(CompliantDateTimeIT.java:233)
    <JUnit setup deleted for brevity>{noformat}

> The TimeZone is incorrectly used during writing or reading data
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5066
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5066
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.1
>            Reporter: Jaanai Zhang
>            Assignee: Istvan Toth
>            Priority: Critical
>             Fix For: 5.3.0
>
>         Attachments: DateTest.java, PHOENIX-5066.4x.v1.patch, PHOENIX-5066.4x.v2.patch, PHOENIX-5066.4x.v3.patch, PHOENIX-5066.master.v1.patch, PHOENIX-5066.master.v2.patch, PHOENIX-5066.master.v3.patch, PHOENIX-5066.master.v4.patch, PHOENIX-5066.master.v5.patch, PHOENIX-5066.master.v6.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have two methods to write data when uses JDBC API.
> #1. Uses _the exceuteUpdate_ method to execute a string that is an upsert SQL.
> #2. Uses the _prepareStatement_ method to set some objects and execute.
> The _string_ data needs to convert to a new object by the schema information of tables. we'll use some date formatters to convert string data to object for Date/Time/Timestamp types when writes data and the formatters are used when reads data as well.
>  
> *Uses default timezone test*
>  Writing 3 records by the different ways.
> {code:java}
> UPSERT INTO date_test VALUES (1,'2018-12-10 15:40:47','2018-12-10 15:40:47','2018-12-10 15:40:47') 
> UPSERT INTO date_test VALUES (2,to_date('2018-12-10 15:40:47'),to_time('2018-12-10 15:40:47'),to_timestamp('2018-12-10 15:40:47'))
> stmt.setInt(1, 3);stmt.setDate(2, date);stmt.setTime(3, time);stmt.setTimestamp(4, ts);
> {code}
> Reading the table by the getObject(getDate/getTime/getTimestamp) methods.
> {code:java}
> 1 | 2018-12-10 | 23:45:07 | 2018-12-10 23:45:07.0 
> 2 | 2018-12-10 | 23:45:07 | 2018-12-10 23:45:07.0 
> 3 | 2018-12-10 | 15:45:07 | 2018-12-10 15:45:07.66 
> {code}
> Reading the table by the getString methods 
> {code:java}
> 1 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 
> 2 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 | 2018-12-10 15:45:07.000 
> 3 | 2018-12-10 07:45:07.660 | 2018-12-10 07:45:07.660 | 2018-12-10 07:45:07.660
> {code}
>  *Uses GMT+8 test*
>  Writing 3 records by the different ways.
> {code:java}
> UPSERT INTO date_test VALUES (1,'2018-12-10 15:40:47','2018-12-10 15:40:47','2018-12-10 15:40:47')
> UPSERT INTO date_test VALUES (2,to_date('2018-12-10 15:40:47'),to_time('2018-12-10 15:40:47'),to_timestamp('2018-12-10 15:40:47'))
> stmt.setInt(1, 3);stmt.setDate(2, date);stmt.setTime(3, time);stmt.setTimestamp(4, ts);
> {code}
> Reading the table by the getObject(getDate/getTime/getTimestamp) methods.
> {code:java}
> 1 | 2018-12-10 | 23:40:47 | 2018-12-10 23:40:47.0 
> 2 | 2018-12-10 | 15:40:47 | 2018-12-10 15:40:47.0 
> 3 | 2018-12-10 | 15:40:47 | 2018-12-10 15:40:47.106 {code}
> Reading the table by the getString methods
> {code:java}
>  1 | 2018-12-10 23:40:47.000 | 2018-12-10 23:40:47.000 | 2018-12-10 23:40:47.000
> 2 | 2018-12-10 15:40:47.000 | 2018-12-10 15:40:47.000 | 2018-12-10 15:40:47.000
> 3 | 2018-12-10 15:40:47.106 | 2018-12-10 15:40:47.106 | 2018-12-10 15:40:47.106
> {code}
>  
> _We_ have a historical problem,  we'll parse the string to Date/Time/Timestamp objects with timezone in #1, which means the actual data is going to be changed when stored in HBase table。



--
This message was sent by Atlassian Jira
(v8.20.10#820010)