Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/30 08:40:54 UTC

[GitHub] [spark] sadikovi opened a new pull request, #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

sadikovi opened a new pull request, #36726:
URL: https://github.com/apache/spark/pull/36726

### What changes were proposed in this pull request?

This PR adds support for TimestampNTZ (TIMESTAMP WITHOUT TIME ZONE) in the JDBC data source. It also introduces a new configuration option, `inferTimestampNTZType`, which allows timestamps to be read as timestamp without time zone. By default it is set to `false`, i.e. all timestamps are read as the legacy timestamp type.
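As a minimal sketch of how the option described above might be passed from PySpark: the helper names `jdbc_read_options` and `read_table` and the table name `events` are hypothetical, and the option name `inferTimestampNTZType` is the one proposed in this PR, so it may differ in a released Spark version. The `spark` argument is assumed to be an existing SparkSession.

```python
def jdbc_read_options(url, table, infer_ntz=False):
    """Build the option map for a JDBC read (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,
        # Option name as proposed in this PR; data source options are strings.
        "inferTimestampNTZType": "true" if infer_ntz else "false",
    }


def read_table(spark, url, table, infer_ntz=False):
    """Read a JDBC table; `spark` is assumed to be an existing SparkSession."""
    options = jdbc_read_options(url, table, infer_ntz)
    return spark.read.format("jdbc").options(**options).load()


# Example option map for the in-memory H2 URL used in the manual tests.
opts = jdbc_read_options("jdbc:h2:mem:testdb0", "events", infer_ntz=True)
print(opts["inferTimestampNTZType"])
```

Leaving `infer_ntz` at its default mirrors the PR's default of `false`, in which case timestamps are read as the legacy timestamp type.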

Here is the state of timestamp without time zone support in the built-in dialects:
- H2: has timestamp without time zone, which seems to map to the timestamp type
- Derby: only has timestamp type
- MySQL: only has timestamp type
- Postgres: has timestamp without time zone, which maps to timestamp
- SQL Server: only has datetime/datetime2, neither of which is time zone aware
- Oracle: seems to only have timestamp and timestamp with time zone
- Teradata: similar to Oracle but I could not verify
- DB2: has TIMESTAMP WITHOUT TIME ZONE but I could not make this type work in my test, only TIMESTAMP seems to work

### Why are the changes needed?

Adds support for the new TimestampNTZ type, see https://issues.apache.org/jira/browse/SPARK-35662.

### Does this PR introduce _any_ user-facing change?

The JDBC data source can now write and read TimestampNTZ values. When reading timestamp values, the configuration option `inferTimestampNTZType` allows those values to be inferred as TIMESTAMP WITHOUT TIME ZONE. By default the option is set to `false`, so the behaviour is unchanged and all timestamps are read as TIMESTAMP WITH LOCAL TIME ZONE.
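The difference between the two read behaviours can be illustrated with plain Python datetimes. This is only an analogy, not Spark code: a naive `datetime` stands in for TIMESTAMP WITHOUT TIME ZONE (a wall-clock value), an aware `datetime` stands in for TIMESTAMP WITH LOCAL TIME ZONE (an instant displayed in the session time zone), and the `render` helper and the fixed session offsets are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Wall-clock value: no time zone attached (analogous to TIMESTAMP WITHOUT TIME ZONE).
wall_clock = datetime(2022, 5, 30, 8, 40, 54)

# Instant on the time line (analogous to TIMESTAMP WITH LOCAL TIME ZONE).
instant = datetime(2022, 5, 30, 8, 40, 54, tzinfo=timezone.utc)


def render(ts, session_offset_hours):
    """Show how a value would be displayed under a given session UTC offset."""
    if ts.tzinfo is None:
        # A wall-clock value is displayed as stored, whatever the session zone is.
        return ts.isoformat(sep=" ")
    session_tz = timezone(timedelta(hours=session_offset_hours))
    # An instant is shifted into the session zone before it is displayed.
    return ts.astimezone(session_tz).replace(tzinfo=None).isoformat(sep=" ")


print(render(wall_clock, -7))  # 2022-05-30 08:40:54
print(render(wall_clock, 9))   # 2022-05-30 08:40:54
print(render(instant, -7))     # 2022-05-30 01:40:54
print(render(instant, 9))      # 2022-05-30 17:40:54
```

The wall-clock value is unaffected by the session offset, while the instant is displayed differently for each session.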

### How was this patch tested?

I added a unit test to ensure the general functionality works. I also manually verified the write/read test for TimestampNTZ in the following databases (all I could get access to):
- H2, `jdbc:h2:mem:testdb0`
- Derby, `jdbc:derby:<filepath>`
- MySQL, `docker run --name mysql -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=db -e MYSQL_USER=user -e MYSQL_PASSWORD=secret -p 3306:3306 -d mysql:5.7`, `jdbc:mysql://127.0.0.1:3306/db?user=user&password=secret`
- PostgreSQL, `docker run -d --name postgres -e POSTGRES_PASSWORD=secret -e POSTGRES_USER=user -e POSTGRES_DB=db -p 5432:5432 postgres:12.11`, `jdbc:postgresql://127.0.0.1:5432/db?user=user&password=secret`
- SQL Server, `docker run -e "ACCEPT_EULA=Y" -e SA_PASSWORD='yourStrong(!)Password' -p 1433:1433 -d mcr.microsoft.com/mssql/server:2019-CU15-ubuntu-20.04`, `jdbc:sqlserver://127.0.0.1:1433;user=sa;password=yourStrong(!)Password`
- DB2, `docker run -itd --name mydb2 --privileged=true -p 50000:50000 -e LICENSE=accept -e DB2INST1_PASSWORD=secret -e DBNAME=db ibmcom/db2`, `jdbc:db2://127.0.0.1:50000/db:user=db2inst1;password=secret;`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
