You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "WillAyd (via GitHub)" <gi...@apache.org> on 2023/06/30 19:55:41 UTC

[GitHub] [arrow-adbc] WillAyd opened a new pull request, #868: feat(c/driver/postgresql): TimestampTz write

WillAyd opened a new pull request, #868:
URL: https://github.com/apache/arrow-adbc/pull/868

   closes #867 
   
   The lifecycle of setting the session time zone in this isn't great. I think should be handled during connect somehow? But not sure if postgres even supports that reading things like:
   
   https://stackoverflow.com/a/11779621/621736
   
   I think there is also something awry with the release callback for timezone schemas; needs further investigation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1256108212


##########
c/driver/postgresql/statement.cc:
##########
@@ -255,6 +252,45 @@ struct BindStream {
 
   AdbcStatusCode Prepare(PGconn* conn, const std::string& query,
                          struct AdbcError* error) {
+    // tz-aware timestamps require special handling to set the timezone to UTC
+    // prior to sending over the binary protocol; must be reset after execute
+    for (int64_t col = 0; col < bind_schema->n_children; col++) {
+      if ((bind_schema_fields[col].type == ArrowType::NANOARROW_TYPE_TIMESTAMP) &&
+          (strcmp("", bind_schema_fields[col].timezone))) {
+        has_tz_field = true;
+
+        PGresult* begin_result = PQexec(conn, "BEGIN");

Review Comment:
   As far as autocommit goes I do think you are right about that being murky. I'm not sure how to best resolve that yet either - beginning / ending the transaction here within the `BindStream` class is probably at odds with most of the autocommit stuff being managed via the connection.
   
   Are there tests already for autocommit in the ADBC suite? I was looking for a reference on how those are expected to behave. Wasn't sure if I was overlooking those of if it is something that hasn't been heavily implemented yet



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1251226981


##########
c/driver/postgresql/connection.cc:
##########
@@ -1025,6 +1025,18 @@ AdbcStatusCode PostgresConnection::SetOption(const char* key, const char* value,
     }
     return ADBC_STATUS_OK;
   }
+
+  if (std::strcmp(key, "TIME ZONE") == 0) {
+    std::string query = std::string("SET TIME ZONE '") + std::string(value) + "'";

Review Comment:
   Hmm, I think for this case it might be, but we probably are going to need some more intensive refactoring at some point



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1250838355


##########
c/driver/postgresql/connection.cc:
##########
@@ -1025,6 +1025,18 @@ AdbcStatusCode PostgresConnection::SetOption(const char* key, const char* value,
     }
     return ADBC_STATUS_OK;
   }
+
+  if (std::strcmp(key, "TIME ZONE") == 0) {
+    std::string query = std::string("SET TIME ZONE '") + std::string(value) + "'";

Review Comment:
   If we do want an option for this, it should be something like `adbc.postgresql.time_zone` based on other drivers
   
   But if we just need it for ingestion, maybe we could do everything inside a transaction, and set/restore the timezone within the transaction?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm merged pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm merged PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1256248116


##########
c/driver/postgresql/statement.cc:
##########
@@ -255,6 +252,45 @@ struct BindStream {
 
   AdbcStatusCode Prepare(PGconn* conn, const std::string& query,
                          struct AdbcError* error) {
+    // tz-aware timestamps require special handling to set the timezone to UTC
+    // prior to sending over the binary protocol; must be reset after execute
+    for (int64_t col = 0; col < bind_schema->n_children; col++) {
+      if ((bind_schema_fields[col].type == ArrowType::NANOARROW_TYPE_TIMESTAMP) &&
+          (strcmp("", bind_schema_fields[col].timezone))) {
+        has_tz_field = true;
+
+        PGresult* begin_result = PQexec(conn, "BEGIN");

Review Comment:
   Any ideas on how to best forward the autocommit setting to the `BindStream` class? Right now I only see that as a private member of the connection, so it is pretty far removed from this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1251171948


##########
c/driver/postgresql/connection.cc:
##########
@@ -1025,6 +1025,18 @@ AdbcStatusCode PostgresConnection::SetOption(const char* key, const char* value,
     }
     return ADBC_STATUS_OK;
   }
+
+  if (std::strcmp(key, "TIME ZONE") == 0) {
+    std::string query = std::string("SET TIME ZONE '") + std::string(value) + "'";

Review Comment:
   Yea either of those are viable; I think better to start with the latter, though I'm not sure where to manage that lifecycle. Is the `BindStream` struct in statement.cc supposed to be a suitable home for managing a transaction? The type introspection happens there, but I don't know that is the best place to put BEGIN/COMMIT statements. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1256195219


##########
c/driver/postgresql/statement.cc:
##########
@@ -255,6 +252,45 @@ struct BindStream {
 
   AdbcStatusCode Prepare(PGconn* conn, const std::string& query,
                          struct AdbcError* error) {
+    // tz-aware timestamps require special handling to set the timezone to UTC
+    // prior to sending over the binary protocol; must be reset after execute
+    for (int64_t col = 0; col < bind_schema->n_children; col++) {
+      if ((bind_schema_fields[col].type == ArrowType::NANOARROW_TYPE_TIMESTAMP) &&
+          (strcmp("", bind_schema_fields[col].timezone))) {
+        has_tz_field = true;
+
+        PGresult* begin_result = PQexec(conn, "BEGIN");

Review Comment:
   ah, I missed that little break at the end.
   
   There are some simple tests that test the options, but not really the semantics.
   
   At least here, I think you 'just' need to check if autocommit is off, if so, there's no need to BEGIN (you're already in a transaction), and you can still commit at the end (~though maybe semantically we shouldn't? This isn't a feature that's normally available in other APIs so I'm not sure what people expect)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1248279315


##########
c/driver/postgresql/connection.cc:
##########
@@ -1025,6 +1025,18 @@ AdbcStatusCode PostgresConnection::SetOption(const char* key, const char* value,
     }
     return ADBC_STATUS_OK;
   }
+
+  if (std::strcmp(key, "TIME ZONE") == 0) {
+    std::string query = std::string("SET TIME ZONE '") + std::string(value) + "'";

Review Comment:
   Could also set / unset this in the lifecycle of the statement



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1256260551


##########
c/driver/postgresql/statement.cc:
##########
@@ -255,6 +252,45 @@ struct BindStream {
 
   AdbcStatusCode Prepare(PGconn* conn, const std::string& query,
                          struct AdbcError* error) {
+    // tz-aware timestamps require special handling to set the timezone to UTC
+    // prior to sending over the binary protocol; must be reset after execute
+    for (int64_t col = 0; col < bind_schema->n_children; col++) {
+      if ((bind_schema_fields[col].type == ArrowType::NANOARROW_TYPE_TIMESTAMP) &&
+          (strcmp("", bind_schema_fields[col].timezone))) {
+        has_tz_field = true;
+
+        PGresult* begin_result = PQexec(conn, "BEGIN");

Review Comment:
   A statement already holds a connection, so we can just add a getter to the PostgresConnection class



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1255902661


##########
c/driver/postgresql/statement.cc:
##########
@@ -145,6 +145,9 @@ struct BindStream {
   // XXX: this assumes fixed-length fields only - will need more
   // consideration to deal with variable-length fields
 
+  bool has_tz_field = false;
+  std::string tz_setting;

Review Comment:
   agh, I was going to ask for this to be `std::optional`...but we can't do that



##########
c/driver/postgresql/statement.cc:
##########
@@ -255,6 +252,45 @@ struct BindStream {
 
   AdbcStatusCode Prepare(PGconn* conn, const std::string& query,
                          struct AdbcError* error) {
+    // tz-aware timestamps require special handling to set the timezone to UTC
+    // prior to sending over the binary protocol; must be reset after execute
+    for (int64_t col = 0; col < bind_schema->n_children; col++) {
+      if ((bind_schema_fields[col].type == ArrowType::NANOARROW_TYPE_TIMESTAMP) &&
+          (strcmp("", bind_schema_fields[col].timezone))) {
+        has_tz_field = true;
+
+        PGresult* begin_result = PQexec(conn, "BEGIN");

Review Comment:
   +how does this interact with auto-commit off?



##########
c/driver/postgresql/statement.cc:
##########
@@ -255,6 +252,45 @@ struct BindStream {
 
   AdbcStatusCode Prepare(PGconn* conn, const std::string& query,
                          struct AdbcError* error) {
+    // tz-aware timestamps require special handling to set the timezone to UTC
+    // prior to sending over the binary protocol; must be reset after execute
+    for (int64_t col = 0; col < bind_schema->n_children; col++) {
+      if ((bind_schema_fields[col].type == ArrowType::NANOARROW_TYPE_TIMESTAMP) &&
+          (strcmp("", bind_schema_fields[col].timezone))) {
+        has_tz_field = true;
+
+        PGresult* begin_result = PQexec(conn, "BEGIN");

Review Comment:
   Don't we need to hoist this out of the loop? Or else, only do this if `has_tz_field` was previously false. Otherwise, it looks like we'll try to begin the transaction twice.
   
   (Also: I wonder if we shouldn't always begin/end a transaction when dealing with multiple bind parameters...)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1256100380


##########
c/driver/postgresql/statement.cc:
##########
@@ -255,6 +252,45 @@ struct BindStream {
 
   AdbcStatusCode Prepare(PGconn* conn, const std::string& query,
                          struct AdbcError* error) {
+    // tz-aware timestamps require special handling to set the timezone to UTC
+    // prior to sending over the binary protocol; must be reset after execute
+    for (int64_t col = 0; col < bind_schema->n_children; col++) {
+      if ((bind_schema_fields[col].type == ArrowType::NANOARROW_TYPE_TIMESTAMP) &&
+          (strcmp("", bind_schema_fields[col].timezone))) {
+        has_tz_field = true;
+
+        PGresult* begin_result = PQexec(conn, "BEGIN");

Review Comment:
   I have the `break` at the end of the loop to make sure this only happens once, but that could be more clearly signaled. Could even just set `has_tz_field` within the loop then check `if (has_tz_field)` right after before potentially starting a transaction



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#discussion_r1257307492


##########
c/driver/postgresql/connection.h:
##########
@@ -54,6 +54,7 @@ class PostgresConnection {
   const std::shared_ptr<PostgresTypeResolver>& type_resolver() const {
     return type_resolver_;
   }
+  const bool autocommit() { return autocommit_; }

Review Comment:
   ```suggestion
     bool autocommit() const { return autocommit_; }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on pull request #868: feat(c/driver/postgresql): TimestampTz write

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on PR #868:
URL: https://github.com/apache/arrow-adbc/pull/868#issuecomment-1624278653

   Think the polars issue is unrelated


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org