You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "lidavidm (via GitHub)" <gi...@apache.org> on 2023/05/24 19:48:23 UTC

[GitHub] [arrow-adbc] lidavidm opened a new pull request, #707: feat(c/driver/postgresql): handle non-SELECT statements

lidavidm opened a new pull request, #707:
URL: https://github.com/apache/arrow-adbc/pull/707

   Before we tried to infer the query schema before COPY by wrapping it in a "SELECT * FROM (...) LIMIT 0".  This broke if the query was (for example) a CREATE or UPDATE.  Instead, use a prepared statement to infer instead.  If we find that there are no result columns, then execute without the COPY path.
   
   Also, test what happens with "INSERT INTO ... RETURNING" (this works with COPY).
   
   Fixes #701.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#issuecomment-1562880633

   I added Python tests on a whim and found https://github.com/apache/arrow-nanoarrow/issues/203 and https://github.com/apache/arrow/issues/35760 so I'm going to go shave some yaks here first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#discussion_r1204803781


##########
c/driver/postgresql/statement.cc:
##########
@@ -683,6 +693,17 @@ AdbcStatusCode PostgresStatement::ExecuteQuery(struct ArrowArrayStream* stream,
       return na_res;
     }
 
+    // If there are no output columns (e.g. a CREATE or UPDATE), then
+    // don't use COPY (which would fail anyways)
+    if (root_type.n_children() == 0) {
+      RAISE_ADBC(ExecuteUpdateQuery(rows_affected, error));
+      struct ArrowSchema schema;
+      std::memset(&schema, 0, sizeof(schema));

Review Comment:
   Mostly just a bad habit of mine since it's consistent between C/C++; {0} is perfectly fine too



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#issuecomment-1562807331

   Which `PQexecParams` are you referring to? The one in `ExecuteUpdateQuery`? I can rearrange the control flow so that we always prepare before calling that.
   
   (We can't use it in `ExecuteQuery` since apparently you can't prepare a query involving COPY.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#discussion_r1205422500


##########
c/driver/postgresql/statement.cc:
##########
@@ -649,17 +649,27 @@ AdbcStatusCode PostgresStatement::ExecuteQuery(struct ArrowArrayStream* stream,
     return ADBC_STATUS_INVALID_STATE;
   }
 
-  // 1. Execute the query with LIMIT 0 to get the schema
+  // 1. Prepare the query to get the schema
   {
     // TODO: we should pipeline here and assume this will succeed
-    std::string schema_query = "SELECT * FROM (" + query_ + ") AS ignored LIMIT 0";
-    PGresult* result =
-        PQexecParams(connection_->conn(), query_.c_str(), /*nParams=*/0,
-                     /*paramTypes=*/nullptr, /*paramValues=*/nullptr,
-                     /*paramLengths=*/nullptr, /*paramFormats=*/nullptr, kPgBinaryFormat);
-    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
-      SetError(error, "%s%s", "[libpq] Query was: ", schema_query.c_str());
-      SetError(error, "%s%s", "[libpq] Failed to execute query: could not infer schema: ",
+    PGresult* result = PQprepare(connection_->conn(), /*stmtName=*/"", query_.c_str(),
+                                 /*nParams=*/0, nullptr);
+    if (PQresultStatus(result) != PGRES_COMMAND_OK) {
+      SetError(error, "%s%s", "[libpq] Query was: ", query_.c_str());
+      SetError(error, "%s%s",
+               "[libpq] Failed to execute query: could not infer schema: failed to "
+               "prepare query: ",
+               PQerrorMessage(connection_->conn()));

Review Comment:
   Ah, the old implementation would merge the errors. I'll fix these.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm merged pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm merged PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#discussion_r1204788923


##########
c/driver/postgresql/statement.cc:
##########
@@ -683,6 +693,17 @@ AdbcStatusCode PostgresStatement::ExecuteQuery(struct ArrowArrayStream* stream,
       return na_res;
     }
 
+    // If there are no output columns (e.g. a CREATE or UPDATE), then
+    // don't use COPY (which would fail anyways)
+    if (root_type.n_children() == 0) {
+      RAISE_ADBC(ExecuteUpdateQuery(rows_affected, error));
+      struct ArrowSchema schema;
+      std::memset(&schema, 0, sizeof(schema));

Review Comment:
   Is there any consideration to using memset versus `{0}` during assignment? I think I've seen the latter more often in the code base



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#issuecomment-1562929458

   - Fix SetError calls
   - Use PQexecPrepared
   - Add Python test and work around issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] kou commented on pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#issuecomment-1563488604

   > Which `PQexecParams` are you referring to?
   
   Sorry. I should have mentioned it explicitly.
   
   > The one in `ExecuteUpdateQuery`?
   
   Yes.
   
   > (We can't use it in `ExecuteQuery` since apparently you can't prepare a query involving COPY.)
   
   Oh, I didn't notice it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] kou commented on pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#issuecomment-1563488699

   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] kou commented on a diff in pull request #707: feat(c/driver/postgresql): handle non-SELECT statements

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on code in PR #707:
URL: https://github.com/apache/arrow-adbc/pull/707#discussion_r1204933000


##########
c/driver/postgresql/statement.cc:
##########
@@ -649,17 +649,27 @@ AdbcStatusCode PostgresStatement::ExecuteQuery(struct ArrowArrayStream* stream,
     return ADBC_STATUS_INVALID_STATE;
   }
 
-  // 1. Execute the query with LIMIT 0 to get the schema
+  // 1. Prepare the query to get the schema
   {
     // TODO: we should pipeline here and assume this will succeed
-    std::string schema_query = "SELECT * FROM (" + query_ + ") AS ignored LIMIT 0";
-    PGresult* result =
-        PQexecParams(connection_->conn(), query_.c_str(), /*nParams=*/0,
-                     /*paramTypes=*/nullptr, /*paramValues=*/nullptr,
-                     /*paramLengths=*/nullptr, /*paramFormats=*/nullptr, kPgBinaryFormat);
-    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
-      SetError(error, "%s%s", "[libpq] Query was: ", schema_query.c_str());
-      SetError(error, "%s%s", "[libpq] Failed to execute query: could not infer schema: ",
+    PGresult* result = PQprepare(connection_->conn(), /*stmtName=*/"", query_.c_str(),
+                                 /*nParams=*/0, nullptr);
+    if (PQresultStatus(result) != PGRES_COMMAND_OK) {
+      SetError(error, "%s%s", "[libpq] Query was: ", query_.c_str());
+      SetError(error, "%s%s",
+               "[libpq] Failed to execute query: could not infer schema: failed to "
+               "prepare query: ",
+               PQerrorMessage(connection_->conn()));

Review Comment:
   How about unifying these `SetError()`s to one `SetError()`?
   It seems that the current `SetError()` implementation discards the previous error message. (`[libpq] Query was: ${query}` is discarded.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org