You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "WillAyd (via GitHub)" <gi...@apache.org> on 2023/05/12 18:58:41 UTC

[GitHub] [arrow-adbc] WillAyd opened a new pull request, #683: refactor(c/driver/postgres): Implement InputIterator for ResultHelper

WillAyd opened a new pull request, #683:
URL: https://github.com/apache/arrow-adbc/pull/683

   ref https://github.com/apache/arrow-adbc/pull/679/files/38805ecae5e73210dffdee8cf2479de9880f8efd#r1191784402


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm merged pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm merged PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on pull request #683: refactor(c/driver/postgres): Implement InputIterator for ResultHelper

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#issuecomment-1546295777

   On further review need a few more tweaks here. I think the number of rows from executing the statement needs to be accessible outside the iterator, for cases where we need to pre-allocate an arrow array to a give size


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1195673238


##########
c/driver/postgresql/connection.cc:
##########
@@ -260,39 +328,39 @@ AdbcStatusCode PostgresConnection::GetTableSchema(const char* catalog,
 
   PqResultHelper result_helper = PqResultHelper{conn_, query.buffer};
   StringBuilderReset(&query);
-  pg_result* result = result_helper.Execute();
-
-  ExecStatusType pq_status = PQresultStatus(result);
-  auto uschema = nanoarrow::UniqueSchema();
 
-  if (pq_status == PGRES_TUPLES_OK) {
-    int num_rows = PQntuples(result);
+  if (result_helper.Status() != PGRES_TUPLES_OK) {
+    SetError(error, "%s%s", "Failed to get table schema: ", PQerrorMessage(conn_));
+    final_status = ADBC_STATUS_IO;
+  } else {
+    auto uschema = nanoarrow::UniqueSchema();
     ArrowSchemaInit(uschema.get());
-    CHECK_NA(INTERNAL, ArrowSchemaSetTypeStruct(uschema.get(), num_rows), error);
+    CHECK_NA(INTERNAL, ArrowSchemaSetTypeStruct(uschema.get(), result_helper.NumRows()),
+             error);
 
     ArrowError na_error;
-    for (int row = 0; row < num_rows; row++) {
-      const char* colname = PQgetvalue(result, row, 0);
+    int row_counter = 0;
+    for (auto row : result_helper) {
+      const char* colname = row[0].data;
       const Oid pg_oid = static_cast<uint32_t>(
-          std::strtol(PQgetvalue(result, row, 1), /*str_end=*/nullptr, /*base=*/10));
+          std::strtol(row[1].data, /*str_end=*/nullptr, /*base=*/10));
 
       PostgresType pg_type;
       if (type_resolver_->Find(pg_oid, &pg_type, &na_error) != NANOARROW_OK) {
-        SetError(error, "%s%d%s%s%s%" PRIu32, "Column #", row + 1, " (\"", colname,
-                 "\") has unknown type code ", pg_oid);
+        SetError(error, "%s%d%s%s%s%" PRIu32, "Column #", row_counter + 1, " (\"",

Review Comment:
   Each row here represents a column in a table:
   
   ```sh
   column_name | type
   ------------|------
   column_a    | INTEGER
   column_b    | VARCHAR
   column_c    | UNKNOWN
   ```
   
   So while the code internally recognizes these as rows from an error reporting perspective it might make more sense for the user to know that the 3rd column of the table they are querying has an unknown type



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #683: refactor(c/driver/postgres): Implement InputIterator for ResultHelper

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1192708283


##########
c/driver/postgresql/connection.cc:
##########
@@ -170,13 +271,11 @@ AdbcStatusCode PostgresConnectionGetObjectsImpl(
 
     PqResultHelper result_helper = PqResultHelper{conn, query.buffer};
     StringBuilderReset(&query);
-    pg_result* result = result_helper.Execute();
 
-    ExecStatusType pq_status = PQresultStatus(result);
-    if (pq_status == PGRES_TUPLES_OK) {
-      int num_rows = PQntuples(result);
-      for (int row = 0; row < num_rows; row++) {
-        const char* db_name = PQgetvalue(result, row, 0);
+    result_helper.Execute();

Review Comment:
   Not sold on the current state management, but essentially it is required to call `Execute` prior to iteration. If we see this class evolving to serve commands that don't return result rows, then that may be OK. Otherwise we might be able to move the `Execute` call into the `begin` method



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#issuecomment-1551702327

   Ah, it should be .5.0.0 now, but it wasn't updated. I can fix that upstream.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1196556016


##########
c/driver/postgresql/connection.cc:
##########
@@ -170,13 +241,10 @@ AdbcStatusCode PostgresConnectionGetObjectsImpl(
 
     PqResultHelper result_helper = PqResultHelper{conn, query.buffer};
     StringBuilderReset(&query);
-    pg_result* result = result_helper.Execute();
 
-    ExecStatusType pq_status = PQresultStatus(result);
-    if (pq_status == PGRES_TUPLES_OK) {
-      int num_rows = PQntuples(result);
-      for (int row = 0; row < num_rows; row++) {
-        const char* db_name = PQgetvalue(result, row, 0);
+    if (result_helper.Status() == PGRES_TUPLES_OK) {

Review Comment:
   Ah, I see now.
   
   That's probably not the right code (and we should have error details) + I think that should really pair with the `if (!catalog)` but we can fix that up later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#issuecomment-1551630322

   Super strange. Guessing that is an issue with delvewheel or flightsql? Seems unable to find adbc_driver_flightsql.dll.4.0.0?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1195669887


##########
c/driver/postgresql/connection.cc:
##########
@@ -170,13 +241,10 @@ AdbcStatusCode PostgresConnectionGetObjectsImpl(
 
     PqResultHelper result_helper = PqResultHelper{conn, query.buffer};
     StringBuilderReset(&query);
-    pg_result* result = result_helper.Execute();
 
-    ExecStatusType pq_status = PQresultStatus(result);
-    if (pq_status == PGRES_TUPLES_OK) {
-      int num_rows = PQntuples(result);
-      for (int row = 0; row < num_rows; row++) {
-        const char* db_name = PQgetvalue(result, row, 0);
+    if (result_helper.Status() == PGRES_TUPLES_OK) {

Review Comment:
   I don't think the diff makes it very clear but this returns `ADBC_STATUS_NOT_IMPLEMENTED` when this condition is not satisified (already in existing code)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#issuecomment-1551607181

   Hmm, something seems to have actually broken with the Windows build, though I don't see how it's related here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #683: refactor(c/driver/postgres): Implement InputIterator for ResultHelper

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1192837538


##########
c/driver/postgresql/connection.cc:
##########
@@ -260,39 +353,40 @@ AdbcStatusCode PostgresConnection::GetTableSchema(const char* catalog,
 
   PqResultHelper result_helper = PqResultHelper{conn_, query.buffer};
   StringBuilderReset(&query);
-  pg_result* result = result_helper.Execute();
 
-  ExecStatusType pq_status = PQresultStatus(result);
-  auto uschema = nanoarrow::UniqueSchema();
-
-  if (pq_status == PGRES_TUPLES_OK) {
-    int num_rows = PQntuples(result);
+  if (result_helper.Status() != PGRES_TUPLES_OK) {
+    SetError(error, "%s%s", "Failed to get table schema: ", PQerrorMessage(conn_));
+    final_status = ADBC_STATUS_IO;
+  } else {
+    auto uschema = nanoarrow::UniqueSchema();
     ArrowSchemaInit(uschema.get());
-    CHECK_NA(INTERNAL, ArrowSchemaSetTypeStruct(uschema.get(), num_rows), error);
+    CHECK_NA(INTERNAL, ArrowSchemaSetTypeStruct(uschema.get(), result_helper.NumRows()),
+             error);
 
     ArrowError na_error;
-    for (int row = 0; row < num_rows; row++) {
-      const char* colname = PQgetvalue(result, row, 0);
+    int row_counter = 0;
+    for (auto row : result_helper) {
+      auto iter = row.begin();

Review Comment:
   Not sure how idiomatic this is; wanted to make the Row iterator random access so you can pick and choose the records you want to extract, but this may not be very well expressed



##########
c/driver/postgresql/connection.cc:
##########
@@ -35,19 +36,116 @@ static const uint32_t kSupportedInfoCodes[] = {
     ADBC_INFO_DRIVER_VERSION, ADBC_INFO_DRIVER_ARROW_VERSION,
 };
 
+struct PqRecord {
+  const char* data;
+  const int len;
+  const bool is_null;
+};
+
+class PqResultRow {
+ public:
+  PqResultRow(pg_result* result, int row_num) : result_(result), row_num_(row_num) {
+    ncols_ = PQnfields(result);
+  }
+  class iterator {
+    const PqResultRow& outer_;
+    int col_num_ = 0;
+
+   public:
+    explicit iterator(const PqResultRow& outer, int col_num = 0)
+        : outer_(outer), col_num_(col_num) {}
+    iterator& operator++() {
+      col_num_++;
+      return *this;
+    }
+
+    iterator operator++(int) {
+      iterator retval = *this;
+      ++(*this);
+      return retval;
+    }
+
+    iterator operator+=(const int& n) {
+      col_num_ += n;
+      return *this;
+    }
+
+    iterator operator+(const int& n) {
+      iterator tmp(*this);
+      tmp += n;
+      return tmp;
+    }
+
+    PqRecord operator[](const int& n) { return *(*this + n); }
+
+    bool operator==(iterator other) const { return col_num_ == other.col_num_; }

Review Comment:
   Not sure if C++ would require an equality comparison of the containing class as well. Excluded for now to cut down on verbosity but someone with better C++ knowledge likely knows more



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #683: refactor(c/driver/postgres): Implement InputIterator for ResultHelper

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1194323872


##########
c/driver/postgresql/connection.cc:
##########
@@ -170,24 +234,17 @@ AdbcStatusCode PostgresConnectionGetObjectsImpl(
 
     PqResultHelper result_helper = PqResultHelper{conn, query.buffer};
     StringBuilderReset(&query);
-    pg_result* result = result_helper.Execute();
-
-    ExecStatusType pq_status = PQresultStatus(result);
-    if (pq_status == PGRES_TUPLES_OK) {
-      int num_rows = PQntuples(result);
-      for (int row = 0; row < num_rows; row++) {
-        const char* db_name = PQgetvalue(result, row, 0);
-        CHECK_NA(INTERNAL,
-                 ArrowArrayAppendString(catalog_name_col, ArrowCharView(db_name)), error);
-        if (depth == ADBC_OBJECT_DEPTH_CATALOGS) {
-          CHECK_NA(INTERNAL, ArrowArrayAppendNull(catalog_db_schemas_col, 1), error);
-        } else {
-          return ADBC_STATUS_NOT_IMPLEMENTED;
-        }
-        CHECK_NA(INTERNAL, ArrowArrayFinishElement(array), error);
+
+    for (auto row : result_helper) {

Review Comment:
   Where do we check the status now?



##########
c/driver/postgresql/connection.cc:
##########
@@ -170,24 +234,17 @@ AdbcStatusCode PostgresConnectionGetObjectsImpl(
 
     PqResultHelper result_helper = PqResultHelper{conn, query.buffer};
     StringBuilderReset(&query);
-    pg_result* result = result_helper.Execute();
-
-    ExecStatusType pq_status = PQresultStatus(result);
-    if (pq_status == PGRES_TUPLES_OK) {
-      int num_rows = PQntuples(result);
-      for (int row = 0; row < num_rows; row++) {
-        const char* db_name = PQgetvalue(result, row, 0);
-        CHECK_NA(INTERNAL,
-                 ArrowArrayAppendString(catalog_name_col, ArrowCharView(db_name)), error);
-        if (depth == ADBC_OBJECT_DEPTH_CATALOGS) {
-          CHECK_NA(INTERNAL, ArrowArrayAppendNull(catalog_db_schemas_col, 1), error);
-        } else {
-          return ADBC_STATUS_NOT_IMPLEMENTED;
-        }
-        CHECK_NA(INTERNAL, ArrowArrayFinishElement(array), error);
+
+    for (auto row : result_helper) {

Review Comment:
   might be good to spell out the autos here



##########
c/driver/postgresql/connection.cc:
##########
@@ -19,8 +19,10 @@
 
 #include <cinttypes>
 #include <cstring>
+#include <iterator>

Review Comment:
   Where do we use this header?



##########
c/driver/postgresql/connection.cc:
##########
@@ -35,20 +37,82 @@ static const uint32_t kSupportedInfoCodes[] = {
     ADBC_INFO_DRIVER_VERSION, ADBC_INFO_DRIVER_ARROW_VERSION,
 };
 
+struct PqRecord {

Review Comment:
   I think it's worth documenting these classes at this point



##########
c/driver/postgresql/connection.cc:
##########
@@ -35,19 +37,77 @@ static const uint32_t kSupportedInfoCodes[] = {
     ADBC_INFO_DRIVER_VERSION, ADBC_INFO_DRIVER_ARROW_VERSION,
 };
 
+struct PqRecord {
+  const char* data;
+  const int len;
+  const bool is_null;
+};
+
+class PqResultRow {
+ public:
+  PqResultRow(pg_result* result, int row_num) : result_(result), row_num_(row_num) {
+    ncols_ = PQnfields(result);
+  }
+
+  PqRecord operator[](const int& col_num) {
+    const char* data = PQgetvalue(result_, row_num_, col_num);

Review Comment:
   Maybe an assert?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #683: refactor(c/driver/postgres): Implement InputIterator for ResultHelper

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1193009258


##########
c/driver/postgresql/connection.cc:
##########
@@ -35,19 +37,77 @@ static const uint32_t kSupportedInfoCodes[] = {
     ADBC_INFO_DRIVER_VERSION, ADBC_INFO_DRIVER_ARROW_VERSION,
 };
 
+struct PqRecord {
+  const char* data;
+  const int len;
+  const bool is_null;
+};
+
+class PqResultRow {
+ public:
+  PqResultRow(pg_result* result, int row_num) : result_(result), row_num_(row_num) {
+    ncols_ = PQnfields(result);
+  }
+
+  PqRecord operator[](const int& col_num) {
+    const char* data = PQgetvalue(result_, row_num_, col_num);

Review Comment:
   Since we have ncols_ we could do bounds checking here - wasn't sure if we wanted to throw a C++ exception or need to stick with the arrow error handling



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #683: refactor(c/driver/postgresql): Implement InputIterator for ResultHelper

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #683:
URL: https://github.com/apache/arrow-adbc/pull/683#discussion_r1195072804


##########
c/driver/postgresql/connection.cc:
##########
@@ -170,13 +241,10 @@ AdbcStatusCode PostgresConnectionGetObjectsImpl(
 
     PqResultHelper result_helper = PqResultHelper{conn, query.buffer};
     StringBuilderReset(&query);
-    pg_result* result = result_helper.Execute();
 
-    ExecStatusType pq_status = PQresultStatus(result);
-    if (pq_status == PGRES_TUPLES_OK) {
-      int num_rows = PQntuples(result);
-      for (int row = 0; row < num_rows; row++) {
-        const char* db_name = PQgetvalue(result, row, 0);
+    if (result_helper.Status() == PGRES_TUPLES_OK) {

Review Comment:
   Hmm, I suppose the original code had the same issue, but we should be returning an error if the status here is not OK.



##########
c/driver/postgresql/connection.cc:
##########
@@ -260,39 +328,39 @@ AdbcStatusCode PostgresConnection::GetTableSchema(const char* catalog,
 
   PqResultHelper result_helper = PqResultHelper{conn_, query.buffer};
   StringBuilderReset(&query);
-  pg_result* result = result_helper.Execute();
-
-  ExecStatusType pq_status = PQresultStatus(result);
-  auto uschema = nanoarrow::UniqueSchema();
 
-  if (pq_status == PGRES_TUPLES_OK) {
-    int num_rows = PQntuples(result);
+  if (result_helper.Status() != PGRES_TUPLES_OK) {
+    SetError(error, "%s%s", "Failed to get table schema: ", PQerrorMessage(conn_));
+    final_status = ADBC_STATUS_IO;
+  } else {
+    auto uschema = nanoarrow::UniqueSchema();
     ArrowSchemaInit(uschema.get());
-    CHECK_NA(INTERNAL, ArrowSchemaSetTypeStruct(uschema.get(), num_rows), error);
+    CHECK_NA(INTERNAL, ArrowSchemaSetTypeStruct(uschema.get(), result_helper.NumRows()),
+             error);
 
     ArrowError na_error;
-    for (int row = 0; row < num_rows; row++) {
-      const char* colname = PQgetvalue(result, row, 0);
+    int row_counter = 0;
+    for (auto row : result_helper) {
+      const char* colname = row[0].data;
       const Oid pg_oid = static_cast<uint32_t>(
-          std::strtol(PQgetvalue(result, row, 1), /*str_end=*/nullptr, /*base=*/10));
+          std::strtol(row[1].data, /*str_end=*/nullptr, /*base=*/10));
 
       PostgresType pg_type;
       if (type_resolver_->Find(pg_oid, &pg_type, &na_error) != NANOARROW_OK) {
-        SetError(error, "%s%d%s%s%s%" PRIu32, "Column #", row + 1, " (\"", colname,
-                 "\") has unknown type code ", pg_oid);
+        SetError(error, "%s%d%s%s%s%" PRIu32, "Column #", row_counter + 1, " (\"",

Review Comment:
   Hmm, `Column #` but row_counter? (again this was also in the original code but I just noticed it)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org