You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "WillAyd (via GitHub)" <gi...@apache.org> on 2023/06/28 17:13:27 UTC

[GitHub] [arrow-adbc] WillAyd opened a new pull request, #861: feat(c/driver/postgresql): Timestamp write support

WillAyd opened a new pull request, #861:
URL: https://github.com/apache/arrow-adbc/pull/861

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245548625


##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   I think this is OK for now but I think what I might end up doing is writing up a set of Googletest matchers so we can write out code like
   
   ```cpp
   ASSERT_THAT(view->children[0], IsTimestamp(/*any unit*/));
   ASSERT_THAT(view->children[0], IsTimestamp(NANOARROW_TIME_UNIT_MICRO));
   ```
   
   or something like that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245589777


##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   Yes, sorry, I realized I had gotten the context completely wrong once I clicked through and started reviewing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1246744462


##########
c/validation/adbc_validation.cc:
##########
@@ -1094,6 +1094,83 @@ void StatementTest::TestSqlIngestBinary() {
       NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }
 
+template <enum ArrowTimeUnit TU>

Review Comment:
   I need to fix things here, but it should be quite straightforward to also write a matcher like `IsError(&error, ::testing::HasSubstr("..."))`



##########
c/driver/postgresql/statement.cc:
##########
@@ -337,6 +347,53 @@ struct BindStream {
               param_values[col] = const_cast<char*>(view.data.as_char);
               break;
             }
+            case ArrowType::NANOARROW_TYPE_TIMESTAMP: {
+              int64_t val = array_view->children[col]->buffer_views[1].data.as_int64[row];
+              if (strcmp("", bind_schema_fields[col].timezone)) {
+                SetError(error, "%s%" PRIi64 "%s%s%s", "[libpq] Column #", col + 1,
+                         " (\"", PQfname(result, col),
+                         "\") has unsupported type code timestamp with timezone");
+                return ADBC_STATUS_NOT_IMPLEMENTED;
+              }
+
+              // 2000-01-01 00:00:00.000000 in microseconds
+              constexpr int64_t kPostgresTimestampEpoch = 946684800000000;
+              constexpr int64_t kSecOverflowLimit = 9223372036854;
+              constexpr int64_t kmSecOverflowLimit = 9223372036854775;
+
+              auto unit = bind_schema_fields[col].time_unit;
+              switch (unit) {
+                case NANOARROW_TIME_UNIT_SECOND:
+                  if (abs(val) > kSecOverflowLimit) {
+                    SetError(error, "%s%" PRId64 "%s%s%s%" PRId64 "%s", "[libpq] Field #",

Review Comment:
   FWIW, why not have the format string be `"[libpq] Field #" PRId64 ...`?



##########
c/validation/adbc_validation.cc:
##########
@@ -1094,6 +1094,83 @@ void StatementTest::TestSqlIngestBinary() {
       NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }
 
+template <enum ArrowTimeUnit TU>

Review Comment:
   It might be easiest to have the signature be `AdbcStatusCode TestSqlIngestTemporalType(..., struct AdbcError* error)` and then you can use the usual `ASSERT_THAT(..., IsOkStatus(&error))` 



##########
c/validation/adbc_validation.cc:
##########
@@ -1094,6 +1094,83 @@ void StatementTest::TestSqlIngestBinary() {
       NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }
 
+template <enum ArrowTimeUnit TU>

Review Comment:
   This seems reasonable as is though



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm merged pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm merged PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245533292


##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   `SchemaFields` are an invention of the `adbc_validation_util` header - is there a reason we use this custom struct instead of an `ArrowSchemaView`? I'm not sure if its worth adding `time_unit` to the former or porting over to the latter in the test suite



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245574579


##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   Ah OK cool - will try that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245579088


##########
c/driver/postgresql/statement.cc:
##########
@@ -605,6 +634,9 @@ AdbcStatusCode PostgresStatement::CreateBulkTable(
       case ArrowType::NANOARROW_TYPE_BINARY:
         create += " BYTEA";
         break;
+      case ArrowType::NANOARROW_TYPE_TIMESTAMP:

Review Comment:
   Makes sense. I mistakenly assumed with TIMEZONE was a different type. For this PR planning to just raise if a timezone is detected, as I'm not yet sure how to transmit that information via the binary protocol



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245807375


##########
c/driver/postgresql/statement.cc:
##########
@@ -216,6 +216,16 @@ struct BindStream {
           type_id = PostgresTypeId::kBytea;
           param_lengths[i] = 0;
           break;
+        case ArrowType::NANOARROW_TYPE_TIMESTAMP:
+          if (strcmp("", bind_schema_fields[i].timezone)) {

Review Comment:
   I was a bit surprised that nanoarrow assigns an empty string to the `timezone` member when constructing a schema with a timestamp; was expecting it to be nullptr



##########
c/validation/adbc_validation.cc:
##########
@@ -1094,6 +1094,83 @@ void StatementTest::TestSqlIngestBinary() {
       NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }
 
+template <enum ArrowTimeUnit TU>

Review Comment:
   The templating here might be over-engineering. I was thinking we could also change the signature to something like:
   
   ```cpp
   void StatementTest::TestSqlIngestTemporalType(
       std::vector<std::optional<int64_t>>& values, 
       enum ArrowTimeUnit unit, const char* timezone) {
   ...
   }
   ```
   
   While nothing else is using that pattern currently, it might be nice for gtest to do something like:
   
   ```cpp
     ASSERT_NO_FATAL_FAILURE(TestSqlIngestTemporalType(
       {std::nullopt, 0, 42}, NANOARROW_TIME_UNIT_SECOND, nullptr
     ));
     EXPECT_FATAL_FAILURE(TestSqlIngestTemporalType(
       {std::nullopt, INT64_MIN, INT64_MAX}, NANOARROW_TIME_UNIT_SECOND, nullptr), 
       "overflow")
     );
     EXPECT_FATAL_FAILURE(TestSqlIngestTemporalType(
       {std::nullopt, 0, 42}, NANOARROW_TIME_UNIT_SECOND, "America/Los Angeles"), 
       "not implemented")
     );
   ```
   
   (N.B. I have no experience with EXPECT_FATAL_FAILURE)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1247802450


##########
c/driver/postgresql/statement.cc:
##########
@@ -605,6 +634,9 @@ AdbcStatusCode PostgresStatement::CreateBulkTable(
       case ArrowType::NANOARROW_TYPE_BINARY:
         create += " BYTEA";
         break;
+      case ArrowType::NANOARROW_TYPE_TIMESTAMP:

Review Comment:
   Don't we need to check for timezone here, too?



##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   Is this change still needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245565413


##########
c/driver/postgresql/statement.cc:
##########
@@ -605,6 +634,9 @@ AdbcStatusCode PostgresStatement::CreateBulkTable(
       case ArrowType::NANOARROW_TYPE_BINARY:
         create += " BYTEA";
         break;
+      case ArrowType::NANOARROW_TYPE_TIMESTAMP:

Review Comment:
   We should probably differentiate between WITH/WITHOUT TIMEZONE?
   
   note that in WITH TIMEZONE, Arrow always stores the underlying value in UTC so there's no need for us to do any time zone math (thankfully!)



##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   Ah, for here...you could also just write out the code to build the schema by hand? This was meant as a convenience to cut down on boilerplate for simple schemas. But it was from a very early nanoarrow version, and nowadays I'm not sure how much boilerplate it actually saves compared to just writing it out.



##########
c/driver/postgresql/statement.cc:
##########
@@ -337,6 +341,31 @@ struct BindStream {
               param_values[col] = const_cast<char*>(view.data.as_char);
               break;
             }
+            case ArrowType::NANOARROW_TYPE_TIMESTAMP: {
+              int64_t val = array_view->children[col]->buffer_views[1].data.as_int64[row];
+              auto unit = bind_schema_fields[col].time_unit;
+
+              // TODO: maybe upstream to nanoarrow as ArrowTimeUnitGetMultiplier
+              switch (unit) {
+                case NANOARROW_TIME_UNIT_SECOND:
+                  val *= 1000000;
+                  break;
+                case NANOARROW_TIME_UNIT_MILLI:
+                  val *= 1000;
+                  break;
+                case NANOARROW_TIME_UNIT_MICRO:
+                  break;
+                case NANOARROW_TIME_UNIT_NANO:
+                  val /= 1000;

Review Comment:
   We should handle truncation/overflow here



##########
c/driver/postgresql/statement.cc:
##########
@@ -216,6 +216,10 @@ struct BindStream {
           type_id = PostgresTypeId::kBytea;
           param_lengths[i] = 0;
           break;
+        case ArrowType::NANOARROW_TYPE_TIMESTAMP:
+          type_id = PostgresTypeId::kTimestamp;

Review Comment:
   We should differentiate between with/without timezone?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245568653


##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   That makes sense for matching, though I think we still have a gap with that when creating test data in non-default precisions that needs to be solved



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #861: feat(c/driver/postgresql): Timestamp write support

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #861:
URL: https://github.com/apache/arrow-adbc/pull/861#discussion_r1245547220


##########
c/validation/adbc_validation_util.cc:
##########
@@ -141,7 +141,16 @@ int MakeSchema(struct ArrowSchema* schema, const std::vector<SchemaField>& field
   CHECK_ERRNO(ArrowSchemaSetTypeStruct(schema, fields.size()));
   size_t i = 0;
   for (const SchemaField& field : fields) {
-    CHECK_ERRNO(ArrowSchemaSetType(schema->children[i], field.type));
+    switch (field.type) {
+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,

Review Comment:
   I think I originally just wanted something that could be easily stack allocated, versus a nested heap allocated structure. I was also struggling with how to deal with validation of more complex types. Especially in the validation code where you may not always want full exact matches (maybe you don't care about the field name, etc.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org