You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/31 12:33:42 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10998: ARROW-13684: [C++][Compute] Strftime kernel follow-up

jorisvandenbossche commented on a change in pull request #10998:
URL: https://github.com/apache/arrow/pull/10998#discussion_r699271130



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -835,8 +840,12 @@ const FunctionDoc subsecond_doc{
 const FunctionDoc strftime_doc{
     "Format timestamps according to a format string",
     ("For each input timestamp, emit a formatted string.\n"
-     "The time format string and locale can be set using StrftimeOptions.\n"
-     "An error is returned if the timestamps don't have a defined timezone,\n"
+     "The time format string and locale can be set using StrftimeOptions. "
+     "Output precision of %S (seconds) flag depends on the input timestamp precision. "
+     "Timestamps with second precision are represented as integers while milliseconds, "
+     "microsecond and nanoseconds are represented as fixed floating point numbers with "
+     "3, 6 and 9 decimal places respectively.\n"

Review comment:
       I would also add something about "To obtain integer seconds, cast to timestamp with second resolution" to be explicit about the workaround.

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -835,8 +840,12 @@ const FunctionDoc subsecond_doc{
 const FunctionDoc strftime_doc{
     "Format timestamps according to a format string",
     ("For each input timestamp, emit a formatted string.\n"
-     "The time format string and locale can be set using StrftimeOptions.\n"
-     "An error is returned if the timestamps don't have a defined timezone,\n"
+     "The time format string and locale can be set using StrftimeOptions. "
+     "Output precision of %S (seconds) flag depends on the input timestamp precision. "
+     "Timestamps with second precision are represented as integers while milliseconds, "
+     "microsecond and nanoseconds are represented as fixed floating point numbers with "
+     "3, 6 and 9 decimal places respectively.\n"
+     "An error is returned if the timestamps don't have a defined timezone,"

Review comment:
       This is no longer up to date

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -476,11 +476,16 @@ struct Strftime {
   static Result<Strftime> Make(KernelContext* ctx, const DataType& type) {
     const StrftimeOptions& options = StrftimeState::Get(ctx);
 
-    const auto& timezone = GetInputTimezone(type);
+    auto timezone = GetInputTimezone(type);
     if (timezone.empty()) {
-      return Status::Invalid(
-          "Timestamps without a time zone cannot be reliably formatted.");
+      if ((options.format.find("%z") != std::string::npos) ||
+          (options.format.find("%Z") != std::string::npos)) {
+        return Status::Invalid("Timezone not present, cannot convert to string: ",

Review comment:
       ```suggestion
           return Status::Invalid("Timezone not present, cannot convert to string with timezone: ",
   ```

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1094,19 +1094,29 @@ number of input and output types.  The type to cast to can be passed in a
 :struct:`CastOptions` instance.  As an alternative, the same service is
 provided by a concrete function :func:`~arrow::compute::Cast`.
 
-+--------------------------+------------+--------------------+------------------+------------------------------+
-| Function name            | Arity      | Input types        | Output type      | Options class                |
-+==========================+============+====================+==================+==============================+
-| cast                     | Unary      | Many               | Variable         | :struct:`CastOptions`        |
-+--------------------------+------------+--------------------+------------------+------------------------------+
-| strftime                 | Unary      | Timestamp          | String           | :struct:`StrftimeOptions`    |
-+--------------------------+------------+--------------------+------------------+------------------------------+
-| strptime                 | Unary      | String-like        | Timestamp        | :struct:`StrptimeOptions`    |
-+--------------------------+------------+--------------------+------------------+------------------------------+
++-----------------+------------+--------------------+------------------+------------------------------+-------+
+| Function name   | Arity      | Input types        | Output type      | Options class                | Notes |
++=================+============+====================+==================+==============================+=======+
+| cast            | Unary      | Many               | Variable         | :struct:`CastOptions`        |       |
++-----------------+------------+--------------------+------------------+------------------------------+-------+
+| strftime        | Unary      | Timestamp          | String           | :struct:`StrftimeOptions`    | \(1)  |
++-----------------+------------+--------------------+------------------+------------------------------+-------+
+| strptime        | Unary      | String-like        | Timestamp        | :struct:`StrptimeOptions`    |       |
++-----------------+------------+--------------------+------------------+------------------------------+-------+
 
 The conversions available with ``cast`` are listed below.  In all cases, a
 null input value is converted into a null output value.
 
+* \(1) Output precision of seconds (``%S``) flag depends on the input timestamp
+  precision. If the number of seconds can not be exactly represented with seconds,

Review comment:
       It seems you used some explanation about this from https://howardhinnant.github.io/date/date.html#to_stream_formatting, but I would maybe use the same text as you added above in `strftime_doc`, as I think that's clearer / more relevant from Arrow's perspective.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1094,19 +1094,29 @@ number of input and output types.  The type to cast to can be passed in a
 :struct:`CastOptions` instance.  As an alternative, the same service is
 provided by a concrete function :func:`~arrow::compute::Cast`.
 
-+--------------------------+------------+--------------------+------------------+------------------------------+
-| Function name            | Arity      | Input types        | Output type      | Options class                |
-+==========================+============+====================+==================+==============================+
-| cast                     | Unary      | Many               | Variable         | :struct:`CastOptions`        |
-+--------------------------+------------+--------------------+------------------+------------------------------+
-| strftime                 | Unary      | Timestamp          | String           | :struct:`StrftimeOptions`    |
-+--------------------------+------------+--------------------+------------------+------------------------------+
-| strptime                 | Unary      | String-like        | Timestamp        | :struct:`StrptimeOptions`    |
-+--------------------------+------------+--------------------+------------------+------------------------------+
++-----------------+------------+--------------------+------------------+------------------------------+-------+
+| Function name   | Arity      | Input types        | Output type      | Options class                | Notes |
++=================+============+====================+==================+==============================+=======+
+| cast            | Unary      | Many               | Variable         | :struct:`CastOptions`        |       |
++-----------------+------------+--------------------+------------------+------------------------------+-------+
+| strftime        | Unary      | Timestamp          | String           | :struct:`StrftimeOptions`    | \(1)  |
++-----------------+------------+--------------------+------------------+------------------------------+-------+
+| strptime        | Unary      | String-like        | Timestamp        | :struct:`StrptimeOptions`    |       |
++-----------------+------------+--------------------+------------------+------------------------------+-------+
 
 The conversions available with ``cast`` are listed below.  In all cases, a
 null input value is converted into a null output value.
 
+* \(1) Output precision of seconds (``%S``) flag depends on the input timestamp
+  precision. If the number of seconds can not be exactly represented with seconds,

Review comment:
       I don't think this is accurate? It doesn't depend on the number of seconds present, but solely on the unit of the timestamp type?

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -836,6 +841,10 @@ const FunctionDoc strftime_doc{
     "Format timestamps according to a format string",
     ("For each input timestamp, emit a formatted string.\n"
      "The time format string and locale can be set using StrftimeOptions.\n"
+     "Output precision of %S (seconds) flag depends on the input timestamp precision. "

Review comment:
       > I don't know if the elaboration about the "%S" is useful to put here
   
   @pitrou since this behaves differently as eg Python and R's version of `strftime`, I think it's important to note this somewhere (but can also be somewhere else, eg in compute.rst as a note below the table. EDIT: I see it's already added there)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org