You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/04 22:09:00 UTC

[GitHub] [arrow] rok opened a new pull request #10457: [ARROW-12980

rok opened a new pull request #10457:
URL: https://github.com/apache/arrow/pull/10457


   https://issues.apache.org/jira/browse/ARROW-12980


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-878188496


   This is ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651827989



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1255,6 +1255,78 @@ def test_strptime():
     assert got == expected
 
 
+def _check_datetime_components(timestamps, timezone=None):
+    import pandas as pd
+
+    if timezone:
+        ts = pd.to_datetime(timestamps).tz_localize(timezone).to_series()
+    else:
+        ts = pd.to_datetime(timestamps).to_series()
+
+    tsa = pa.array(ts)
+
+    subseconds = ((ts.dt.microsecond * 10**3 +
+                  ts.dt.nanosecond) * 10**-9).round(9)
+    iso_calendar_fields = [
+        pa.field('iso_year', pa.int64()),
+        pa.field('iso_week', pa.int64()),
+        pa.field('iso_day_of_week', pa.int64())
+    ]
+
+    iso_year = ts.dt.isocalendar()["year"].astype(int)
+    iso_week = ts.dt.isocalendar()["week"].astype(int)
+    iso_day = ts.dt.isocalendar()["day"].astype(int)
+    iso_calendar = pa.StructArray.from_arrays(
+        [iso_year, iso_week, iso_day], fields=iso_calendar_fields)
+
+    assert pc.year(tsa).equals(pa.array(ts.dt.year))
+    assert pc.month(tsa).equals(pa.array(ts.dt.month))
+    assert pc.day(tsa).equals(pa.array(ts.dt.day))
+    assert pc.day_of_week(tsa).equals(pa.array(ts.dt.day_of_week))
+    assert pc.day_of_year(tsa).equals(pa.array(ts.dt.day_of_year))
+    assert pc.iso_year(tsa).equals(pa.array(iso_year))
+    assert pc.iso_week(tsa).equals(pa.array(iso_week))
+    assert pc.iso_calendar(tsa).equals(iso_calendar)
+    assert pc.quarter(tsa).equals(pa.array(ts.dt.quarter))
+    assert pc.hour(tsa).equals(pa.array(ts.dt.hour))
+    assert pc.minute(tsa).equals(pa.array(ts.dt.minute))
+    assert pc.second(tsa).equals(pa.array(ts.dt.second.values))
+    assert pc.millisecond(tsa).equals(pa.array(ts.dt.microsecond // 10**3))
+    assert pc.microsecond(tsa).equals(pa.array(ts.dt.microsecond % 10**3))
+    assert pc.nanosecond(tsa).equals(pa.array(ts.dt.nanosecond))
+    assert pc.subsecond(tsa).equals(pa.array(subseconds))
+
+
+@pytest.mark.pandas
+def test_extract_datetime_components():
+    # TODO: see https://github.com/pandas-dev/pandas/issues/41834
+    # "1899-01-01T00:59:20.001001001"

Review comment:
       Removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-876821558


   @jorisvandenbossche @pitrou I'm going to look into the R issues but other than that this is ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-899579195


   @pitrou thanks for the review and the changes - those look much better!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10457: [ARROW-12980

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-855033521


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW
   
   Opening JIRAs ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657533421



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,204 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestOutsideNanosecondRange) {
+  const char* times = R"(["1677-09-20T00:00:59.123456", "2262-04-13T23:23:23.999999"])";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(NotImplemented, Year(timestamps));
-    ASSERT_RAISES(NotImplemented, Month(timestamps));
-    ASSERT_RAISES(NotImplemented, Day(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOCalendar(timestamps));
-    ASSERT_RAISES(NotImplemented, Quarter(timestamps));
-    ASSERT_RAISES(NotImplemented, Hour(timestamps));
-    ASSERT_RAISES(NotImplemented, Minute(timestamps));
-    ASSERT_RAISES(NotImplemented, Second(timestamps));
-    ASSERT_RAISES(NotImplemented, Millisecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Microsecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Nanosecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Subsecond(timestamps));
-  }
+  auto unit = timestamp(TimeUnit::MICRO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[1677, 2262]";
+  auto month = "[9, 4]";
+  auto day = "[20, 13]";
+  auto day_of_week = "[0, 6]";
+  auto day_of_year = "[263, 103]";
+  auto iso_year = "[1677, 2262]";
+  auto iso_week = "[38, 15]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 1},
+                          {"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 7}])");
+  auto quarter = "[3, 2]";
+  auto hour = "[0, 23]";
+  auto minute = "[0, 23]";
+  auto second = "[59, 23]";
+  auto millisecond = "[123, 999]";
+  auto microsecond = "[456, 999]";
+  auto nanosecond = "[0, 0]";
+  auto subsecond = "[0.123456, 0.999999]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+#ifndef _WIN32

Review comment:
       I may misunderstand, but currently we don't have a folder which we bundle with the C++ libraries, though we could certainly bundle certain files into the binaries. 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858486248


   > Localize kernel would be great. I suppose we'd need a scalar and a vector one depending if timezone is shared between rows or not?
   
   I think a scalar kernel is the most important (with signature like `localize(array[timestamp], string tz) -> array[timestamp, tz]`, where array can also be chunked array or scalar of course). Mixed timezones is a rather corner case (and in principle also could be done as a scalar kernel if the second `tz` argument is also an array). I think ARROW-5912 is more a bug in the python->arrow conversion code, and not necessarily related / fixable by a localize kernel.
   
   I opened https://issues.apache.org/jira/browse/ARROW-13033 for this.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651034869



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1212,6 +1212,80 @@ def test_strptime():
     assert got == expected
 
 
+def _check_datetime_components(ts):
+    tsa = [pa.array(ts)]
+    subseconds = ((ts.microsecond * 10**3 + ts.nanosecond) *
+                  10**-9).values.round(9)
+    iso_calendar_fields = [
+        pa.field('iso_year', pa.int64()),
+        pa.field('iso_week', pa.int64()),
+        pa.field('iso_day_of_week', pa.int64())
+    ]
+    iso_calendar = pa.StructArray.from_arrays([
+        ts.isocalendar()["year"].astype(int),
+        ts.isocalendar()["week"].astype(int),
+        ts.isocalendar()["day"].astype(int)],
+        fields=iso_calendar_fields)
+
+    assert pc.call_function("year", tsa).equals(pa.array(ts.year))

Review comment:
       You can directly use `pc.year(..)` instead of `pc.call_function("year", ..)` (and the same for the others below)

##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1212,6 +1212,80 @@ def test_strptime():
     assert got == expected
 
 
+def _check_datetime_components(ts):
+    tsa = [pa.array(ts)]
+    subseconds = ((ts.microsecond * 10**3 + ts.nanosecond) *
+                  10**-9).values.round(9)
+    iso_calendar_fields = [
+        pa.field('iso_year', pa.int64()),
+        pa.field('iso_week', pa.int64()),
+        pa.field('iso_day_of_week', pa.int64())
+    ]
+    iso_calendar = pa.StructArray.from_arrays([
+        ts.isocalendar()["year"].astype(int),
+        ts.isocalendar()["week"].astype(int),
+        ts.isocalendar()["day"].astype(int)],
+        fields=iso_calendar_fields)
+
+    assert pc.call_function("year", tsa).equals(pa.array(ts.year))
+    assert pc.call_function("month", tsa).equals(pa.array(ts.month))
+    assert pc.call_function("day", tsa).equals(pa.array(ts.day))
+    assert pc.call_function("day_of_week", tsa).equals(
+        pa.array(ts.day_of_week))
+    assert pc.call_function("day_of_year", tsa).equals(
+        pa.array(ts.day_of_year))
+    assert pc.call_function("iso_year", tsa).equals(
+        pa.array(ts.isocalendar()["year"].astype(int)))
+    assert pc.call_function("iso_week", tsa).equals(
+        pa.array(ts.isocalendar()["week"].astype(int)))
+    assert pc.call_function("iso_calendar", tsa).equals(iso_calendar)
+    assert pc.call_function("quarter", tsa).equals(pa.array(ts.quarter))
+    assert pc.call_function("hour", tsa).equals(pa.array(ts.hour))
+    assert pc.call_function("minute", tsa).equals(pa.array(ts.minute))
+    assert pc.call_function("second", tsa).equals(pa.array(ts.second.values))
+    assert pc.call_function("millisecond", tsa).equals(
+        pa.array(ts.microsecond // 10**3))
+    assert pc.call_function("microsecond", tsa).equals(
+        pa.array(ts.microsecond % 10**3))
+    assert pc.call_function("nanosecond", tsa).equals(pa.array(ts.nanosecond))
+    assert pc.call_function("subsecond", tsa).equals(pa.array(subseconds))
+
+
+@pytest.mark.pandas
+def test_extract_datetime_components():
+    import pandas as pd
+
+    # TODO: see https://github.com/pandas-dev/pandas/issues/41834
+    # "1899-01-01T00:59:20.001001001"
+    timestamps = ["1970-01-01T00:00:59.123456789",
+                  "2000-02-29T23:23:23.999999999",
+                  "2033-05-18T03:33:20.000000000",
+                  "2020-01-01T01:05:05.001",
+                  "2019-12-31T02:10:10.002",
+                  "2019-12-30T03:15:15.003",
+                  "2009-12-31T04:20:20.004132",
+                  "2010-01-01T05:25:25.005321",
+                  "2010-01-03T06:30:30.006163",
+                  "2010-01-04T07:35:35",
+                  "2006-01-01T08:40:40",
+                  "2005-12-31T09:45:45",
+                  "2008-12-28",
+                  "2008-12-29",
+                  "2012-01-01 01:02:03"]
+    timezones = ["US/Central", "Pacific/Marquesas", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Pacific/Marquesas",
+                 "Australia/Broken_Hill"]
+
+    # Test timezone naive timestamp array
+    ts = pd.to_datetime(timestamps)
+    _check_datetime_components(ts)
+
+    # Test timezone aware timestamp array
+    for timezone in timezones:
+        ts = pd.to_datetime(timestamps).tz_localize("UTC").tz_convert(timezone)

Review comment:
       ```suggestion
           ts = pd.to_datetime(timestamps).tz_localize(timezone)
   ```
   
   Otherwise the "local" timestamps might change, and some of the corner cases might be shifted to another day.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-867658423


   > I think we mean the same with 3. and 5.?
   
   No, option 3 will download the database at runtime (first time, or if there is a more recent version, AFAIU) to a default location. For option 5 I mean that I want to say at runtime: "the tzdata can be found at this location" (in case you already downloaded it, or want to test against a custom version).
   
   > By default we have USE_OS_TZDB=1 on non-windows and USE_OS_TZDB=1 on windows
   
   Ah, OK, I didn't see that since that wasn't in the diff. That sounds good then. 
   (BTW, I think it will then always use the OS supplied version, and never try to download, based on https://howardhinnant.github.io/date/tz.html#Installation)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r673255609



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1427,9 +1428,20 @@ def test_extract_datetime_components():
                   "2008-12-28",
                   "2008-12-29",
                   "2012-01-01 01:02:03"]
+    timezones = ["UTC", "US/Central", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Australia/Broken_Hill"]
 
+    # Test timezone naive timestamp array
     _check_datetime_components(timestamps)
 
+    # Test timezone aware timestamp array
+    if sys.platform == 'win32':
+        # TODO: We should test on windows once ARROW-13168 is resolved.
+        pytest.skip('Timezone database is not available on Windows yet')
+    else:
+        for timezone in timezones:
+            _check_datetime_components(timestamps, timezone)

Review comment:
       Right, it should trip [this check](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_temporal.cc#L81).
   Let me check some more.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r673573050



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1427,9 +1428,20 @@ def test_extract_datetime_components():
                   "2008-12-28",
                   "2008-12-29",
                   "2012-01-01 01:02:03"]
+    timezones = ["UTC", "US/Central", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Australia/Broken_Hill"]
 
+    # Test timezone naive timestamp array
     _check_datetime_components(timestamps)
 
+    # Test timezone aware timestamp array
+    if sys.platform == 'win32':
+        # TODO: We should test on windows once ARROW-13168 is resolved.
+        pytest.skip('Timezone database is not available on Windows yet')
+    else:
+        for timezone in timezones:
+            _check_datetime_components(timestamps, timezone)

Review comment:
       It turned out that `DayOfWeekExec` was not passing the timezone to the `DayOfWeek` kernel. Should be fixed now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r673306462



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1427,9 +1428,20 @@ def test_extract_datetime_components():
                   "2008-12-28",
                   "2008-12-29",
                   "2012-01-01 01:02:03"]
+    timezones = ["UTC", "US/Central", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Australia/Broken_Hill"]
 
+    # Test timezone naive timestamp array
     _check_datetime_components(timestamps)
 
+    # Test timezone aware timestamp array
+    if sys.platform == 'win32':
+        # TODO: We should test on windows once ARROW-13168 is resolved.
+        pytest.skip('Timezone database is not available on Windows yet')
+    else:
+        for timezone in timezones:
+            _check_datetime_components(timestamps, timezone)

Review comment:
       Timezone is now passed properly but a weekday issue appeared. Let me see.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-881339418


   @jorisvandenbossche @pitrou ping :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r688205633



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -216,5 +354,6 @@ TEST_F(ScalarTemporalTest, DayOfWeek) {
                 DayOfWeek(timestamps, DayOfWeekOptions(/*one_based_numbering=*/false,
                                                        /*week_start=*/8)));
 }
+#endif

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651805036



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,203 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestZoned1) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(Invalid, Year(timestamps));
-    ASSERT_RAISES(Invalid, Month(timestamps));
-    ASSERT_RAISES(Invalid, Day(timestamps));
-    ASSERT_RAISES(Invalid, DayOfWeek(timestamps));
-    ASSERT_RAISES(Invalid, DayOfYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOWeek(timestamps));
-    ASSERT_RAISES(Invalid, ISOCalendar(timestamps));
-    ASSERT_RAISES(Invalid, Quarter(timestamps));
-    ASSERT_RAISES(Invalid, Hour(timestamps));
-    ASSERT_RAISES(Invalid, Minute(timestamps));
-    ASSERT_RAISES(Invalid, Second(timestamps));
-    ASSERT_RAISES(Invalid, Millisecond(timestamps));
-    ASSERT_RAISES(Invalid, Microsecond(timestamps));
-    ASSERT_RAISES(Invalid, Nanosecond(timestamps));
-    ASSERT_RAISES(Invalid, Subsecond(timestamps));
-  }
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestZoned2) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Australia/Broken_Hill");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+      "2008, 2008, 2012, null]";
+  auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+  auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+  auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+  auto day_of_year =
+      "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+      "2008, 2009, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+  auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+  auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+  auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestOverflow) {

Review comment:
       ```suggestion
   TEST(ScalarTemporalTest, TestOutsideNanosecondRange) {
   ```
   
   ?
   
   (if it's not directly clear what the name means, you can always add a comment explaining the test




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche edited a comment on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858395074


   I think it would be good to write some tests in python as well, as currently the C++ tests are very hard to verify since we don't yet have the ability to parse strings localized in the timezone (I mean: the strings in the tests are interpreted as UTC and not the "Australia/Broken_Hill" timezone. And thus as a result, the expected values can't be read/verified from the strings). 
   (while in python we could create the localized input timestamps with pandas)
   
   Given that, it might also make sense to first add a "localize" kernel for converting timestamps from naive to a certain timezone.
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657531497



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,204 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestOutsideNanosecondRange) {
+  const char* times = R"(["1677-09-20T00:00:59.123456", "2262-04-13T23:23:23.999999"])";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(NotImplemented, Year(timestamps));
-    ASSERT_RAISES(NotImplemented, Month(timestamps));
-    ASSERT_RAISES(NotImplemented, Day(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOCalendar(timestamps));
-    ASSERT_RAISES(NotImplemented, Quarter(timestamps));
-    ASSERT_RAISES(NotImplemented, Hour(timestamps));
-    ASSERT_RAISES(NotImplemented, Minute(timestamps));
-    ASSERT_RAISES(NotImplemented, Second(timestamps));
-    ASSERT_RAISES(NotImplemented, Millisecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Microsecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Nanosecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Subsecond(timestamps));
-  }
+  auto unit = timestamp(TimeUnit::MICRO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[1677, 2262]";
+  auto month = "[9, 4]";
+  auto day = "[20, 13]";
+  auto day_of_week = "[0, 6]";
+  auto day_of_year = "[263, 103]";
+  auto iso_year = "[1677, 2262]";
+  auto iso_week = "[38, 15]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 1},
+                          {"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 7}])");
+  auto quarter = "[3, 2]";
+  auto hour = "[0, 23]";
+  auto minute = "[0, 23]";
+  auto second = "[59, 23]";
+  auto millisecond = "[123, 999]";
+  auto microsecond = "[456, 999]";
+  auto nanosecond = "[0, 0]";
+  auto subsecond = "[0.123456, 0.999999]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+#ifndef _WIN32

Review comment:
       We're using `tz.h` library which needs an updated timezone database to correctly handle timezoned timestamps. See [installation instructions](https://howardhinnant.github.io/date/tz.html#Installation).
   
   We have the following options (if I understand correctly) for getting a timezone database:
   1. local (non-windows) OS timezone database  - no work required.
   2. arrow bundled folder - we could bundle the database at build time for windows. Database would slowly go stale.
   3. download it from IANA Time Zone Database at runtime - `tz.h` gets the database at runtime, but curl (and 7-zip on windows) are required.
   4. local user-provided folder - user could provide a location at buildtime. Nice to have.
   
   @kszucs - where would be a good place to implement 2?
   Does 3 look like something we could do?
   
   For now it would probably be best to create a Jira for this and keep these test disabled on windows.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r681043263



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -142,29 +141,168 @@ TEST_F(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
   }
 }
 
-TEST_F(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
+TEST_F(ScalarTemporalTest, TestOutsideNanosecondRange) {
+  const char* times = R"(["1677-09-20T00:00:59.123456", "2262-04-13T23:23:23.999999"])";
+
+  auto unit = timestamp(TimeUnit::MICRO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[1677, 2262]";
+  auto month = "[9, 4]";
+  auto day = "[20, 13]";
+  auto day_of_week = "[0, 6]";
+  auto day_of_year = "[263, 103]";
+  auto iso_year = "[1677, 2262]";
+  auto iso_week = "[38, 15]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 1},
+                          {"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 7}])");
+  auto quarter = "[3, 2]";
+  auto hour = "[0, 23]";
+  auto minute = "[0, 23]";
+  auto second = "[59, 23]";
+  auto millisecond = "[123, 999]";
+  auto microsecond = "[456, 999]";
+  auto nanosecond = "[0, 0]";
+  auto subsecond = "[0.123456, 0.999999]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+#ifndef _WIN32
+// TODO: We should test on windows once ARROW-13168 is resolved.
+TEST_F(ScalarTemporalTest, TestZoned1) {
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");

Review comment:
       What happens with a timezone that doesn't exist in the local database? Can you add a test for that?

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -187,16 +243,28 @@ struct DayOfYear {
 
 template <typename Duration>
 struct ISOYear {
+  explicit ISOYear(const time_zone* tz = nullptr) : tz_(tz) {}
+
   template <typename T, typename Arg0>
-  static T Call(KernelContext*, Arg0 arg, Status*) {
-    const auto t = floor<days>(sys_time<Duration>(Duration{arg}));
+  T Call(KernelContext*, Arg0 arg, Status*) const {
+    if (tz_ == nullptr) {
+      const auto t = floor<days>(sys_time<Duration>(Duration{arg}));
+      auto y = year_month_day{t + days{3}}.year();
+      auto start = sys_time<days>((y - years{1}) / dec / thu[last]) + (mon - thu);
+      if (t < start) {
+        --y;
+      }
+      return static_cast<T>(static_cast<int32_t>(y));
+    }
+    const auto t = floor<days>(tz_->to_local(sys_time<Duration>(Duration{arg})));
     auto y = year_month_day{t + days{3}}.year();
-    auto start = sys_time<days>((y - years{1}) / dec / thu[last]) + (mon - thu);
+    auto start = local_days((y - years{1}) / dec / thu[last]) + (mon - thu);
     if (t < start) {

Review comment:
       It looks like we would like to be able to write:
   ```c++
   template <typename Duration>
   struct ISOYearExtractor {
     template <typename T, typename TimePointConverter, typename DaysConverter>
     static T Call(Duration d, TimePointConverter&& convert_timepoint, DaysConverter&& convert_days) {
       const auto t = floor<days>(convert_timepoint(sys_time<Duration>(d));
       auto y = year_month_day{t + days{3}}.year();
       auto start = convert_days((y - years{1}) / dec / thu[last]) + (mon - thu);
       if (t < start) {
         --y;
       }
       return static_cast<T>(static_cast<int32_t>(y));
     }
   };
   ```
   
   and then something like:
   ```c++
   
   template <typename Duration, template <typename Duration> ExtractorType>
   struct ExtractorExec {
     using Extractor = ExtractorType<Duration>;
   
     explicit ExtractorExec(const time_zone* tz = nullptr) : tz_(tz) {}
   
     template <typename T, typename Arg0>
     T Call(KernelContext*, Arg0 arg, Status*) const {
       const Duration d{arg};
       if (tz_ == nullptr) {
         return Extractor::Call(arg,
             [](sys_time<Duration> t) { return t; },
             [](sys_days d) { return d; });
       }
       return Extractor::Call(arg,
           [this](sys_time<Duration> t) { return tz_->to_local(t); },
           [](sys_days d) { return local_days(d); });
       }
     };
   };
   
   template <typename Duration>
   using ISOYear = ExtractorExec<Duration, ISOYearExtractor>;
   // same for other extractors...
   ```
   

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1056,7 +1056,8 @@ Temporal component extraction
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 These functions extract datetime components (year, month, day, etc) from timestamp type.
-Note: this is currently not supported for timestamps with timezone information.
+Note: supports timestamps with timezone information and will return localized timestamp
+components.

Review comment:
       Rewrite this as (for example): "If the input timestamps have a non-empty timezone, localized timestamp components will be returned".

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -216,5 +354,6 @@ TEST_F(ScalarTemporalTest, DayOfWeek) {
                 DayOfWeek(timestamps, DayOfWeekOptions(/*one_based_numbering=*/false,
                                                        /*week_start=*/8)));
 }
+#endif

Review comment:
       You should move this up so that it only encloses tests that don't work on Windows.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1056,7 +1056,8 @@ Temporal component extraction
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 These functions extract datetime components (year, month, day, etc) from timestamp type.
-Note: this is currently not supported for timestamps with timezone information.
+Note: supports timestamps with timezone information and will return localized timestamp
+components.
 
 +--------------------+------------+-------------------+---------------+----------------------------+-------+
 | Function name      | Arity      | Input types       | Output type   | Options class              | Notes |

Review comment:
       By the way, one question: the table below says "temporal" for the input types, but only timestamp inputs are supported, right?

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -399,6 +533,44 @@ struct ISOCalendar {
   }
 };
 
+template <template <typename...> class Op, typename OutType>
+std::shared_ptr<ScalarFunction> MakeTemporalZoned(std::string name,

Review comment:
       I'm sure you can find a way to share this with `MakeTemporal` below.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r689816847



##########
File path: r/tests/testthat/test-dplyr-lubridate.R
##########
@@ -26,22 +26,13 @@ library(dplyr)
 # TODO: consider reevaluating this workaround after ARROW-12980
 withr::local_timezone("UTC")
 
-test_date <- as.POSIXct("2017-01-01 00:00:12.3456789", tz = "")
+if (tolower(Sys.info()[["sysname"]]) == "windows") {

Review comment:
       This is due to the [timezone db issue on windows](https://issues.apache.org/jira/browse/ARROW-13168). 
   A comment here could state:
   ```
   # TODO: We should test on windows once ARROW-13168 is resolved.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651290261



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,202 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestZoned1) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(Invalid, Year(timestamps));
-    ASSERT_RAISES(Invalid, Month(timestamps));
-    ASSERT_RAISES(Invalid, Day(timestamps));
-    ASSERT_RAISES(Invalid, DayOfWeek(timestamps));
-    ASSERT_RAISES(Invalid, DayOfYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOWeek(timestamps));
-    ASSERT_RAISES(Invalid, ISOCalendar(timestamps));
-    ASSERT_RAISES(Invalid, Quarter(timestamps));
-    ASSERT_RAISES(Invalid, Hour(timestamps));
-    ASSERT_RAISES(Invalid, Minute(timestamps));
-    ASSERT_RAISES(Invalid, Second(timestamps));
-    ASSERT_RAISES(Invalid, Millisecond(timestamps));
-    ASSERT_RAISES(Invalid, Microsecond(timestamps));
-    ASSERT_RAISES(Invalid, Nanosecond(timestamps));
-    ASSERT_RAISES(Invalid, Subsecond(timestamps));
-  }
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestZoned2) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Australia/Broken_Hill");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+      "2008, 2008, 2012, null]";
+  auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+  auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+  auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+  auto day_of_year =
+      "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+      "2008, 2009, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+  auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+  auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+  auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestOverflow) {
+  const char* times = R"(["1677-09-20T00:00:59.123456789", "2262-04-13T23:23:23.999999999"])";
+
+  auto unit = timestamp(TimeUnit::NANO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[2262, 1677]";
+  auto month = "[4, 9]";
+  auto day = "[10, 22]";
+  auto day_of_week = "[3, 2]";
+  auto day_of_year = "[100, 265]";
+  auto iso_year = "[2262, 1677]";
+  auto iso_week = "[15, 38]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 4},
+                        {"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 3}])");
+  auto quarter = "[2, 3]";
+  auto hour = "[23, 23]";
+  auto minute = "[35, 48]";
+  auto second = "[32, 50]";
+  auto millisecond = "[833, 290]";
+  auto microsecond = "[8, 448]";
+  auto nanosecond = "[405, 383]";
+  auto subsecond = "[0.833008405, 0.290448383]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
 }

Review comment:
       It this what you had in mind [here](https://github.com/apache/arrow/pull/10176#issuecomment-858392898).

##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1255,6 +1255,78 @@ def test_strptime():
     assert got == expected
 
 
+def _check_datetime_components(timestamps, timezone=None):
+    import pandas as pd
+
+    if timezone:
+        ts = pd.to_datetime(timestamps).tz_localize(timezone).to_series()
+    else:
+        ts = pd.to_datetime(timestamps).to_series()
+
+    tsa = pa.array(ts)
+
+    subseconds = ((ts.dt.microsecond * 10**3 +
+                  ts.dt.nanosecond) * 10**-9).round(9)
+    iso_calendar_fields = [
+        pa.field('iso_year', pa.int64()),
+        pa.field('iso_week', pa.int64()),
+        pa.field('iso_day_of_week', pa.int64())
+    ]
+
+    iso_year = ts.dt.isocalendar()["year"].astype(int)
+    iso_week = ts.dt.isocalendar()["week"].astype(int)
+    iso_day = ts.dt.isocalendar()["day"].astype(int)
+    iso_calendar = pa.StructArray.from_arrays(
+        [iso_year, iso_week, iso_day], fields=iso_calendar_fields)
+
+    assert pc.year(tsa).equals(pa.array(ts.dt.year))
+    assert pc.month(tsa).equals(pa.array(ts.dt.month))
+    assert pc.day(tsa).equals(pa.array(ts.dt.day))
+    assert pc.day_of_week(tsa).equals(pa.array(ts.dt.day_of_week))
+    assert pc.day_of_year(tsa).equals(pa.array(ts.dt.day_of_year))
+    assert pc.iso_year(tsa).equals(pa.array(iso_year))
+    assert pc.iso_week(tsa).equals(pa.array(iso_week))
+    assert pc.iso_calendar(tsa).equals(iso_calendar)
+    assert pc.quarter(tsa).equals(pa.array(ts.dt.quarter))
+    assert pc.hour(tsa).equals(pa.array(ts.dt.hour))
+    assert pc.minute(tsa).equals(pa.array(ts.dt.minute))
+    assert pc.second(tsa).equals(pa.array(ts.dt.second.values))
+    assert pc.millisecond(tsa).equals(pa.array(ts.dt.microsecond // 10**3))
+    assert pc.microsecond(tsa).equals(pa.array(ts.dt.microsecond % 10**3))
+    assert pc.nanosecond(tsa).equals(pa.array(ts.dt.nanosecond))
+    assert pc.subsecond(tsa).equals(pa.array(subseconds))
+
+
+@pytest.mark.pandas
+def test_extract_datetime_components():
+    # TODO: see https://github.com/pandas-dev/pandas/issues/41834
+    # "1899-01-01T00:59:20.001001001"

Review comment:
       Probably best to remove this since I believe pandas is wrong and it's not really our TODO.
   It's kinda hard to confirm who is right.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-867815384


   I Jira-ed the timezone database issue [ARROW-13168](https://issues.apache.org/jira/browse/ARROW-13168).
   
   CI errors don't seem related. I'll add comments referencing ARROW-13168 and ask for a review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r689531944



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -63,79 +71,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
-  }
-  return Status::OK();
-}
-
-template <typename Op, typename OutType>
+template <
+    template <typename Duration, typename TimePointConverter, typename DaysConverter>
+    class Op,
+    typename Duration, typename OutType>
 struct TemporalComponentExtract {
-  using OutValue = typename internal::GetOutputType<OutType>::T;
-
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      using ExecTemplate = Op<Duration, std::function<sys_time<Duration>(int64_t)>,
+                              std::function<sys_days(sys_days)>>;
+      auto op = ExecTemplate([](int64_t t) { return sys_time<Duration>(Duration{t}); },
+                             [](sys_days d) { return d; });

Review comment:
       Can probably avoid the overhead of `std::function` by templating with the callable type itself.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r688206189



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -142,29 +141,168 @@ TEST_F(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
   }
 }
 
-TEST_F(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
+TEST_F(ScalarTemporalTest, TestOutsideNanosecondRange) {
+  const char* times = R"(["1677-09-20T00:00:59.123456", "2262-04-13T23:23:23.999999"])";
+
+  auto unit = timestamp(TimeUnit::MICRO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[1677, 2262]";
+  auto month = "[9, 4]";
+  auto day = "[20, 13]";
+  auto day_of_week = "[0, 6]";
+  auto day_of_year = "[263, 103]";
+  auto iso_year = "[1677, 2262]";
+  auto iso_week = "[38, 15]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 1},
+                          {"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 7}])");
+  auto quarter = "[3, 2]";
+  auto hour = "[0, 23]";
+  auto minute = "[0, 23]";
+  auto second = "[59, 23]";
+  auto millisecond = "[123, 999]";
+  auto microsecond = "[456, 999]";
+  auto nanosecond = "[0, 0]";
+  auto subsecond = "[0.123456, 0.999999]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+#ifndef _WIN32
+// TODO: We should test on windows once ARROW-13168 is resolved.
+TEST_F(ScalarTemporalTest, TestZoned1) {
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");

Review comment:
       It will return `Status::Invalid()` now. Added tests to check.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-899659073


   Rebased again to pick up AppVeyor fix :-/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r673251022



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1427,9 +1428,20 @@ def test_extract_datetime_components():
                   "2008-12-28",
                   "2008-12-29",
                   "2012-01-01 01:02:03"]
+    timezones = ["UTC", "US/Central", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Australia/Broken_Hill"]
 
+    # Test timezone naive timestamp array
     _check_datetime_components(timestamps)
 
+    # Test timezone aware timestamp array
+    if sys.platform == 'win32':
+        # TODO: We should test on windows once ARROW-13168 is resolved.
+        pytest.skip('Timezone database is not available on Windows yet')
+    else:
+        for timezone in timezones:
+            _check_datetime_components(timestamps, timezone)

Review comment:
       @rok I executed the same test case (copied the `test_compute.py` file) on top of the master branch and the test case passes. If I understand correctly this should fail on master since it is a new feature. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-861007346


   > I think there was no opposition for this one (i.e. that the field extraction should yield local hour/minute/etc), so I don't think there is a need to close the PR.
   
   Huh, I forget exactly what I was thinking there and you appear to be right.
   
   I think the main remaining task is the tz db issue. I'll take a look at it tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r673229267



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1427,9 +1428,20 @@ def test_extract_datetime_components():
                   "2008-12-28",
                   "2008-12-29",
                   "2012-01-01 01:02:03"]
+    timezones = ["UTC", "US/Central", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Australia/Broken_Hill"]
 
+    # Test timezone naive timestamp array
     _check_datetime_components(timestamps)
 
+    # Test timezone aware timestamp array
+    if sys.platform == 'win32':
+        # TODO: We should test on windows once ARROW-13168 is resolved.
+        pytest.skip('Timezone database is not available on Windows yet')
+    else:
+        for timezone in timezones:
+            _check_datetime_components(timestamps, timezone)

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657500729



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -60,86 +63,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
+template <typename Op, typename OutType>
+struct TemporalComponentExtract {
+  using OutValue = typename internal::GetOutputType<OutType>::T;
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
   }
-  return Status::OK();
 }
 
 template <typename Op, typename OutType>
-struct TemporalComponentExtract {
+struct TemporalComponentExtractZoned {
   using OutValue = typename internal::GetOutputType<OutType>::T;
 
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    } else {
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, Op> kernel{
+          locate_zone(timezone)};
+      return kernel.Exec(ctx, batch, out);
+    }
   }
+}
+
+struct TimezoneMixin {
+  TimezoneMixin(const time_zone* tz = nullptr) : tz_(tz) {}
+  const time_zone* tz_;
 };
 
 // ----------------------------------------------------------------------
 // Extract year from timestamp
 
 template <typename Duration>
-struct Year {
+struct Year : public TimezoneMixin {
+  using TimezoneMixin::TimezoneMixin;
+
   template <typename T, typename Arg0>
-  static T Call(KernelContext*, Arg0 arg, Status*) {
+  T Call(KernelContext*, Arg0 arg, Status*) const {
+    if (tz_ == nullptr) {

Review comment:
       @pitrou what would be a practical way to remove this check by templateing?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r688207165



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -399,6 +533,44 @@ struct ISOCalendar {
   }
 };
 
+template <template <typename...> class Op, typename OutType>
+std::shared_ptr<ScalarFunction> MakeTemporalZoned(std::string name,

Review comment:
       Done.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1056,7 +1056,8 @@ Temporal component extraction
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 These functions extract datetime components (year, month, day, etc) from timestamp type.
-Note: this is currently not supported for timestamps with timezone information.
+Note: supports timestamps with timezone information and will return localized timestamp
+components.

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657708955



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -60,86 +63,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
+template <typename Op, typename OutType>
+struct TemporalComponentExtract {
+  using OutValue = typename internal::GetOutputType<OutType>::T;
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
   }
-  return Status::OK();
 }
 
 template <typename Op, typename OutType>
-struct TemporalComponentExtract {
+struct TemporalComponentExtractZoned {
   using OutValue = typename internal::GetOutputType<OutType>::T;
 
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    } else {
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, Op> kernel{
+          locate_zone(timezone)};
+      return kernel.Exec(ctx, batch, out);
+    }
   }
+}
+
+struct TimezoneMixin {
+  TimezoneMixin(const time_zone* tz = nullptr) : tz_(tz) {}
+  const time_zone* tz_;
 };
 
 // ----------------------------------------------------------------------
 // Extract year from timestamp
 
 template <typename Duration>
-struct Year {
+struct Year : public TimezoneMixin {
+  using TimezoneMixin::TimezoneMixin;
+
   template <typename T, typename Arg0>
-  static T Call(KernelContext*, Arg0 arg, Status*) {
+  T Call(KernelContext*, Arg0 arg, Status*) const {
+    if (tz_ == nullptr) {

Review comment:
       Perhaps use a common base class for all extractors or something?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-860765541


   > I'll wait for the consensus to build on the timezone handling discussions before closing the PR and moving the python tests to a new PR.
   
   I think there was no opposition for this one (i.e. that the field extraction should yield local hour/minute/etc), so I don't think there is a need to close the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-855387981


   Errors appear to be due to windows builds not finding tz database. [See.](https://github.com/HowardHinnant/date/issues/641)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858500316


   > I opened https://issues.apache.org/jira/browse/ARROW-13033 for this.
   
   Nice, I was also writing it just now :)
   The strptime timzone ignoring issue is [here](https://issues.apache.org/jira/browse/ARROW-12820).
   
   > Or does the vendored date.h also include functionalities to automatically download the data if not available on the system?
   
   As per HowardHinnant's [answer](https://github.com/HowardHinnant/date/issues/641#issuecomment-765732206):
   
   > The library can be configured to not go to the internet for tzdata. tzdata can either be downloaded manually, and this lib can find it, or (on non-Windows systems), the OS-supplied tzdata can be used.
   
   I suppose we could have the kernel try getting data online. If that fails try OS and as a final fallback use arrow bundled tz db (which we would have to add in this PR).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-855033992


   https://issues.apache.org/jira/browse/ARROW-12980


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r673134382



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1427,9 +1428,20 @@ def test_extract_datetime_components():
                   "2008-12-28",
                   "2008-12-29",
                   "2012-01-01 01:02:03"]
+    timezones = ["UTC", "US/Central", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Australia/Broken_Hill"]
 
+    # Test timezone naive timestamp array
     _check_datetime_components(timestamps)
 
+    # Test timezone aware timestamp array
+    if sys.platform == 'win32':
+        # TODO: We should test on windows once ARROW-13168 is resolved.
+        pytest.skip('Timezone database is not available on Windows yet')
+    else:
+        for timezone in timezones:
+            _check_datetime_components(timestamps, timezone)

Review comment:
       @rok seems like the timezone argument is not used within the `_check_datetime_components` helper, so I think it doesn't have any effect.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-883321833


   Didn't take a thorough look at the PR yet, so asking rather: @rok are these changes backward compatible?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657924847



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -60,86 +63,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
+template <typename Op, typename OutType>
+struct TemporalComponentExtract {
+  using OutValue = typename internal::GetOutputType<OutType>::T;
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
   }
-  return Status::OK();
 }
 
 template <typename Op, typename OutType>
-struct TemporalComponentExtract {
+struct TemporalComponentExtractZoned {
   using OutValue = typename internal::GetOutputType<OutType>::T;
 
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    } else {
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, Op> kernel{
+          locate_zone(timezone)};
+      return kernel.Exec(ctx, batch, out);
+    }
   }
+}
+
+struct TimezoneMixin {
+  TimezoneMixin(const time_zone* tz = nullptr) : tz_(tz) {}
+  const time_zone* tz_;
 };
 
 // ----------------------------------------------------------------------
 // Extract year from timestamp
 
 template <typename Duration>
-struct Year {
+struct Year : public TimezoneMixin {
+  using TimezoneMixin::TimezoneMixin;
+
   template <typename T, typename Arg0>
-  static T Call(KernelContext*, Arg0 arg, Status*) {
+  T Call(KernelContext*, Arg0 arg, Status*) const {
+    if (tz_ == nullptr) {

Review comment:
       You mine like `TimezoneMixin`? That wouldn't remove the check.
   Also the linter doesn't like it (["explicit" should be used on single-parameter constructors and conversion operators](https://rules.sonarsource.com/cpp/RSPEC-1709)).
   
   I'm looking to template the extractor or rather `Call` based on if `timezone` is present or not.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r689530278



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -50,6 +55,9 @@ using internal::applicator::ScalarUnaryNotNull;
 using internal::applicator::SimpleUnary;
 
 using DayOfWeekState = OptionsWrapper<DayOfWeekOptions>;
+const auto& iso_calendar_type =
+    struct_({field("iso_year", int64()), field("iso_week", int64()),
+             field("iso_day_of_week", int64())});

Review comment:
       You can't take a const-reference to a temporary dynamically created value.

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -63,79 +71,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
-  }
-  return Status::OK();
-}
-
-template <typename Op, typename OutType>
+template <
+    template <typename Duration, typename TimePointConverter, typename DaysConverter>
+    class Op,
+    typename Duration, typename OutType>
 struct TemporalComponentExtract {
-  using OutValue = typename internal::GetOutputType<OutType>::T;
-
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      using ExecTemplate = Op<Duration, std::function<sys_time<Duration>(int64_t)>,
+                              std::function<sys_days(sys_days)>>;
+      auto op = ExecTemplate([](int64_t t) { return sys_time<Duration>(Duration{t}); },
+                             [](sys_days d) { return d; });

Review comment:
       Can probably avoid the overhead the `std::function` by templating with the callable type itself.

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -63,79 +71,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
-  }
-  return Status::OK();
-}
-
-template <typename Op, typename OutType>
+template <
+    template <typename Duration, typename TimePointConverter, typename DaysConverter>
+    class Op,
+    typename Duration, typename OutType>
 struct TemporalComponentExtract {
-  using OutValue = typename internal::GetOutputType<OutType>::T;
-
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      using ExecTemplate = Op<Duration, std::function<sys_time<Duration>(int64_t)>,
+                              std::function<sys_days(sys_days)>>;
+      auto op = ExecTemplate([](int64_t t) { return sys_time<Duration>(Duration{t}); },
+                             [](sys_days d) { return d; });
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, ExecTemplate> kernel{
+          op};
+      return kernel.Exec(ctx, batch, out);
+    } else {
+      const time_zone* tz;
+      try {
+        tz = locate_zone(timezone);
+      } catch (const std::runtime_error& ex) {
+        return Status::Invalid(ex.what());
+      }
+      using ExecTemplate = Op<Duration, std::function<local_time<Duration>(int64_t)>,
+                              std::function<local_days(sys_days)>>;
+      auto op = ExecTemplate(
+          [tz](int64_t t) { return tz->to_local(sys_time<Duration>(Duration{t})); },
+          [](sys_days d) { return local_days(year_month_day(d)); });
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, ExecTemplate> kernel{
+          op};
+      return kernel.Exec(ctx, batch, out);
+    }
   }
 };
 
-template <typename Op, typename OutType>
-struct DayOfWeekExec {
-  using OutValue = typename internal::GetOutputType<OutType>::T;
-
+template <
+    template <typename Duration, typename TimePointConverter, typename DaysConverter>
+    class Op,
+    typename Duration, typename OutType>
+struct TemporalComponentExtractDayOfWeek {
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const auto& timezone = GetInputTimezone(batch.values[0]);
     const DayOfWeekOptions& options = DayOfWeekState::Get(ctx);

Review comment:
       At this point it should be possible to reconcile this with TemporalComponentExtract above... 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-867391225


   > We're using `tz.h` library which needs an updated timezone database to correctly handle timezoned timestamps. See [installation instructions](https://howardhinnant.github.io/date/tz.html#Installation).
   > 
   > We have the following options (if I understand correctly) for getting a timezone database:
   > 
   > 1. local (non-windows) OS timezone database  - no work required.
   > 2. arrow bundled folder - we could bundle the database at build time for windows. Database would slowly go stale.
   > 3. download it from IANA Time Zone Database at runtime - `tz.h` gets the database at runtime, but curl (and 7-zip on windows) are required.
   > 4. local user-provided folder - user could provide a location at buildtime. Nice to have.
   
   Would a 5th option to allow runtime configuration be possible as well? (which I assume would need some modification to tz.h?)
   
   For reference
   
   > For now it would probably be best to create a Jira for this and keep these test disabled on windows.
   
   For me that's fine. But to be explicit, it is currently using the default configuration of tz.h (which is to download the latest version)? I would maybe use the `USE_OS_TZDB` compile flag for now, if we postpone the discussion of how to install the tzdata sources for a follow-up JIRA.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-883326082


   > Didn't take a thorough look at the PR yet, so asking rather: @rok are these changes backward compatible?
   
   I believe so. This is new functionality (timezone component extraction was not present in 4.0.0) so I don't expect anything to break. There is a new compile flag introduced for windows but CI appears ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche edited a comment on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-867391225


   > We're using `tz.h` library which needs an updated timezone database to correctly handle timezoned timestamps. See [installation instructions](https://howardhinnant.github.io/date/tz.html#Installation).
   > 
   > We have the following options (if I understand correctly) for getting a timezone database:
   > 
   > 1. local (non-windows) OS timezone database  - no work required.
   > 2. arrow bundled folder - we could bundle the database at build time for windows. Database would slowly go stale.
   > 3. download it from IANA Time Zone Database at runtime - `tz.h` gets the database at runtime, but curl (and 7-zip on windows) are required.
   > 4. local user-provided folder - user could provide a location at buildtime. Nice to have.
   
   Would a 5th option to allow runtime configuration be possible as well? (which I assume would need some modification to tz.h?)
   
   For reference, the recent Python PEP on this topic: https://www.python.org/dev/peps/pep-0615/#sources-for-time-zone-data
   
   > For now it would probably be best to create a Jira for this and keep these test disabled on windows.
   
   For me that's fine. But to be explicit, it is currently using the default configuration of tz.h (which is to download the latest version)? I would maybe use the `USE_OS_TZDB` compile flag for now, if we postpone the discussion of how to install the tzdata sources for a follow-up JIRA.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858491335


   Another question: is the `locate_zone` configurable in some way to give some hints where to find the tz database?
   
   The database is not always available on the system (as you noticed for windows). But depending on the application / binding language, it might already be available elsewhere. For example, in recent Python versions, there is now support for this in the standard library, automatically using the `tzdata` package if there is no system database. I am wondering if pyarrow should be able to instruct the kernel where to look for the database (https://www.python.org/dev/peps/pep-0615/#search-path-configuration in python). 
   Or does the vendored `date.h` also include functionalities to automatically download the data if not available on the system?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651784673



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,202 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestZoned1) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(Invalid, Year(timestamps));
-    ASSERT_RAISES(Invalid, Month(timestamps));
-    ASSERT_RAISES(Invalid, Day(timestamps));
-    ASSERT_RAISES(Invalid, DayOfWeek(timestamps));
-    ASSERT_RAISES(Invalid, DayOfYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOWeek(timestamps));
-    ASSERT_RAISES(Invalid, ISOCalendar(timestamps));
-    ASSERT_RAISES(Invalid, Quarter(timestamps));
-    ASSERT_RAISES(Invalid, Hour(timestamps));
-    ASSERT_RAISES(Invalid, Minute(timestamps));
-    ASSERT_RAISES(Invalid, Second(timestamps));
-    ASSERT_RAISES(Invalid, Millisecond(timestamps));
-    ASSERT_RAISES(Invalid, Microsecond(timestamps));
-    ASSERT_RAISES(Invalid, Nanosecond(timestamps));
-    ASSERT_RAISES(Invalid, Subsecond(timestamps));
-  }
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestZoned2) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Australia/Broken_Hill");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+      "2008, 2008, 2012, null]";
+  auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+  auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+  auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+  auto day_of_year =
+      "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+      "2008, 2009, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+  auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+  auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+  auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestOverflow) {
+  const char* times = R"(["1677-09-20T00:00:59.123456789", "2262-04-13T23:23:23.999999999"])";
+
+  auto unit = timestamp(TimeUnit::NANO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[2262, 1677]";
+  auto month = "[4, 9]";
+  auto day = "[10, 22]";
+  auto day_of_week = "[3, 2]";
+  auto day_of_year = "[100, 265]";
+  auto iso_year = "[2262, 1677]";
+  auto iso_week = "[15, 38]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 4},
+                        {"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 3}])");
+  auto quarter = "[2, 3]";
+  auto hour = "[23, 23]";
+  auto minute = "[35, 48]";
+  auto second = "[32, 50]";
+  auto millisecond = "[833, 290]";
+  auto microsecond = "[8, 448]";
+  auto nanosecond = "[405, 383]";
+  auto subsecond = "[0.833008405, 0.290448383]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
 }

Review comment:
       Done.
   Perhaps `TestOverflow` id not the best name for the test?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858395074


   I think it would be good to write some tests in python as well, as currently the C++ tests are very hard to verify since we don't yet have the ability to parse strings localized in the timezone (I mean: the strings in the tests are interpreted as UTC and not the "Australia/Broken_Hill" timezone. And thus as a result, the expected values can't be read/verified from the strings). 
   
   Given that, it might also make sense to first add a "localize" kernel for converting timestamps from naive to a certain timezone.
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651241888



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1212,6 +1212,80 @@ def test_strptime():
     assert got == expected
 
 
+def _check_datetime_components(ts):
+    tsa = [pa.array(ts)]
+    subseconds = ((ts.microsecond * 10**3 + ts.nanosecond) *
+                  10**-9).values.round(9)
+    iso_calendar_fields = [
+        pa.field('iso_year', pa.int64()),
+        pa.field('iso_week', pa.int64()),
+        pa.field('iso_day_of_week', pa.int64())
+    ]
+    iso_calendar = pa.StructArray.from_arrays([
+        ts.isocalendar()["year"].astype(int),
+        ts.isocalendar()["week"].astype(int),
+        ts.isocalendar()["day"].astype(int)],
+        fields=iso_calendar_fields)
+
+    assert pc.call_function("year", tsa).equals(pa.array(ts.year))

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r689611393



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -63,79 +71,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
-  }
-  return Status::OK();
-}
-
-template <typename Op, typename OutType>
+template <
+    template <typename Duration, typename TimePointConverter, typename DaysConverter>
+    class Op,
+    typename Duration, typename OutType>
 struct TemporalComponentExtract {
-  using OutValue = typename internal::GetOutputType<OutType>::T;
-
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      using ExecTemplate = Op<Duration, std::function<sys_time<Duration>(int64_t)>,
+                              std::function<sys_days(sys_days)>>;
+      auto op = ExecTemplate([](int64_t t) { return sys_time<Duration>(Duration{t}); },
+                             [](sys_days d) { return d; });
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, ExecTemplate> kernel{
+          op};
+      return kernel.Exec(ctx, batch, out);
+    } else {
+      const time_zone* tz;
+      try {
+        tz = locate_zone(timezone);
+      } catch (const std::runtime_error& ex) {
+        return Status::Invalid(ex.what());
+      }
+      using ExecTemplate = Op<Duration, std::function<local_time<Duration>(int64_t)>,
+                              std::function<local_days(sys_days)>>;
+      auto op = ExecTemplate(
+          [tz](int64_t t) { return tz->to_local(sys_time<Duration>(Duration{t})); },
+          [](sys_days d) { return local_days(year_month_day(d)); });
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, ExecTemplate> kernel{
+          op};
+      return kernel.Exec(ctx, batch, out);
+    }
   }
 };
 
-template <typename Op, typename OutType>
-struct DayOfWeekExec {
-  using OutValue = typename internal::GetOutputType<OutType>::T;
-
+template <
+    template <typename Duration, typename TimePointConverter, typename DaysConverter>
+    class Op,
+    typename Duration, typename OutType>
+struct TemporalComponentExtractDayOfWeek {
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const auto& timezone = GetInputTimezone(batch.values[0]);
     const DayOfWeekOptions& options = DayOfWeekState::Get(ctx);

Review comment:
       Indeed. This will be useful for the `Strftime` and `localize` kernels as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-899704422


   Thanks @pitrou @jorisvandenbossche and @kszucs !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r673152117



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1427,9 +1428,20 @@ def test_extract_datetime_components():
                   "2008-12-28",
                   "2008-12-29",
                   "2012-01-01 01:02:03"]
+    timezones = ["UTC", "US/Central", "Asia/Kolkata",
+                 "Etc/GMT-4", "Etc/GMT+4", "Australia/Broken_Hill"]
 
+    # Test timezone naive timestamp array
     _check_datetime_components(timestamps)
 
+    # Test timezone aware timestamp array
+    if sys.platform == 'win32':
+        # TODO: We should test on windows once ARROW-13168 is resolved.
+        pytest.skip('Timezone database is not available on Windows yet')
+    else:
+        for timezone in timezones:
+            _check_datetime_components(timestamps, timezone)

Review comment:
       Oh, indeed. Let me fix that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-867632744


   > > 3. download it from IANA Time Zone Database at runtime - `tz.h` gets the database at runtime, but curl (and 7-zip on windows) are required.
   
   > Would a 5th option to allow runtime configuration be possible as well? (which I assume would need some modification to tz.h?)
   
   I think we mean the same with 3. and 5.? I don't think modification to `tz.hz` is required. There are [`get_tzdb`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/datetime/tz.cpp#L3510) and [`reload_tzdb`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/datetime/tz.cpp#L3496) available for runtime. These could be interesting in cases where timezone database changes while the process is running.
   
   > For me that's fine. But to be explicit, it is currently using the default configuration of tz.h (which is to download the latest version)? I would maybe use the `USE_OS_TZDB` compile flag for now, if we postpone the discussion of how to install the tzdata sources for a follow-up JIRA.
   
   [By default](https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/datetime/tz.h#L49) we have `USE_OS_TZDB=1` on non-windows and `USE_OS_TZDB=1` on windows. I believe that means on non-windows we download if possible and fall back to OS network is not available. On windows we currently fail. Please check as I might have misunderstood.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657500729



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -60,86 +63,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
+template <typename Op, typename OutType>
+struct TemporalComponentExtract {
+  using OutValue = typename internal::GetOutputType<OutType>::T;
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
   }
-  return Status::OK();
 }
 
 template <typename Op, typename OutType>
-struct TemporalComponentExtract {
+struct TemporalComponentExtractZoned {
   using OutValue = typename internal::GetOutputType<OutType>::T;
 
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    } else {
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, Op> kernel{
+          locate_zone(timezone)};
+      return kernel.Exec(ctx, batch, out);
+    }
   }
+}
+
+struct TimezoneMixin {
+  TimezoneMixin(const time_zone* tz = nullptr) : tz_(tz) {}
+  const time_zone* tz_;
 };
 
 // ----------------------------------------------------------------------
 // Extract year from timestamp
 
 template <typename Duration>
-struct Year {
+struct Year : public TimezoneMixin {
+  using TimezoneMixin::TimezoneMixin;
+
   template <typename T, typename Arg0>
-  static T Call(KernelContext*, Arg0 arg, Status*) {
+  T Call(KernelContext*, Arg0 arg, Status*) const {
+    if (tz_ == nullptr) {

Review comment:
       @pitrou is there a practical way to remove this check with templateing?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r690413546



##########
File path: r/tests/testthat/test-dplyr-lubridate.R
##########
@@ -26,22 +26,13 @@ library(dplyr)
 # TODO: consider reevaluating this workaround after ARROW-12980
 withr::local_timezone("UTC")
 
-test_date <- as.POSIXct("2017-01-01 00:00:12.3456789", tz = "")
+if (tolower(Sys.info()[["sysname"]]) == "windows") {

Review comment:
       Without `-lole32` flag all zoned time logic was causing errors at compile time (if I remember correctly) so it would have to be disabled for windows. Since we will eventually have zones on windows I went ahead and rather added the flag.
   
   Here's a [similar issue from date.h](https://github.com/HowardHinnant/date/issues/272) and a similar [discussion from another PR](https://github.com/apache/arrow/pull/10610#pullrequestreview-705668997).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651752620



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,202 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestZoned1) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(Invalid, Year(timestamps));
-    ASSERT_RAISES(Invalid, Month(timestamps));
-    ASSERT_RAISES(Invalid, Day(timestamps));
-    ASSERT_RAISES(Invalid, DayOfWeek(timestamps));
-    ASSERT_RAISES(Invalid, DayOfYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOWeek(timestamps));
-    ASSERT_RAISES(Invalid, ISOCalendar(timestamps));
-    ASSERT_RAISES(Invalid, Quarter(timestamps));
-    ASSERT_RAISES(Invalid, Hour(timestamps));
-    ASSERT_RAISES(Invalid, Minute(timestamps));
-    ASSERT_RAISES(Invalid, Second(timestamps));
-    ASSERT_RAISES(Invalid, Millisecond(timestamps));
-    ASSERT_RAISES(Invalid, Microsecond(timestamps));
-    ASSERT_RAISES(Invalid, Nanosecond(timestamps));
-    ASSERT_RAISES(Invalid, Subsecond(timestamps));
-  }
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestZoned2) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Australia/Broken_Hill");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+      "2008, 2008, 2012, null]";
+  auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+  auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+  auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+  auto day_of_year =
+      "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+      "2008, 2009, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+  auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+  auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+  auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestOverflow) {
+  const char* times = R"(["1677-09-20T00:00:59.123456789", "2262-04-13T23:23:23.999999999"])";
+
+  auto unit = timestamp(TimeUnit::NANO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[2262, 1677]";
+  auto month = "[4, 9]";
+  auto day = "[10, 22]";
+  auto day_of_week = "[3, 2]";
+  auto day_of_year = "[100, 265]";
+  auto iso_year = "[2262, 1677]";
+  auto iso_week = "[15, 38]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 4},
+                        {"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 3}])");
+  auto quarter = "[2, 3]";
+  auto hour = "[23, 23]";
+  auto minute = "[35, 48]";
+  auto second = "[32, 50]";
+  auto millisecond = "[833, 290]";
+  auto microsecond = "[8, 448]";
+  auto nanosecond = "[405, 383]";
+  auto subsecond = "[0.833008405, 0.290448383]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
 }

Review comment:
       Apart from that you should be using a different unit as NANO? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-883321833


   Didn't take a thorough look at the PR yet, so asking rather: @rok are these changes backward compatible?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651779895



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,202 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestZoned1) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(Invalid, Year(timestamps));
-    ASSERT_RAISES(Invalid, Month(timestamps));
-    ASSERT_RAISES(Invalid, Day(timestamps));
-    ASSERT_RAISES(Invalid, DayOfWeek(timestamps));
-    ASSERT_RAISES(Invalid, DayOfYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOWeek(timestamps));
-    ASSERT_RAISES(Invalid, ISOCalendar(timestamps));
-    ASSERT_RAISES(Invalid, Quarter(timestamps));
-    ASSERT_RAISES(Invalid, Hour(timestamps));
-    ASSERT_RAISES(Invalid, Minute(timestamps));
-    ASSERT_RAISES(Invalid, Second(timestamps));
-    ASSERT_RAISES(Invalid, Millisecond(timestamps));
-    ASSERT_RAISES(Invalid, Microsecond(timestamps));
-    ASSERT_RAISES(Invalid, Nanosecond(timestamps));
-    ASSERT_RAISES(Invalid, Subsecond(timestamps));
-  }
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestZoned2) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Australia/Broken_Hill");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+      "2008, 2008, 2012, null]";
+  auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+  auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+  auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+  auto day_of_year =
+      "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+      "2008, 2009, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+  auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+  auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+  auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestOverflow) {
+  const char* times = R"(["1677-09-20T00:00:59.123456789", "2262-04-13T23:23:23.999999999"])";
+
+  auto unit = timestamp(TimeUnit::NANO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[2262, 1677]";
+  auto month = "[4, 9]";
+  auto day = "[10, 22]";
+  auto day_of_week = "[3, 2]";
+  auto day_of_year = "[100, 265]";
+  auto iso_year = "[2262, 1677]";
+  auto iso_week = "[15, 38]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 4},
+                        {"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 3}])");
+  auto quarter = "[2, 3]";
+  auto hour = "[23, 23]";
+  auto minute = "[35, 48]";
+  auto second = "[32, 50]";
+  auto millisecond = "[833, 290]";
+  auto microsecond = "[8, 448]";
+  auto nanosecond = "[405, 383]";
+  auto subsecond = "[0.833008405, 0.290448383]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
 }

Review comment:
       Oh, now I understand the intent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858470970


   > I think it would be good to write some tests in python as well, as currently the C++ tests are very hard to verify since we don't yet have the ability to parse strings localized in the timezone (I mean: the strings in the tests are interpreted as UTC and not the "Australia/Broken_Hill" timezone. And thus as a result, the expected values can't be read/verified from the strings).
   > (while in python we could create the localized input timestamps with pandas)
   
   Indeed! I've used that approach to generate data in [ARROW-11759](https://github.com/apache/arrow/pull/10176) by pandas already and I'll just adopt that code here.
   It did lead me to an [odd issue? in pandas](https://github.com/pandas-dev/pandas/issues/41834) that I cant quite explain. We'll either need to address it or avoid it in tests ..
   
   > Given that, it might also make sense to first add a "localize" kernel for converting timestamps from naive to a certain timezone.
   
   Localize kernel would be great. I suppose we'd need a scalar and a vector one depending if timezone is shared between rows or not? Vector version would apply [here](https://issues.apache.org/jira/browse/ARROW-5912).
   
   Also strptime kernel ignores timezones at the moment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657924847



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -60,86 +63,132 @@ const std::string& GetInputTimezone(const ArrayData& array) {
   return checked_cast<const TimestampType&>(*array.type).timezone();
 }
 
-template <typename T>
-Status TemporalComponentExtractCheckTimezone(const T& input) {
-  const auto& timezone = GetInputTimezone(input);
-  if (!timezone.empty()) {
-    return Status::NotImplemented(
-        "Cannot extract components from timestamp with specific timezone: ", timezone);
+template <typename Op, typename OutType>
+struct TemporalComponentExtract {
+  using OutValue = typename internal::GetOutputType<OutType>::T;
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
   }
-  return Status::OK();
 }
 
 template <typename Op, typename OutType>
-struct TemporalComponentExtract {
+struct TemporalComponentExtractZoned {
   using OutValue = typename internal::GetOutputType<OutType>::T;
 
   static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    RETURN_NOT_OK(TemporalComponentExtractCheckTimezone(batch.values[0]));
-    return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    const auto& timezone = GetInputTimezone(batch.values[0]);
+    if (timezone.empty()) {
+      return ScalarUnaryNotNull<OutType, TimestampType, Op>::Exec(ctx, batch, out);
+    } else {
+      applicator::ScalarUnaryNotNullStateful<OutType, TimestampType, Op> kernel{
+          locate_zone(timezone)};
+      return kernel.Exec(ctx, batch, out);
+    }
   }
+}
+
+struct TimezoneMixin {
+  TimezoneMixin(const time_zone* tz = nullptr) : tz_(tz) {}
+  const time_zone* tz_;
 };
 
 // ----------------------------------------------------------------------
 // Extract year from timestamp
 
 template <typename Duration>
-struct Year {
+struct Year : public TimezoneMixin {
+  using TimezoneMixin::TimezoneMixin;
+
   template <typename T, typename Arg0>
-  static T Call(KernelContext*, Arg0 arg, Status*) {
+  T Call(KernelContext*, Arg0 arg, Status*) const {
+    if (tz_ == nullptr) {

Review comment:
       You mean like `TimezoneMixin`? That wouldn't remove the check.
   Also the linter doesn't like it (["explicit" should be used on single-parameter constructors and conversion operators](https://rules.sonarsource.com/cpp/RSPEC-1709)).
   
   I'm looking to template the extractor or rather `Call` based on if `timezone` is present or not.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r688405211



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -142,44 +141,211 @@ TEST_F(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
   }
 }
 
-TEST_F(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
+TEST_F(ScalarTemporalTest, TestOutsideNanosecondRange) {
+  const char* times = R"(["1677-09-20T00:00:59.123456", "2262-04-13T23:23:23.999999"])";
+
+  auto unit = timestamp(TimeUnit::MICRO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[1677, 2262]";
+  auto month = "[9, 4]";
+  auto day = "[20, 13]";
+  auto day_of_week = "[0, 6]";
+  auto day_of_year = "[263, 103]";
+  auto iso_year = "[1677, 2262]";
+  auto iso_week = "[38, 15]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 1},
+                          {"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 7}])");
+  auto quarter = "[3, 2]";
+  auto hour = "[0, 23]";
+  auto minute = "[0, 23]";
+  auto second = "[59, 23]";
+  auto millisecond = "[123, 999]";
+  auto microsecond = "[456, 999]";
+  auto nanosecond = "[0, 0]";
+  auto subsecond = "[0.123456, 0.999999]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+#ifndef _WIN32
+// TODO: We should test on windows once ARROW-13168 is resolved.
+TEST_F(ScalarTemporalTest, TestZoned1) {
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST_F(ScalarTemporalTest, TestZoned2) {
+  for (auto u : internal::AllTimeUnits()) {
+    auto unit = timestamp(u, "Australia/Broken_Hill");
+    auto iso_calendar_type =
+        struct_({field("iso_year", int64()), field("iso_week", int64()),
+                 field("iso_day_of_week", int64())});
+    auto year =
+        "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+        "2008, 2008, 2012, null]";
+    auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+    auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+    auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+    auto day_of_year =
+        "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+    auto iso_year =
+        "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+        "2008, 2009, 2011, null]";
+    auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+    auto iso_calendar =
+        ArrayFromJSON(iso_calendar_type,
+                      R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                          {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                          {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                          {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                          {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                          {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                          {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                          {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                          {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                          {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                          {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                          {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                          {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                          {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                          {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                          {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+    auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+    auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+    auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+
+    CheckScalarUnary("year", unit, times_seconds_precision, int64(), year);
+    CheckScalarUnary("month", unit, times_seconds_precision, int64(), month);
+    CheckScalarUnary("day", unit, times_seconds_precision, int64(), day);
+    CheckScalarUnary("day_of_week", unit, times_seconds_precision, int64(), day_of_week);
+    CheckScalarUnary("day_of_year", unit, times_seconds_precision, int64(), day_of_year);
+    CheckScalarUnary("iso_year", unit, times_seconds_precision, int64(), iso_year);
+    CheckScalarUnary("iso_week", unit, times_seconds_precision, int64(), iso_week);
+    CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times_seconds_precision),
+                     iso_calendar);
+    CheckScalarUnary("quarter", unit, times_seconds_precision, int64(), quarter);
+    CheckScalarUnary("hour", unit, times_seconds_precision, int64(), hour);
+    CheckScalarUnary("minute", unit, times_seconds_precision, int64(), minute);
+    CheckScalarUnary("second", unit, times_seconds_precision, int64(), second);
+    CheckScalarUnary("millisecond", unit, times_seconds_precision, int64(), zeros);
+    CheckScalarUnary("microsecond", unit, times_seconds_precision, int64(), zeros);
+    CheckScalarUnary("nanosecond", unit, times_seconds_precision, int64(), zeros);
+    CheckScalarUnary("subsecond", unit, times_seconds_precision, float64(), zeros);
+  }
+}
+
+TEST_F(ScalarTemporalTest, TestZoned3) {
+  auto data_buffer = Buffer::Wrap(std::vector<int32_t>{1, 2, 3});
+  auto null_buffer = Buffer::FromString("\xff");
 
   for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times_seconds_precision);
-
-    ASSERT_RAISES(NotImplemented, Year(timestamps));
-    ASSERT_RAISES(NotImplemented, Month(timestamps));
-    ASSERT_RAISES(NotImplemented, Day(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOCalendar(timestamps));
-    ASSERT_RAISES(NotImplemented, Quarter(timestamps));
-    ASSERT_RAISES(NotImplemented, Hour(timestamps));
-    ASSERT_RAISES(NotImplemented, Minute(timestamps));
-    ASSERT_RAISES(NotImplemented, Second(timestamps));
-    ASSERT_RAISES(NotImplemented, Millisecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Microsecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Nanosecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Subsecond(timestamps));
+    auto ts_type = timestamp(u, "Mars/Mariner_Valley");

Review comment:
       Renamed the test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r688346412



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -142,44 +141,211 @@ TEST_F(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
   }
 }
 
-TEST_F(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
+TEST_F(ScalarTemporalTest, TestOutsideNanosecondRange) {
+  const char* times = R"(["1677-09-20T00:00:59.123456", "2262-04-13T23:23:23.999999"])";
+
+  auto unit = timestamp(TimeUnit::MICRO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[1677, 2262]";
+  auto month = "[9, 4]";
+  auto day = "[20, 13]";
+  auto day_of_week = "[0, 6]";
+  auto day_of_year = "[263, 103]";
+  auto iso_year = "[1677, 2262]";
+  auto iso_week = "[38, 15]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 1},
+                          {"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 7}])");
+  auto quarter = "[3, 2]";
+  auto hour = "[0, 23]";
+  auto minute = "[0, 23]";
+  auto second = "[59, 23]";
+  auto millisecond = "[123, 999]";
+  auto microsecond = "[456, 999]";
+  auto nanosecond = "[0, 0]";
+  auto subsecond = "[0.123456, 0.999999]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+#ifndef _WIN32
+// TODO: We should test on windows once ARROW-13168 is resolved.
+TEST_F(ScalarTemporalTest, TestZoned1) {
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST_F(ScalarTemporalTest, TestZoned2) {
+  for (auto u : internal::AllTimeUnits()) {
+    auto unit = timestamp(u, "Australia/Broken_Hill");
+    auto iso_calendar_type =
+        struct_({field("iso_year", int64()), field("iso_week", int64()),
+                 field("iso_day_of_week", int64())});
+    auto year =
+        "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+        "2008, 2008, 2012, null]";
+    auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+    auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+    auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+    auto day_of_year =
+        "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+    auto iso_year =
+        "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+        "2008, 2009, 2011, null]";
+    auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+    auto iso_calendar =
+        ArrayFromJSON(iso_calendar_type,
+                      R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                          {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                          {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                          {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                          {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                          {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                          {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                          {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                          {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                          {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                          {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                          {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                          {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                          {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                          {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                          {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+    auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+    auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+    auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+
+    CheckScalarUnary("year", unit, times_seconds_precision, int64(), year);
+    CheckScalarUnary("month", unit, times_seconds_precision, int64(), month);
+    CheckScalarUnary("day", unit, times_seconds_precision, int64(), day);
+    CheckScalarUnary("day_of_week", unit, times_seconds_precision, int64(), day_of_week);
+    CheckScalarUnary("day_of_year", unit, times_seconds_precision, int64(), day_of_year);
+    CheckScalarUnary("iso_year", unit, times_seconds_precision, int64(), iso_year);
+    CheckScalarUnary("iso_week", unit, times_seconds_precision, int64(), iso_week);
+    CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times_seconds_precision),
+                     iso_calendar);
+    CheckScalarUnary("quarter", unit, times_seconds_precision, int64(), quarter);
+    CheckScalarUnary("hour", unit, times_seconds_precision, int64(), hour);
+    CheckScalarUnary("minute", unit, times_seconds_precision, int64(), minute);
+    CheckScalarUnary("second", unit, times_seconds_precision, int64(), second);
+    CheckScalarUnary("millisecond", unit, times_seconds_precision, int64(), zeros);
+    CheckScalarUnary("microsecond", unit, times_seconds_precision, int64(), zeros);
+    CheckScalarUnary("nanosecond", unit, times_seconds_precision, int64(), zeros);
+    CheckScalarUnary("subsecond", unit, times_seconds_precision, float64(), zeros);
+  }
+}
+
+TEST_F(ScalarTemporalTest, TestZoned3) {
+  auto data_buffer = Buffer::Wrap(std::vector<int32_t>{1, 2, 3});
+  auto null_buffer = Buffer::FromString("\xff");
 
   for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times_seconds_precision);
-
-    ASSERT_RAISES(NotImplemented, Year(timestamps));
-    ASSERT_RAISES(NotImplemented, Month(timestamps));
-    ASSERT_RAISES(NotImplemented, Day(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOCalendar(timestamps));
-    ASSERT_RAISES(NotImplemented, Quarter(timestamps));
-    ASSERT_RAISES(NotImplemented, Hour(timestamps));
-    ASSERT_RAISES(NotImplemented, Minute(timestamps));
-    ASSERT_RAISES(NotImplemented, Second(timestamps));
-    ASSERT_RAISES(NotImplemented, Millisecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Microsecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Nanosecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Subsecond(timestamps));
+    auto ts_type = timestamp(u, "Mars/Mariner_Valley");

Review comment:
       Can you add a comment here (or embed it in the name of the test) that this is a test for a invalid / non-existing timezone? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-898140574


   @pitrou Thanks for the review! And sorry for late reply.
   Could you please review again and suggest if templating could be further improved.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10457:
URL: https://github.com/apache/arrow/pull/10457


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r651752002



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,202 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestZoned1) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Pacific/Marquesas");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1969, 2000, 1898, 2033, 2019, 2019, 2019, 2009, 2009, 2010, 2010, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto month = "[12, 2, 12, 5, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12, 12, 12, null]";
+  auto day = "[31, 29, 31, 17, 31, 30, 29, 30, 31, 2, 3, 31, 31, 27, 28, 31, null]";
+  auto day_of_week = "[2, 1, 5, 1, 1, 0, 6, 2, 3, 5, 6, 5, 5, 5, 6, 5, null]";
+  auto day_of_year =
+      "[365, 60, 365, 137, 365, 364, 363, 364, 365, 2, 3, 365, 365, 362, 363, 365, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2019, 2009, 2009, 2009, 2009, 2005, 2005, "
+      "2008, 2008, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 52, 53, 53, 53, 53, 52, 52, 52, 52, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 2},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2019, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 3},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 6},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 6}, null])");
+  auto quarter = "[4, 1, 4, 2, 4, 4, 4, 4, 4, 1, 1, 4, 4, 4, 4, 4, null]";
+  auto hour = "[14, 13, 15, 18, 15, 16, 17, 18, 19, 21, 22, 23, 0, 14, 14, 15, null]";
+  auto minute = "[30, 53, 41, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(Invalid, Year(timestamps));
-    ASSERT_RAISES(Invalid, Month(timestamps));
-    ASSERT_RAISES(Invalid, Day(timestamps));
-    ASSERT_RAISES(Invalid, DayOfWeek(timestamps));
-    ASSERT_RAISES(Invalid, DayOfYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOYear(timestamps));
-    ASSERT_RAISES(Invalid, ISOWeek(timestamps));
-    ASSERT_RAISES(Invalid, ISOCalendar(timestamps));
-    ASSERT_RAISES(Invalid, Quarter(timestamps));
-    ASSERT_RAISES(Invalid, Hour(timestamps));
-    ASSERT_RAISES(Invalid, Minute(timestamps));
-    ASSERT_RAISES(Invalid, Second(timestamps));
-    ASSERT_RAISES(Invalid, Millisecond(timestamps));
-    ASSERT_RAISES(Invalid, Microsecond(timestamps));
-    ASSERT_RAISES(Invalid, Nanosecond(timestamps));
-    ASSERT_RAISES(Invalid, Subsecond(timestamps));
-  }
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestZoned2) {
+  const char* times =
+      R"(["1970-01-01T00:00:59.123456789","2000-02-29T23:23:23.999999999",
+          "1899-01-01T00:59:20.001001001","2033-05-18T03:33:20.000000000",
+          "2020-01-01T01:05:05.001", "2019-12-31T02:10:10.002",
+          "2019-12-30T03:15:15.003", "2009-12-31T04:20:20.004132",
+          "2010-01-01T05:25:25.005321", "2010-01-03T06:30:30.006163",
+          "2010-01-04T07:35:35", "2006-01-01T08:40:40", "2005-12-31T09:45:45",
+          "2008-12-28", "2008-12-29", "2012-01-01 01:02:03", null])";
+  auto unit = timestamp(TimeUnit::NANO, "Australia/Broken_Hill");
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+  auto year =
+      "[1970, 2000, 1899, 2033, 2020, 2019, 2019, 2009, 2010, 2010, 2010, 2006, 2005, "
+      "2008, 2008, 2012, null]";
+  auto month = "[1, 3, 1, 5, 1, 12, 12, 12, 1, 1, 1, 1, 12, 12, 12, 1, null]";
+  auto day = "[1, 1, 1, 18, 1, 31, 30, 31, 1, 3, 4, 1, 31, 28, 29, 1, null]";
+  auto day_of_week = "[3, 2, 6, 2, 2, 1, 0, 3, 4, 6, 0, 6, 5, 6, 0, 6, null]";
+  auto day_of_year =
+      "[1, 61, 1, 138, 1, 365, 364, 365, 1, 3, 4, 1, 365, 363, 364, 1, null]";
+  auto iso_year =
+      "[1970, 2000, 1898, 2033, 2020, 2020, 2020, 2009, 2009, 2009, 2010, 2005, 2005, "
+      "2008, 2009, 2011, null]";
+  auto iso_week = "[1, 9, 52, 20, 1, 1, 1, 53, 53, 53, 1, 52, 52, 52, 1, 52, null]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1970, "iso_week": 1, "iso_day_of_week": 4},
+                        {"iso_year": 2000, "iso_week": 9, "iso_day_of_week": 3},
+                        {"iso_year": 1898, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2033, "iso_week": 20, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 3},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 2},
+                        {"iso_year": 2020, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 4},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 5},
+                        {"iso_year": 2009, "iso_week": 53, "iso_day_of_week": 7},
+                        {"iso_year": 2010, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2005, "iso_week": 52, "iso_day_of_week": 6},
+                        {"iso_year": 2008, "iso_week": 52, "iso_day_of_week": 7},
+                        {"iso_year": 2009, "iso_week": 1, "iso_day_of_week": 1},
+                        {"iso_year": 2011, "iso_week": 52, "iso_day_of_week": 7}, null])");
+  auto quarter = "[1, 1, 1, 2, 1, 4, 4, 4, 1, 1, 1, 1, 4, 4, 4, 1, null]";
+  auto hour = "[9, 9, 9, 13, 11, 12, 13, 14, 15, 17, 18, 19, 20, 10, 10, 11, null]";
+  auto minute = "[30, 53, 59, 3, 35, 40, 45, 50, 55, 0, 5, 10, 15, 30, 30, 32, null]";
+  auto second = "[59, 23, 20, 20, 5, 10, 15, 20, 25, 30, 35, 40, 45, 0, 0, 3, null]";
+  auto millisecond = "[123, 999, 1, 0, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0, null]";
+  auto microsecond = "[456, 999, 1, 0, 0, 0, 0, 132, 321, 163, 0, 0, 0, 0, 0, 0, null]";
+  auto nanosecond = "[789, 999, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null]";
+  auto subsecond =
+      "[0.123456789, 0.999999999, 0.001001001, 0.0, 0.001, 0.002, 0.003, 0.004132, "
+      "0.005321, 0.006163, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, null]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+TEST(ScalarTemporalTest, TestOverflow) {
+  const char* times = R"(["1677-09-20T00:00:59.123456789", "2262-04-13T23:23:23.999999999"])";
+
+  auto unit = timestamp(TimeUnit::NANO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[2262, 1677]";
+  auto month = "[4, 9]";
+  auto day = "[10, 22]";
+  auto day_of_week = "[3, 2]";
+  auto day_of_year = "[100, 265]";
+  auto iso_year = "[2262, 1677]";
+  auto iso_week = "[15, 38]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 4},
+                        {"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 3}])");
+  auto quarter = "[2, 3]";
+  auto hour = "[23, 23]";
+  auto minute = "[35, 48]";
+  auto second = "[32, 50]";
+  auto millisecond = "[833, 290]";
+  auto microsecond = "[8, 448]";
+  auto nanosecond = "[405, 383]";
+  auto subsecond = "[0.833008405, 0.290448383]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
 }

Review comment:
       Yes, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r690360575



##########
File path: r/tests/testthat/test-dplyr-lubridate.R
##########
@@ -26,22 +26,13 @@ library(dplyr)
 # TODO: consider reevaluating this workaround after ARROW-12980
 withr::local_timezone("UTC")
 
-test_date <- as.POSIXct("2017-01-01 00:00:12.3456789", tz = "")
+if (tolower(Sys.info()[["sysname"]]) == "windows") {

Review comment:
       If this doesn't work on windows, why did we have to add `-lole32` to r/configure.win in this patch?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858854459


   Added the timezone-naive test for component extractions.
   I'll wait for the consensus to build on the timezone handling discussions before closing the PR and moving the python tests to a new PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-858694911


   @jorisvandenbossche added the [python tests](https://github.com/apache/arrow/pull/10457/commits/83b1e21a187c88d13e2b7a7589eac80f0a77a5b0). Will add some more for timezone naive and if needed make another PR :).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r657543026



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -143,39 +142,204 @@ TEST(ScalarTemporalTest, TestTemporalComponentExtractionWithDifferentUnits) {
     CheckScalarUnary("quarter", unit, times, int64(), quarter);
     CheckScalarUnary("hour", unit, times, int64(), hour);
     CheckScalarUnary("minute", unit, times, int64(), minute);
-    CheckScalarUnary("second", unit, times, float64(), second);
+    CheckScalarUnary("second", unit, times, int64(), second);
     CheckScalarUnary("millisecond", unit, times, int64(), zeros);
     CheckScalarUnary("microsecond", unit, times, int64(), zeros);
     CheckScalarUnary("nanosecond", unit, times, int64(), zeros);
     CheckScalarUnary("subsecond", unit, times, float64(), zeros);
   }
 }
 
-TEST(ScalarTemporalTest, TestZonedTemporalComponentExtraction) {
-  std::string timezone = "Asia/Kolkata";
-  const char* times = R"(["1970-01-01T00:00:59", null])";
+TEST(ScalarTemporalTest, TestOutsideNanosecondRange) {
+  const char* times = R"(["1677-09-20T00:00:59.123456", "2262-04-13T23:23:23.999999"])";
 
-  for (auto u : internal::AllTimeUnits()) {
-    auto unit = timestamp(u, timezone);
-    auto timestamps = ArrayFromJSON(unit, times);
-
-    ASSERT_RAISES(NotImplemented, Year(timestamps));
-    ASSERT_RAISES(NotImplemented, Month(timestamps));
-    ASSERT_RAISES(NotImplemented, Day(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, DayOfYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOYear(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOWeek(timestamps));
-    ASSERT_RAISES(NotImplemented, ISOCalendar(timestamps));
-    ASSERT_RAISES(NotImplemented, Quarter(timestamps));
-    ASSERT_RAISES(NotImplemented, Hour(timestamps));
-    ASSERT_RAISES(NotImplemented, Minute(timestamps));
-    ASSERT_RAISES(NotImplemented, Second(timestamps));
-    ASSERT_RAISES(NotImplemented, Millisecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Microsecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Nanosecond(timestamps));
-    ASSERT_RAISES(NotImplemented, Subsecond(timestamps));
-  }
+  auto unit = timestamp(TimeUnit::MICRO);
+  auto iso_calendar_type =
+      struct_({field("iso_year", int64()), field("iso_week", int64()),
+               field("iso_day_of_week", int64())});
+
+  auto year = "[1677, 2262]";
+  auto month = "[9, 4]";
+  auto day = "[20, 13]";
+  auto day_of_week = "[0, 6]";
+  auto day_of_year = "[263, 103]";
+  auto iso_year = "[1677, 2262]";
+  auto iso_week = "[38, 15]";
+  auto iso_calendar =
+      ArrayFromJSON(iso_calendar_type,
+                    R"([{"iso_year": 1677, "iso_week": 38, "iso_day_of_week": 1},
+                          {"iso_year": 2262, "iso_week": 15, "iso_day_of_week": 7}])");
+  auto quarter = "[3, 2]";
+  auto hour = "[0, 23]";
+  auto minute = "[0, 23]";
+  auto second = "[59, 23]";
+  auto millisecond = "[123, 999]";
+  auto microsecond = "[456, 999]";
+  auto nanosecond = "[0, 0]";
+  auto subsecond = "[0.123456, 0.999999]";
+
+  CheckScalarUnary("year", unit, times, int64(), year);
+  CheckScalarUnary("month", unit, times, int64(), month);
+  CheckScalarUnary("day", unit, times, int64(), day);
+  CheckScalarUnary("day_of_week", unit, times, int64(), day_of_week);
+  CheckScalarUnary("day_of_year", unit, times, int64(), day_of_year);
+  CheckScalarUnary("iso_year", unit, times, int64(), iso_year);
+  CheckScalarUnary("iso_week", unit, times, int64(), iso_week);
+  CheckScalarUnary("iso_calendar", ArrayFromJSON(unit, times), iso_calendar);
+  CheckScalarUnary("quarter", unit, times, int64(), quarter);
+  CheckScalarUnary("hour", unit, times, int64(), hour);
+  CheckScalarUnary("minute", unit, times, int64(), minute);
+  CheckScalarUnary("second", unit, times, int64(), second);
+  CheckScalarUnary("millisecond", unit, times, int64(), millisecond);
+  CheckScalarUnary("microsecond", unit, times, int64(), microsecond);
+  CheckScalarUnary("nanosecond", unit, times, int64(), nanosecond);
+  CheckScalarUnary("subsecond", unit, times, float64(), subsecond);
+}
+
+#ifndef _WIN32

Review comment:
       Ok! Let's see what the opinions are on this.
   It would be nice if this worked on windows without bothering the user.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r688207105



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc
##########
@@ -187,16 +243,28 @@ struct DayOfYear {
 
 template <typename Duration>
 struct ISOYear {
+  explicit ISOYear(const time_zone* tz = nullptr) : tz_(tz) {}
+
   template <typename T, typename Arg0>
-  static T Call(KernelContext*, Arg0 arg, Status*) {
-    const auto t = floor<days>(sys_time<Duration>(Duration{arg}));
+  T Call(KernelContext*, Arg0 arg, Status*) const {
+    if (tz_ == nullptr) {
+      const auto t = floor<days>(sys_time<Duration>(Duration{arg}));
+      auto y = year_month_day{t + days{3}}.year();
+      auto start = sys_time<days>((y - years{1}) / dec / thu[last]) + (mon - thu);
+      if (t < start) {
+        --y;
+      }
+      return static_cast<T>(static_cast<int32_t>(y));
+    }
+    const auto t = floor<days>(tz_->to_local(sys_time<Duration>(Duration{arg})));
     auto y = year_month_day{t + days{3}}.year();
-    auto start = sys_time<days>((y - years{1}) / dec / thu[last]) + (mon - thu);
+    auto start = local_days((y - years{1}) / dec / thu[last]) + (mon - thu);
     if (t < start) {

Review comment:
       I've adjusted your proposal to work with `ScalarUnaryNotNullStateful`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#issuecomment-867695446


   > By default we have USE_OS_TZDB=1 on non-windows and USE_OS_TZDB=1 on windows
   
   Sorry this should be:
   By default we have USE_OS_TZDB=1 on non-windows and USE_OS_TZDB=0 on windows.
   
   > (BTW, I think it will then always use the OS supplied version, and never try to download, based on https://howardhinnant.github.io/date/tz.html#Installation)
   
   Indeed we set `USE_OS_TZDB=1` which sets `HAS_REMOTE_API=0` which means we don't attempt to download.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r688207402



##########
File path: docs/source/cpp/compute.rst
##########
@@ -1056,7 +1056,8 @@ Temporal component extraction
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 These functions extract datetime components (year, month, day, etc) from timestamp type.
-Note: this is currently not supported for timestamps with timezone information.
+Note: supports timestamps with timezone information and will return localized timestamp
+components.
 
 +--------------------+------------+-------------------+---------------+----------------------------+-------+
 | Function name      | Arity      | Input types       | Output type   | Options class              | Notes |

Review comment:
       Yes. I've changed the columns to say Timestamp to reflect this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10457:
URL: https://github.com/apache/arrow/pull/10457#discussion_r689812738



##########
File path: r/tests/testthat/test-dplyr-lubridate.R
##########
@@ -26,22 +26,13 @@ library(dplyr)
 # TODO: consider reevaluating this workaround after ARROW-12980
 withr::local_timezone("UTC")
 
-test_date <- as.POSIXct("2017-01-01 00:00:12.3456789", tz = "")
+if (tolower(Sys.info()[["sysname"]]) == "windows") {

Review comment:
       Why is windows special-cased here? This needs an explanatory comment and possibly a link to a jira that will let us delete this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org