You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/01 23:45:08 UTC

[GitHub] [arrow] wjones127 opened a new pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

wjones127 opened a new pull request #12536:
URL: https://github.com/apache/arrow/pull/12536


   This allows for runtime configuration of the timezone database on Windows for C++ and R. Python will be handled later because it's available timezone libraries use the binary rather than text format, which is not yet supported the vendored date library.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834724077



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       I'm not sure what the convention for locking here would be. @pitrou what would you suggest?
   It's probably still prudent to move this to a new Jira.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835661703



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -538,15 +531,6 @@ test_that("extract quarter from date", {
   )
 })
 
-test_that("extract month from date", {
-  compare_dplyr_binding(
-    .input %>%
-      mutate(x = month(date)) %>%
-      collect(),
-    test_df
-  )
-})

Review comment:
       Yes, that's intentional. It appeared twice in the file (probably from a bad rebase/merge), and I chose to keep the longer version:
   https://github.com/apache/arrow/blob/62e81780dda234106fba5a3148f0d19045d66898/r/tests/testthat/test-dplyr-funcs-datetime.R#L561
   
   If you look in master right now you will see the same test twice: https://github.com/apache/arrow/blob/master/r/tests/testthat/test-dplyr-funcs-datetime.R
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835660596



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -538,15 +531,6 @@ test_that("extract quarter from date", {
   )
 })
 
-test_that("extract month from date", {
-  compare_dplyr_binding(
-    .input %>%
-      mutate(x = month(date)) %>%
-      collect(),
-    test_df
-  )
-})

Review comment:
       Did you intend to remove this test? Or maybe I missed that it was being moved somewhere else?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r836655317



##########
File path: ci/appveyor-cpp-setup.bat
##########
@@ -115,3 +115,17 @@ powershell.exe -Command "Start-Process clcache-server" || exit /B
 if "%ARROW_S3%" == "ON" (
     appveyor DownloadFile https://dl.min.io/server/minio/release/windows-amd64/minio.exe -FileName C:\Windows\Minio.exe || exit /B
 )
+
+
+@rem
+@rem Download IANA Timezone Database for unit tests
+@rem
+@rem (Doc section: Download timezone database)
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output tzdata.tar.gz

Review comment:
       Ah got it. We'll address in the follow up PR where we'll have PyArrow use tzdb. For now the python timestamp tests are skipped on Windows https://github.com/apache/arrow/blob/d327f694d7cd6b5919fe1df7d9dea6e7ebef03e1/python/pyarrow/tests/test_compute.py#L1915-L1917




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835918809



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -538,15 +531,6 @@ test_that("extract quarter from date", {
   )
 })
 
-test_that("extract month from date", {
-  compare_dplyr_binding(
-    .input %>%
-      mutate(x = month(date)) %>%
-      collect(),
-    test_df
-  )
-})

Review comment:
       Aaah, good catch! Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r832736100



##########
File path: ci/appveyor-cpp-setup.bat
##########
@@ -115,3 +115,17 @@ powershell.exe -Command "Start-Process clcache-server" || exit /B
 if "%ARROW_S3%" == "ON" (
     appveyor DownloadFile https://dl.min.io/server/minio/release/windows-amd64/minio.exe -FileName C:\Windows\Minio.exe || exit /B
 )
+
+
+@rem
+@rem Download IANA Timezone Database for unit tests
+@rem
+@rem (Doc section: Download timezone database)
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output tzdata.tar.gz

Review comment:
       No that `2021e` is a version, which won't necessarily align with the R and Python ones in the future. I don't think it matters that we update it, unless the format changes somehow. This is just for testing that it works, and we don't ship it.
   
   But the R unit tests use the one provided by the tzdb package, so we are testing that as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833673483



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -1875,6 +1870,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) {
 }
 
 TEST_F(ScalarTemporalTest, StrftimeOtherLocale) {
+#ifdef _WIN32
+  GTEST_SKIP() << "There is a known bug in strftime for locales on Windows (ARROW-15922)";

Review comment:
       Actually, I think the test is likely skipping on the `LocaleExists("fr_FR.UTF-8")` condition; IIRC MinGW doesn't support locales apart from "C" and "POSIX". Sadly, no indication in CI whether the test was skipped: https://github.com/apache/arrow/runs/5504310182?check_suite_focus=true




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834909962



##########
File path: r/src/config.cpp
##########
@@ -34,4 +35,16 @@ std::vector<std::string> runtime_info() {
   return {info.simd_level, info.detected_simd_level};
 }
 
+// [[arrow::export]]
+void set_timezone_database(cpp11::strings path) {
+  auto paths = cpp11::as_cpp<std::vector<std::string>>(path);
+  if (path.size() != 1) {
+    cpp11::stop("Must provide a single path to the timezone database.");
+  }
+
+  arrow::ArrowGlobalOptions options;

Review comment:
       Yes, forgot to update the R portion after getting the C++ tests passing. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833673483



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -1875,6 +1870,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) {
 }
 
 TEST_F(ScalarTemporalTest, StrftimeOtherLocale) {
+#ifdef _WIN32
+  GTEST_SKIP() << "There is a known bug in strftime for locales on Windows (ARROW-15922)";

Review comment:
       Actually, I think the test is likely skipping on the `LocaleExists("fr_FR.UTF-8")`; IIRC MinGW doesn't support locales apart from "C" and "POSIX". Sadly, no indication in CI whether the test was skipped: https://github.com/apache/arrow/runs/5504310182?check_suite_focus=true




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833247348



##########
File path: cpp/src/arrow/config.h
##########
@@ -77,4 +86,13 @@ const BuildInfo& GetBuildInfo();
 ARROW_EXPORT
 RuntimeInfo GetRuntimeInfo();
 
+struct ArrowGlobalOptions {
+  /// Path to text timezone database. This is only configurable on Windows,
+  /// which does not have a compatible OS timezone database.
+  util::optional<std::string> tz_db_path;

Review comment:
       Can we use a consistent naming here and above?
   

##########
File path: cpp/src/arrow/config.cc
##########
@@ -73,7 +76,30 @@ RuntimeInfo GetRuntimeInfo() {
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsSupported(flags); });
   info.detected_simd_level =
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsDetected(flags); });
+  info.using_os_timezone_db = USE_OS_TZDB;
+#if !USE_OS_TZDB
+  info.timezone_db_path = timezone_db_path;

Review comment:
       You shouldn't leave it uninitialized otherwise.

##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       This makes the test depend on specifics of where the timezone db was downloaded.
   
   Do we instead want to expose a `ARROW_TIMEZONE_DB_PATH` environment variable that would automatically configure the right tz db path? Or is that overkill?

##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM

Review comment:
       ```suggestion
   #elif !defined(ARROW_FILESYSTEM)
   ```

##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {

Review comment:
       Typo here.

##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it is statically or
 dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
 alternatively, use Arrow from a package manager such as Conda or vcpkg which
 will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it
+requires a user-provided database on Windows. You must download and extract the
+text version of the IANA timezone database and add the Windows timezone mapping
+XML. To download, you can use the following batch script:
+
+.. literalinclude:: ../../../ci/appveyor-cpp-setup.bat
+   :language: cmd
+   :start-after: @rem (Doc section: Download timezone database)
+   :end-before: @rem (Doc section: Download timezone database)

Review comment:
       I'd rather if you copied the relevant snippet here, we shouldn't ideally rely on the contents of CI scripts (which may contain specific quirks that irrelevant to normal user setups) for the public docs.

##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -1875,6 +1870,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) {
 }
 
 TEST_F(ScalarTemporalTest, StrftimeOtherLocale) {
+#ifdef _WIN32
+  GTEST_SKIP() << "There is a known bug in strftime for locales on Windows (ARROW-15922)";

Review comment:
       Is this on Windows or specifically MinGW? i.e., would the non-MinGW Windows CI pass if you remove this skip? I'm wondering if we can narrow the condition.

##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it is statically or
 dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
 alternatively, use Arrow from a package manager such as Conda or vcpkg which
 will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it

Review comment:
       ```suggestion
   While Arrow uses the OS-provided timezone database on Linux and macOS, it
   ```

##########
File path: docs/source/developers/cpp/windows.rst
##########
@@ -362,6 +362,21 @@ suppress dllimport/dllexport marking of symbols. Projects that statically link
 against Arrow on Windows additionally need this definition. The Unix builds do
 not use the macro.
 
+Downloading the Timezone Database
+=================================
+
+To run some of the compute unit tests on Windows, the IANA timezone database
+and the Windows timezone mapping need to be downloaded first. To download
+them to the default search directory (in the user Downloads directory), use 

Review comment:
       Can you just cross-reference to the relevant section in the building docs above?

##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it is statically or
 dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
 alternatively, use Arrow from a package manager such as Conda or vcpkg which
 will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it
+requires a user-provided database on Windows. You must download and extract the
+text version of the IANA timezone database and add the Windows timezone mapping
+XML. To download, you can use the following batch script:
+
+.. literalinclude:: ../../../ci/appveyor-cpp-setup.bat
+   :language: cmd
+   :start-after: @rem (Doc section: Download timezone database)
+   :end-before: @rem (Doc section: Download timezone database)
+
+By default, the timezone database will be detected at ``%USERPROFILE%\Downloads\tzdata``,

Review comment:
       Is this a reliable location? Is there a risk that the downloads folder gets cleared from time to time?

##########
File path: cpp/src/arrow/config.cc
##########
@@ -73,7 +76,30 @@ RuntimeInfo GetRuntimeInfo() {
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsSupported(flags); });
   info.detected_simd_level =
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsDetected(flags); });
+  info.using_os_timezone_db = USE_OS_TZDB;
+#if !USE_OS_TZDB
+  info.timezone_db_path = timezone_db_path;
+#endif
   return info;
 }
 
+Status Initialize(const ArrowGlobalOptions& options) noexcept {
+  if (options.tz_db_path.has_value()) {
+#if !USE_OS_TZDB
+    try {
+      arrow_vendored::date::set_install(options.tz_db_path.value());
+      arrow_vendored::date::reload_tzdb();
+    } catch (const std::runtime_error& e) {
+      return Status::ExecutionError(e.what());
+    }
+    timezone_db_path = options.tz_db_path.value();
+#else
+    return Status::Invalid(
+        "Arrow was set to use OS timezone database at compile time, so it cannot be set "
+        "at runtime.");

Review comment:
       The error message is weird, it reads like you can set to use OS timezone database at runtime.

##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));
+  ASSERT_OK_AND_ASSIGN(auto tzdata_path,
+                       arrow::internal::PlatformFilename::FromString(tzdata_dir));
+  if (!arrow::internal::FileExists(tzdata_path).ValueOr(false)) {
+    GTEST_SKIP() << "Couldn't find timezone database in expected dir: " << tzdata_dir;
+  }
+  // Create a tmp directory
+  ASSERT_OK_AND_ASSIGN(auto tempdir, arrow::internal::TemporaryDir::Make("tzdata"));
+
+  // Validate that setting tzdb to that dir fails
+  arrow::ArrowGlobalOptions options = {util::make_optional(tempdir->path().ToString())};
+  ASSERT_NOT_OK(arrow::Initialize(options));
+
+  // Copy tzdb data from ~/Downloads
+  auto selector = arrow::fs::FileSelector();
+  selector.base_dir = tzdata_dir;
+  selector.recursive = true;
+  ASSERT_OK(arrow::fs::CopyFiles(fs, selector, fs, tempdir->path().ToString()));
+
+  // Validate that tzdb is working
+  ASSERT_OK(arrow::Initialize(options));

Review comment:
       Hmm, the problem is that later tests may then pick up your temporary directory for the timezone db, which will then not exist anymore?

##########
File path: cpp/src/arrow/config.cc
##########
@@ -73,7 +76,30 @@ RuntimeInfo GetRuntimeInfo() {
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsSupported(flags); });
   info.detected_simd_level =
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsDetected(flags); });
+  info.using_os_timezone_db = USE_OS_TZDB;
+#if !USE_OS_TZDB
+  info.timezone_db_path = timezone_db_path;
+#endif
   return info;
 }
 
+Status Initialize(const ArrowGlobalOptions& options) noexcept {
+  if (options.tz_db_path.has_value()) {
+#if !USE_OS_TZDB
+    try {
+      arrow_vendored::date::set_install(options.tz_db_path.value());
+      arrow_vendored::date::reload_tzdb();
+    } catch (const std::runtime_error& e) {
+      return Status::ExecutionError(e.what());

Review comment:
       That status code is really Gandiva specific.
   ```suggestion
         return Status::IOError(e.what());
   ```

##########
File path: cpp/src/arrow/config.h
##########
@@ -77,4 +86,13 @@ const BuildInfo& GetBuildInfo();
 ARROW_EXPORT
 RuntimeInfo GetRuntimeInfo();
 
+struct ArrowGlobalOptions {

Review comment:
       Note sure it's worth putting "Arrow" here since this is in the `arrow` namespace.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833570663



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       I guess for the sake of local developers testing, we should do have an environment variable. I just need to figure out where in the test files to run the Initialize() function. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1081174695


   Benchmark runs are scheduled for baseline = 919d113883276e3a4f56399009d8f2527fce60d0 and contender = f4dfd6c37731eb0cd5056618e8e547066bcbfe70. f4dfd6c37731eb0cd5056618e8e547066bcbfe70 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c111763b9ad248c1963960d709b3f11c...0393bade894c45c98a6178ee87a35921/)
   [Finished :arrow_down:0.34% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/43a8b1b44ba34f289d2dd5b1e01ac7d5...9a5ecb3b3b7749fa98373e6c3ffe870b/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/566122ae01644b52b2ba4c7e119eb6ba...22dd2f8eccb74dd584e174c6eaafafbe/)
   [Finished :arrow_down:1.02% :arrow_up:0.81%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2934d0c5d0504baeb83ff3c67da206e6...14fa3b149d6b4cebb2b834c3647c9433/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834509328



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       Perhaps here would be best: https://github.com/apache/arrow/blob/ddb663b1724034f64cc53d62bd2d5a4e8fa42954/cpp/src/arrow/compute/kernels/temporal_internal.h#L47-L53




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r836650721



##########
File path: ci/appveyor-cpp-setup.bat
##########
@@ -115,3 +115,17 @@ powershell.exe -Command "Start-Process clcache-server" || exit /B
 if "%ARROW_S3%" == "ON" (
     appveyor DownloadFile https://dl.min.io/server/minio/release/windows-amd64/minio.exe -FileName C:\Windows\Minio.exe || exit /B
 )
+
+
+@rem
+@rem Download IANA Timezone Database for unit tests
+@rem
+@rem (Doc section: Download timezone database)
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output tzdata.tar.gz

Review comment:
       We currently test arrow vs pandas tzdb in CI. We have a test for a timestamp in 2033 and if it is in DST and DST is abolished it will error if we'll be using a pre-abolishment db with arrow and post-abolishment db with pandas. This is super irrelevant and as it's a simple fix and I'm sorry for wasting your time :).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r818932698



##########
File path: r/R/arrow-package.R
##########
@@ -65,6 +65,14 @@
     # Disable multithreading on Windows
     # See https://issues.apache.org/jira/browse/ARROW-8379
     options(arrow.use_threads = FALSE)
+
+    # Try to set timezone database
+    if (requireNamespace("tzdb", quietly = TRUE)) {
+      tzdb::tzdb_initialize()
+      set_timezone_database(tzdb::tzdb_path("text"))
+    } else {
+      warning("tzdb not installed. Timezones will not be available.")
+    }

Review comment:
       @nealrichardson  @paleolimbot the R CMD CHECK doesn't seem to like us referencing `tzdb::` in `onLoad`. ([See this CI job](https://github.com/apache/arrow/runs/5400279568?check_suite_focus=true#step:6:1162).) Any thoughts on a better way to handle an optional dependency? (I think we only need this on Windows.) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835074201



##########
File path: ci/appveyor-cpp-setup.bat
##########
@@ -115,3 +115,17 @@ powershell.exe -Command "Start-Process clcache-server" || exit /B
 if "%ARROW_S3%" == "ON" (
     appveyor DownloadFile https://dl.min.io/server/minio/release/windows-amd64/minio.exe -FileName C:\Windows\Minio.exe || exit /B
 )
+
+
+@rem
+@rem Download IANA Timezone Database for unit tests
+@rem
+@rem (Doc section: Download timezone database)
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output tzdata.tar.gz

Review comment:
       The times in the past won't change, so unless we test against a random "now", updates to the timezone database (such as for DST policy changes) shouldn't impact those tests?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835605950



##########
File path: r/R/arrow-package.R
##########
@@ -65,6 +65,14 @@
     # Disable multithreading on Windows
     # See https://issues.apache.org/jira/browse/ARROW-8379
     options(arrow.use_threads = FALSE)
+
+    # Try to set timezone database
+    if (requireNamespace("tzdb", quietly = TRUE)) {
+      tzdb::tzdb_initialize()
+      set_timezone_database(tzdb::tzdb_path("text"))
+    } else {
+      warning("tzdb not installed. Timezones will not be available.")

Review comment:
       Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834468931



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       That's a good point. Perhaps we should punt on that for now and create a JIRA for it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r832697782



##########
File path: ci/appveyor-cpp-setup.bat
##########
@@ -115,3 +115,17 @@ powershell.exe -Command "Start-Process clcache-server" || exit /B
 if "%ARROW_S3%" == "ON" (
     appveyor DownloadFile https://dl.min.io/server/minio/release/windows-amd64/minio.exe -FileName C:\Windows\Minio.exe || exit /B
 )
+
+
+@rem
+@rem Download IANA Timezone Database for unit tests
+@rem
+@rem (Doc section: Download timezone database)
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output tzdata.tar.gz

Review comment:
       Will this always be the same DB that R and Python will be using?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1081174695


   Benchmark runs are scheduled for baseline = 919d113883276e3a4f56399009d8f2527fce60d0 and contender = f4dfd6c37731eb0cd5056618e8e547066bcbfe70. f4dfd6c37731eb0cd5056618e8e547066bcbfe70 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c111763b9ad248c1963960d709b3f11c...0393bade894c45c98a6178ee87a35921/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/43a8b1b44ba34f289d2dd5b1e01ac7d5...9a5ecb3b3b7749fa98373e6c3ffe870b/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/566122ae01644b52b2ba4c7e119eb6ba...22dd2f8eccb74dd584e174c6eaafafbe/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2934d0c5d0504baeb83ff3c67da206e6...14fa3b149d6b4cebb2b834c3647c9433/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane closed pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
jonkeane closed pull request #12536:
URL: https://github.com/apache/arrow/pull/12536


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834594911



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       I think for now I'll add it to a test fixture just for developers. If there's serious demand to make that env var available to users we can create a Jira for that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1055979125


   https://issues.apache.org/jira/browse/ARROW-13168


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833682123



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));
+  ASSERT_OK_AND_ASSIGN(auto tzdata_path,
+                       arrow::internal::PlatformFilename::FromString(tzdata_dir));
+  if (!arrow::internal::FileExists(tzdata_path).ValueOr(false)) {
+    GTEST_SKIP() << "Couldn't find timezone database in expected dir: " << tzdata_dir;
+  }
+  // Create a tmp directory
+  ASSERT_OK_AND_ASSIGN(auto tempdir, arrow::internal::TemporaryDir::Make("tzdata"));
+
+  // Validate that setting tzdb to that dir fails
+  arrow::ArrowGlobalOptions options = {util::make_optional(tempdir->path().ToString())};
+  ASSERT_NOT_OK(arrow::Initialize(options));
+
+  // Copy tzdb data from ~/Downloads
+  auto selector = arrow::fs::FileSelector();
+  selector.base_dir = tzdata_dir;
+  selector.recursive = true;
+  ASSERT_OK(arrow::fs::CopyFiles(fs, selector, fs, tempdir->path().ToString()));
+
+  // Validate that tzdb is working
+  ASSERT_OK(arrow::Initialize(options));

Review comment:
       Good point. I'll see if I can add cleanup code to reset the GlobalOptions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834057377



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       Is it possibly to initialize lazily in the kernel instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834529166



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       You would want to only do this only once though (`LocateZone` is called on every kernel invocation).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1081216280


   Follow-up Jira created: https://issues.apache.org/jira/browse/ARROW-16054


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834780590



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       https://issues.apache.org/jira/browse/ARROW-16024




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] DavisVaughan commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
DavisVaughan commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835577044



##########
File path: ci/scripts/download_tz_database.sh
##########
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -ex
+
+# Download database
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output ~/Downloads/tzdata2021e.tar.gz

Review comment:
       FWIW, they just released 2022a a few days ago




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833568966



##########
File path: docs/source/developers/cpp/windows.rst
##########
@@ -362,6 +362,21 @@ suppress dllimport/dllexport marking of symbols. Projects that statically link
 against Arrow on Windows additionally need this definition. The Unix builds do
 not use the macro.
 
+Downloading the Timezone Database
+=================================
+
+To run some of the compute unit tests on Windows, the IANA timezone database
+and the Windows timezone mapping need to be downloaded first. To download
+them to the default search directory (in the user Downloads directory), use 

Review comment:
       Yeah that makes sense 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833566978



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -1875,6 +1870,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) {
 }
 
 TEST_F(ScalarTemporalTest, StrftimeOtherLocale) {
+#ifdef _WIN32
+  GTEST_SKIP() << "There is a known bug in strftime for locales on Windows (ARROW-15922)";

Review comment:
       That's a bit weird, MinGW is just a different compiler but targetting the same runtime libraries...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1057455673


   So timezone database seems to work, but methods that rely on `std::local` currently error with:
   
   ```
   Error (test-dplyr-funcs-datetime.R:344:3): extract month from timestamp
   Error: Invalid: Cannot find locale 'English_United States.1252': locale::facet::_S_create_c_locale name not valid
   C:/Users/voltron/arrow/cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc:1085  GetLocale(options.locale)
   C:/Users/voltron/arrow/cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc:1105  Make(ctx, *in.type)
   C:/Users/voltron/arrow/cpp/src/arrow/compute/exec.cc:700  kernel_->exec(kernel_ctx_, batch, &out)
   C:/Users/voltron/arrow/cpp/src/arrow/compute/exec.cc:641  ExecuteBatch(batch, listener)
   C:/Users/voltron/arrow/cpp/src/arrow/compute/exec/expression.cc:547  executor->Execute(arguments, &listener)
   C:/Users/voltron/arrow/cpp/src/arrow/compute/exec/expression.cc:533  ExecuteScalarExpression(call->arguments[i], input, exec_context)
   C:/Users/voltron/arrow/cpp/src/arrow/compute/exec/project_node.cc:91  ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
   C:/Users/voltron/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:484  iterator_.Next()
   C:/Users/voltron/arrow/cpp/src/arrow/record_batch.cc:336  ReadNext(&batch)
   C:/Users/voltron/arrow/cpp/src/arrow/record_batch.cc:347  ReadAll(&batches)
   ```
   
   These do work if you set `Sys.setlocale("LC_TIME", "C")`. If I don't find a fix for this, I may consider only supporting the "C" locale on Windows. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833309715



##########
File path: ci/appveyor-cpp-setup.bat
##########
@@ -115,3 +115,17 @@ powershell.exe -Command "Start-Process clcache-server" || exit /B
 if "%ARROW_S3%" == "ON" (
     appveyor DownloadFile https://dl.min.io/server/minio/release/windows-amd64/minio.exe -FileName C:\Windows\Minio.exe || exit /B
 )
+
+
+@rem
+@rem Download IANA Timezone Database for unit tests
+@rem
+@rem (Doc section: Download timezone database)
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output tzdata.tar.gz

Review comment:
       Currently US and EU have DST, but they plan to abolish it soon. At that point we will have Python and R DBs with fresh DSTless times and arrow c++ using DST for tests. In isolation that's ok, but we do have tests comparing pandas and pyarrow results and similar for lubridate. It's not a big problem for sure, but if there is like a tzdata-latest.tar.gz that would be great.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833527233



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -1875,6 +1870,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) {
 }
 
 TEST_F(ScalarTemporalTest, StrftimeOtherLocale) {
+#ifdef _WIN32
+  GTEST_SKIP() << "There is a known bug in strftime for locales on Windows (ARROW-15922)";

Review comment:
       If [failed on Windows 2019 C++17](https://github.com/apache/arrow/runs/5541207130?check_suite_focus=true#step:8:731), which uses MSVC. But seems like it was actually passing on MinGW. So maybe I can try skipping for just MSVC?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835348548



##########
File path: ci/appveyor-cpp-setup.bat
##########
@@ -115,3 +115,17 @@ powershell.exe -Command "Start-Process clcache-server" || exit /B
 if "%ARROW_S3%" == "ON" (
     appveyor DownloadFile https://dl.min.io/server/minio/release/windows-amd64/minio.exe -FileName C:\Windows\Minio.exe || exit /B
 )
+
+
+@rem
+@rem Download IANA Timezone Database for unit tests
+@rem
+@rem (Doc section: Download timezone database)
+curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output tzdata.tar.gz

Review comment:
       Yeah this should be fine as is. We will never be testing between timezone databases. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834509328



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       Perhaps here would be best: https://github.com/apache/arrow/blob/ddb663b1724034f64cc53d62bd2d5a4e8fa42954/cpp/src/arrow/compute/kernels/temporal_internal.h#L47-L53
   I'm not 100% this will cover all the cases, but if it would it would be nice.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] DavisVaughan commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
DavisVaughan commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r835579086



##########
File path: r/R/arrow-package.R
##########
@@ -65,6 +65,14 @@
     # Disable multithreading on Windows
     # See https://issues.apache.org/jira/browse/ARROW-8379
     options(arrow.use_threads = FALSE)
+
+    # Try to set timezone database
+    if (requireNamespace("tzdb", quietly = TRUE)) {
+      tzdb::tzdb_initialize()
+      set_timezone_database(tzdb::tzdb_path("text"))
+    } else {
+      warning("tzdb not installed. Timezones will not be available.")

Review comment:
       ```suggestion
         warning("The tzdb package is not installed. Timezones will not be available.")
   ```
   
   Just a suggestion on the wording here, since tzdb is a fairly hidden and unknown package (purposefully)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1081174695


   Benchmark runs are scheduled for baseline = 919d113883276e3a4f56399009d8f2527fce60d0 and contender = f4dfd6c37731eb0cd5056618e8e547066bcbfe70. f4dfd6c37731eb0cd5056618e8e547066bcbfe70 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c111763b9ad248c1963960d709b3f11c...0393bade894c45c98a6178ee87a35921/)
   [Finished :arrow_down:0.34% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/43a8b1b44ba34f289d2dd5b1e01ac7d5...9a5ecb3b3b7749fa98373e6c3ffe870b/)
   [Finished :arrow_down:0.36% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/566122ae01644b52b2ba4c7e119eb6ba...22dd2f8eccb74dd584e174c6eaafafbe/)
   [Finished :arrow_down:1.02% :arrow_up:0.81%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2934d0c5d0504baeb83ff3c67da206e6...14fa3b149d6b4cebb2b834c3647c9433/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r826177969



##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -1875,6 +1870,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) {
 }
 
 TEST_F(ScalarTemporalTest, StrftimeOtherLocale) {
+#ifdef WIN32
+  GTEST_SKIP() << "There is a known bug in strftime for locales on Windows (ARROW-15922)";
+#else

Review comment:
       This isn't being skipped on AMD64 Windows?! https://github.com/wjones127/arrow/runs/5539946289?check_suite_focus=true#step:8:730




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834396274



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       I haven't worked much in the compute codebase. @rok any suggestion for a good place to lazy initialize the timezone database?
   
   I think there's several kernels across scalar_temporal_binary and scalar_temporal_unary that use the db. And I wouldn't want to make it easy to accidentally use the timezone db before it had been initialized.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1075675286


   cc @pitrou 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834691833



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       > You would want to only do this only once though (LocateZone is called on every kernel invocation).
   
   Another worry I have about using that is I think it's called from multiple threads, potentially simultaneously, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r819003299



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -184,13 +181,16 @@ test_that("strftime", {
 
   # This check is due to differences in the way %c currently works in Arrow and R's strftime.
   # We can revisit after https://github.com/HowardHinnant/date/issues/704 is resolved.
-  expect_error(
-    times %>%
-      Table$create() %>%
-      mutate(x = strftime(datetime, format = "%c")) %>%
-      collect(),
-    "%c flag is not supported in non-C locales."
-  )
+  # Skip on Windows because it must use C locale.
+  if (tolower(Sys.info()[["sysname"]]) != "windows") {

Review comment:
       use `skip_on_os("windows")`? or perhaps better, check that locale != C




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r819002496



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -15,6 +15,15 @@
 # specific language governing permissions and limitations
 # under the License.
 
+check_locale <- function(locale = Sys.getlocale("LC_TIME")) {

Review comment:
       It looks like this is only about LC_TIME and its use in strftime, so let's name the function and revise the error message to reflect that




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1081174695


   Benchmark runs are scheduled for baseline = 919d113883276e3a4f56399009d8f2527fce60d0 and contender = f4dfd6c37731eb0cd5056618e8e547066bcbfe70. f4dfd6c37731eb0cd5056618e8e547066bcbfe70 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c111763b9ad248c1963960d709b3f11c...0393bade894c45c98a6178ee87a35921/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/43a8b1b44ba34f289d2dd5b1e01ac7d5...9a5ecb3b3b7749fa98373e6c3ffe870b/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/566122ae01644b52b2ba4c7e119eb6ba...22dd2f8eccb74dd584e174c6eaafafbe/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2934d0c5d0504baeb83ff3c67da206e6...14fa3b149d6b4cebb2b834c3647c9433/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] paleolimbot commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r818943619



##########
File path: r/R/arrow-package.R
##########
@@ -65,6 +65,14 @@
     # Disable multithreading on Windows
     # See https://issues.apache.org/jira/browse/ARROW-8379
     options(arrow.use_threads = FALSE)
+
+    # Try to set timezone database
+    if (requireNamespace("tzdb", quietly = TRUE)) {
+      tzdb::tzdb_initialize()
+      set_timezone_database(tzdb::tzdb_path("text"))
+    } else {
+      warning("tzdb not installed. Timezones will not be available.")
+    }

Review comment:
       I think you need to add `tzdb` to `Suggests:` (which lives in the `DESCRIPTION` file at the top level of the package)!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834396274



##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
   ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
 }
 
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+  GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+  GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+  auto home_raw = std::getenv("USERPROFILE");
+  std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+  ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+                       fs->NormalizePath(home + "\\Downloads\\tzdata"));

Review comment:
       I haven't worked much in the compute codebase. @rok any suggestion for a good place to lazy initialize the timezone database?
   
   I think there's several kernals across scalar_temporal_binary and scalar_temporal_unary that use the db. And I wouldn't want to make it easy to accidentally use the timezone db before it had been initialized.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r834801280



##########
File path: cpp/src/arrow/config.cc
##########
@@ -73,7 +76,32 @@ RuntimeInfo GetRuntimeInfo() {
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsSupported(flags); });
   info.detected_simd_level =
       MakeSimdLevelString([&](int64_t flags) { return cpu_info->IsDetected(flags); });
+  info.using_os_timezone_db = USE_OS_TZDB;
+#if !USE_OS_TZDB
+  info.timezone_db_path = timezone_db_path;
+#else
+  info.timezone_db_path = util::optional<std::string>();
+#endif
   return info;
 }
 
+Status Initialize(const GlobalOptions& options) noexcept {
+  if (options.timezone_db_path.has_value()) {
+#if !USE_OS_TZDB
+    try {
+      arrow_vendored::date::set_install(options.timezone_db_path.value());
+      arrow_vendored::date::reload_tzdb();
+    } catch (const std::runtime_error& e) {
+      return Status::IOError(e.what());
+    }
+    timezone_db_path = options.timezone_db_path.value();
+#else
+    return Status::Invalid(
+        "Arrow was set to use OS timezone database at compile time, so a downloaded database "
+        "cannot be provided at runtime.");

Review comment:
       ```suggestion
       return Status::Invalid(
           "Arrow was set to use OS timezone database at compile time, "
           "so a downloaded database cannot be provided at runtime.");
   ```
   
   to appease the linter

##########
File path: r/src/config.cpp
##########
@@ -34,4 +35,16 @@ std::vector<std::string> runtime_info() {
   return {info.simd_level, info.detected_simd_level};
 }
 
+// [[arrow::export]]
+void set_timezone_database(cpp11::strings path) {
+  auto paths = cpp11::as_cpp<std::vector<std::string>>(path);
+  if (path.size() != 1) {
+    cpp11::stop("Must provide a single path to the timezone database.");
+  }
+
+  arrow::ArrowGlobalOptions options;

Review comment:
       ```suggestion
     arrow::GlobalOptions options;
   ```
   
   Just looking at the CI failures, and I think this is a typo?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833518046



##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it is statically or
 dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
 alternatively, use Arrow from a package manager such as Conda or vcpkg which
 will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it
+requires a user-provided database on Windows. You must download and extract the
+text version of the IANA timezone database and add the Windows timezone mapping
+XML. To download, you can use the following batch script:
+
+.. literalinclude:: ../../../ci/appveyor-cpp-setup.bat
+   :language: cmd
+   :start-after: @rem (Doc section: Download timezone database)
+   :end-before: @rem (Doc section: Download timezone database)
+
+By default, the timezone database will be detected at ``%USERPROFILE%\Downloads\tzdata``,

Review comment:
       This is the behavior of the vendored datetime library, not something we chose. I think it's a fine default for testing and in production applications I expect users will manually specify a more appropriate path at runtime.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on a change in pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833568800



##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it is statically or
 dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
 alternatively, use Arrow from a package manager such as Conda or vcpkg which
 will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it
+requires a user-provided database on Windows. You must download and extract the
+text version of the IANA timezone database and add the Windows timezone mapping
+XML. To download, you can use the following batch script:
+
+.. literalinclude:: ../../../ci/appveyor-cpp-setup.bat
+   :language: cmd
+   :start-after: @rem (Doc section: Download timezone database)
+   :end-before: @rem (Doc section: Download timezone database)

Review comment:
       Since we are pulling a specific part of the script, we should easily be able to avoid quirks?  I like the idea of having our build instructions tested in CI. And we reference the CI scripts regularly when providing build instructions.
   
   At the very least, if we get to a point where the public instructions and CI instructions diverge, we can easily separate them. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wjones127 commented on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
wjones127 commented on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1075674611


   CI failure is unrelated Flight error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on pull request #12536: ARROW-13168: [C++][R] Enable runtime timezone database for Windows

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#issuecomment-1081074008


   Are there any outstanding comments that we need to resolve before merging? 
   
   The failures in CI both look unrelated. I'm happy to merge if no one objects


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org