Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/14 15:56:22 UTC

[GitHub] [arrow] pitrou opened a new pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

pitrou opened a new pull request #12624:
URL: https://github.com/apache/arrow/pull/12624


   List and describe the environment variables which influence the behaviour of Arrow C++ at runtime.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826121766



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the process with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend to which `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces are exported.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to an HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations up to and including SSE4.2;
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).
+
+.. envvar:: GANDIVA_CACHE_SIZE
+
+   The number of entries to keep in the Gandiva JIT compilation cache.
+   The cache is in-memory and does not persist across processes.
+
+.. envvar:: HADOOP_HOME
+
+   The path to the Hadoop installation.
+
+.. envvar:: JAVA_HOME
+
+   The path to the Java Runtime Environment installation.  This may be
+   required for HDFS support if Java is installed in a non-standard location.
+
+.. envvar:: OMP_NUM_THREADS
+
+   The number of worker threads in the global (process-wide) CPU thread pool.
+   If this environment variable is not defined, the available hardware
+   concurrency is determined using a platform-specific routine.
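
As an aside to the hunk above, which notes that many of these variables are inspected only once per process: the following is a minimal sketch, not part of this PR, of setting a variable in the environment of a freshly started child process instead of changing it after startup. The helper name `run_with_env` is hypothetical, and the child only echoes the variable; it does not load Arrow.

```python
import os
import subprocess
import sys


def run_with_env(overrides, code):
    """Run `code` in a fresh Python child process with extra environment variables.

    `run_with_env` is a hypothetical helper used only for this illustration.
    """
    env = dict(os.environ)  # copy, so the parent's environment is left untouched
    env.update(overrides)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child process has the variable set from its very first instruction, so a
# library that reads it once at load time would see the intended value.
child_code = "import os; print(os.environ.get('ARROW_DEFAULT_MEMORY_POOL', ''))"
seen = run_with_env({"ARROW_DEFAULT_MEMORY_POOL": "system"}, child_code)
print(seen)
```

Setting the variable in the launching shell or service definition achieves the same thing; the point is only that the value is in place before the library is loaded.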

Review comment:
       There is a function, though not an env var: https://github.com/apache/arrow/blob/5cb5afc40547b4f75739e31ff8632c71a10d3084/cpp/src/arrow/io/type_fwd.h#L46-L52
   
   Also, hmm. https://github.com/apache/arrow/blob/5cb5afc40547b4f75739e31ff8632c71a10d3084/r/src/threadpool.cpp#L51-L57







[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826122992



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. envvar:: OMP_NUM_THREADS
+
+   The number of worker threads in the global (process-wide) CPU thread pool.
+   If this environment variable is not defined, the available hardware
+   concurrency is determined using a platform-specific routine.

Review comment:
       :-D. Do you want to open a JIRA for this?







[GitHub] [arrow] westonpace commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r827088337



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. envvar:: OMP_NUM_THREADS
+
+   The number of worker threads in the global (process-wide) CPU thread pool.
+   If this environment variable is not defined, the available hardware
+   concurrency is determined using a platform-specific routine.

Review comment:
       At the moment I think it is very important this is configurable.  So I am +1 on being able to expose this via an environment variable.  We have had at least one customer that was using S3 and benefited from setting this larger than the initial default.
   
   At some point though I think we want to move towards having I/O context / thread pools specific to the filesystem.  A single global default doesn't make a lot of sense when you might have a mix of local and remote workloads.  Even then I suppose we might still have a global default as a fallback in case the user doesn't specify anything.
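
   The fallback order described here could be sketched roughly as follows. This is an illustration only, not Arrow code: the variable name `EXAMPLE_IO_THREADS`, the built-in default of 8 and the function `resolve_io_threads` are all assumptions made for the example.

```python
import os

# Both values below are assumptions for illustration, not Arrow's actual ones.
BUILTIN_DEFAULT_IO_THREADS = 8
ENV_VAR = "EXAMPLE_IO_THREADS"  # hypothetical environment variable name


def resolve_io_threads(per_filesystem=None, environ=None):
    """Pick an I/O thread count: filesystem-specific, then env var, then built-in."""
    environ = os.environ if environ is None else environ

    # 1. A setting attached to a specific filesystem wins when present.
    if per_filesystem is not None:
        if per_filesystem <= 0:
            raise ValueError("per_filesystem must be a positive integer")
        return per_filesystem

    # 2. Otherwise fall back to the process-wide environment variable.
    raw = environ.get(ENV_VAR, "").strip()
    if raw:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # unparsable values are ignored in this sketch
        if value > 0:
            return value

    # 3. Finally, use the built-in default.
    return BUILTIN_DEFAULT_IO_THREADS
```

   With this ordering, a remote filesystem could ask for a larger pool while local workloads keep using the global default.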







[GitHub] [arrow] pitrou commented on pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#issuecomment-1068217658


   Ok, I added a dedicated PyArrow doc page about environment variables as well.





[GitHub] [arrow] ursabot edited a comment on pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#issuecomment-1069325871


   Benchmark runs are scheduled for baseline = 3eaa7dd0e8b3dabc5438203331f05e3e6c011e37 and contender = 40d8e7ebf24540d64fe502de601e5cebcfc89a08. 40d8e7ebf24540d64fe502de601e5cebcfc89a08 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3faee5232cba42e98cf9b14db0ca91a2...e5daa0567b94433c8e63553b9b5cc4e3/)
   [Finished :arrow_down:0.17% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/063b62469e614538b0df38221f85e8f2...8d79bb6452394c5cb3eaa9a60ff6fe1a/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a997da7d595e4b2e945f967115596008...4943bed4d24346a4b91a5efed181c2f6/)
   [Finished :arrow_down:0.72% :arrow_up:0.77%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ff3cf4c8cb6b45c894cfba3fab4c72f2...a4732ad8194b45e5a4be1b166c8e2c9a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r827131659



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. envvar:: GANDIVA_CACHE_SIZE
+
+   The number of entries to keep in the Gandiva JIT compilation cache.
+   The cache is in-memory and does not persist across processes.

Review comment:
       Unfortunately there's no user-facing documentation for Gandiva.







[GitHub] [arrow] ursabot edited a comment on pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#issuecomment-1069325871


   Benchmark runs are scheduled for baseline = 3eaa7dd0e8b3dabc5438203331f05e3e6c011e37 and contender = 40d8e7ebf24540d64fe502de601e5cebcfc89a08. 40d8e7ebf24540d64fe502de601e5cebcfc89a08 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3faee5232cba42e98cf9b14db0ca91a2...e5daa0567b94433c8e63553b9b5cc4e3/)
   [Finished :arrow_down:0.17% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/063b62469e614538b0df38221f85e8f2...8d79bb6452394c5cb3eaa9a60ff6fe1a/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a997da7d595e4b2e945f967115596008...4943bed4d24346a4b91a5efed181c2f6/)
   [Finished :arrow_down:0.72% :arrow_up:0.77%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ff3cf4c8cb6b45c894cfba3fab4c72f2...a4732ad8194b45e5a4be1b166c8e2c9a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] pitrou commented on pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#issuecomment-1068184926


   > These environment variables apply when running python, R, etc. as well. Do we want to add a small snippet referencing this page in the documentation for those languages as well?
   
   Ideally, but I'm not sure where to put that. The Python docs don't have a natural place for it currently. As for the R docs, I'd rather leave this to the R developers.





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r829142780



##########
File path: docs/source/python/env_vars.rst
##########
@@ -0,0 +1,63 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+PyArrow.
+
+.. envvar:: ARROW_HOME
+
+   The base path to the PyArrow installation.  This variable overrides the
+   default computation of library paths in introspection functions such
+   as :func:`get_library_dirs`.
+
+.. envvar:: ARROW_PRE_0_15_IPC_FORMAT
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will default to the pre-0.15 Arrow IPC format.
+   This behavior can also be enabled using :attr:`IpcWriteOptions.use_legacy_format`.
+
+.. envvar:: ARROW_PRE_1_0_METADATA_VERSION
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will write V4 Arrow metadata (corresponding to pre-1.0 Arrow
+   with an incompatible Union data layout).
+   This behavior can also be enabled using :attr:`IpcWriteOptions.metadata_version`.
+
+.. envvar:: PKG_CONFIG
+
+   The path to the ``pkg-config`` executable.  This may be required for
+   proper functioning of introspection functions such as
+   :func:`get_library_dirs` if ``pkg-config`` is not available on the system
+   ``PATH``.
+
+.. envvar:: PYARROW_IGNORE_TIMEZONE
+
+   By default, PyArrow propagates the timezone value when converting
+   Arrow data to/from Python datetime objects. If this environment variable
+   is set to a non-empty value, the timezone is not propagated.

Review comment:
       I am not 100% sure it's only for Spark, but I would also say that if we wanted to expose this as an option for the conversion for general use, it should be an argument to conversion functions (as we already have others), and not controlled through an environment variable. 
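
The design point in this comment can be sketched in Python. This is only an illustration and not PyArrow's API: the `to_pydatetime` helper, its `ignore_timezone` parameter and the fallback to the environment variable are all hypothetical names and behaviors assumed for the example.

```python
import os
from datetime import datetime, timezone


def to_pydatetime(epoch_seconds, tz=timezone.utc, ignore_timezone=None):
    """Hypothetical conversion helper (not a PyArrow function)."""
    if ignore_timezone is None:
        # Assumed fallback: any non-empty value of the variable acts as a
        # process-wide switch that disables timezone propagation.
        ignore_timezone = bool(os.environ.get("PYARROW_IGNORE_TIMEZONE", ""))
    aware = datetime.fromtimestamp(epoch_seconds, tz=tz)
    # Dropping tzinfo keeps the wall-clock fields and returns a naive datetime.
    return aware.replace(tzinfo=None) if ignore_timezone else aware
```

With an explicit argument, each call site states its own choice, whereas the environment variable silently changes every conversion in the process.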




[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r828832212



##########
File path: docs/source/python/env_vars.rst
##########
@@ -0,0 +1,63 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+PyArrow.
+
+.. envvar:: ARROW_HOME
+
+   The base path to the PyArrow installation.  This variable overrides the
+   default computation of library paths in introspection functions such
+   as :func:`get_library_dirs`.
+
+.. envvar:: ARROW_PRE_0_15_IPC_FORMAT
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will default to the pre-0.15 Arrow IPC format.
+   This behavior can also be enabled using :attr:`IpcWriteOptions.use_legacy_format`.
+
+.. envvar:: ARROW_PRE_1_0_METADATA_VERSION
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will write V4 Arrow metadata (corresponding to pre-1.0 Arrow
+   with an incompatible Union data layout).
+   This behavior can also be enabled using :attr:`IpcWriteOptions.metadata_version`.
+
+.. envvar:: PKG_CONFIG
+
+   The path to the ``pkg-config`` executable.  This may be required for
+   proper functioning of introspection functions such as
+   :func:`get_library_dirs` if ``pkg-config`` is not available on the system
+   ``PATH``.
+
+.. envvar:: PYARROW_IGNORE_TIMEZONE
+
+   By default, PyArrow propagates the timezone value when converting
+   Arrow data to/from Python datetime objects. If this environment variable
+   is set to a non-empty value, the timezone is not propagated.

Review comment:
       _If_ this environment variable was mainly for PySpark compatibility, and if the plan is still to remove it once this is solved on the PySpark side, maybe we should not document it? (Documenting it would only encourage others to start using it.)




[GitHub] [arrow] pitrou closed pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #12624:
URL: https://github.com/apache/arrow/pull/12624


   


[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r828831453



##########
File path: docs/source/python/env_vars.rst
##########
@@ -0,0 +1,63 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+PyArrow.
+
+.. envvar:: ARROW_HOME
+
+   The base path to the PyArrow installation.  This variable overrides the
+   default computation of library paths in introspection functions such
+   as :func:`get_library_dirs`.
+
+.. envvar:: ARROW_PRE_0_15_IPC_FORMAT
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will default to the pre-0.15 Arrow IPC format.
+   This behavior can also be enabled using :attr:`IpcWriteOptions.use_legacy_format`.
+
+.. envvar:: ARROW_PRE_1_0_METADATA_VERSION
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will write V4 Arrow metadata (corresponding to pre-1.0 Arrow
+   with an incompatible Union data layout).
+   This behavior can also be enabled using :attr:`IpcWriteOptions.metadata_version`.
+
+.. envvar:: PKG_CONFIG
+
+   The path to the ``pkg-config`` executable.  This may be required for
+   proper functioning of introspection functions such as
+   :func:`get_library_dirs` if ``pkg-config`` is not available on the system
+   ``PATH``.
+
+.. envvar:: PYARROW_IGNORE_TIMEZONE
+
+   By default, PyArrow propagates the timezone value when converting
+   Arrow data to/from Python datetime objects. If this environment variable
+   is set to a non-empty value, the timezone is not propagated.

Review comment:
       That sounds correct, yes. Now, checking the code, I see this note:
   
   https://github.com/apache/arrow/blob/ecf8c753c0e9552fbf54fd79c48e58d382efcba8/cpp/src/arrow/python/python_to_arrow.h#L56-L59
   
   However, I see that Spark is still using the env variable (cc @BryanCutler). The Spark code (https://github.com/apache/spark/pull/30111) points to https://issues.apache.org/jira/browse/SPARK-32285, which is not yet resolved.
   
   




[GitHub] [arrow] ursabot edited a comment on pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#issuecomment-1069325871


   Benchmark runs are scheduled for baseline = 3eaa7dd0e8b3dabc5438203331f05e3e6c011e37 and contender = 40d8e7ebf24540d64fe502de601e5cebcfc89a08. 40d8e7ebf24540d64fe502de601e5cebcfc89a08 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3faee5232cba42e98cf9b14db0ca91a2...e5daa0567b94433c8e63553b9b5cc4e3/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/063b62469e614538b0df38221f85e8f2...8d79bb6452394c5cb3eaa9a60ff6fe1a/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a997da7d595e4b2e945f967115596008...4943bed4d24346a4b91a5efed181c2f6/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ff3cf4c8cb6b45c894cfba3fab4c72f2...a4732ad8194b45e5a4be1b166c8e2c9a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r827133513



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the process with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to a HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations until SSE4.2 (included);
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).

Review comment:
       `ARROW_SIMD_LEVEL` determines the compiler flags used when building Arrow C++, so it functions as a baseline for the CPU requirements. Even if you set `ARROW_USER_SIMD_LEVEL` to a lower value, the compile-time optimizations enabled by `ARROW_SIMD_LEVEL` will still drive the CPU requirements (hence the example in parentheses).
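
The distinction can be sketched with a toy model in Python. It is not Arrow's implementation: the function names are hypothetical, the ordering of levels follows the list of supported values in the diff, and the rule that the runtime override can only lower the detected level is an assumption made for the example.

```python
# Ordered from least to most demanding, following the documented values.
LEVELS = ["NONE", "SSE4.2", "AVX", "AVX2", "AVX512"]


def runtime_dispatch_level(detected, user_level=None):
    """Level used for runtime-selected kernels (hypothetical toy model)."""
    if user_level is None:
        return detected
    # Assumption: the override can lower the level but cannot exceed
    # what the CPU was detected to support.
    return min(detected, user_level, key=LEVELS.index)


def can_run(cpu_level, compile_time_level):
    """In this model, only the compile-time baseline decides whether the
    binary can run on a given CPU; the runtime override plays no role."""
    return LEVELS.index(cpu_level) >= LEVELS.index(compile_time_level)
```

In this model, `runtime_dispatch_level("AVX2", "SSE4.2")` lowers the dispatch level, while `can_run("AVX2", "AVX512")` stays false whatever `ARROW_USER_SIMD_LEVEL` is set to.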




[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826117252



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the process with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to a HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations until SSE4.2 (included);
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).
+
+.. envvar:: GANDIVA_CACHE_SIZE
+
+   The number of entries to keep in the Gandiva JIT compilation cache.
+   The cache is in-memory and does not persist across processes.
+
+.. envvar:: HADOOP_HOME
+
+   The path to the Hadoop installation.
+
+.. envvar:: JAVA_HOME
+
+   The path to the Java Runtime Environment installation.  This may be
+   required for HDFS support if Java is installed in a non-standard location.
+
+.. envvar:: OMP_NUM_THREADS
+
+   The number of worker threads in the global (process-wide) CPU thread pool.
+   If this environment variable is not defined, the available hardware
+   concurrency is determined using a platform-specific routine.

Review comment:
       @westonpace I notice the IO thread pool size cannot be influenced for now, unless I'm mistaken. Is this something we'd like to make configurable?
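
For reference, the `OMP_NUM_THREADS` behavior described in the diff above can be sketched in Python. This is a minimal sketch, not Arrow's code: `cpu_thread_pool_size` is a hypothetical name, `os.cpu_count()` stands in for the platform-specific routine, and treating invalid or non-positive values as unset is an assumption.

```python
import os


def cpu_thread_pool_size(environ=None):
    """Hypothetical helper: pick the CPU thread pool size."""
    environ = os.environ if environ is None else environ
    raw = environ.get("OMP_NUM_THREADS", "")
    try:
        n = int(raw)
    except ValueError:
        # Assumption: an empty or non-numeric value counts as "not defined".
        n = 0
    if n > 0:
        return n
    # Stand-in for the platform-specific hardware concurrency query.
    return os.cpu_count() or 1
```

An IO thread pool setting, if it were added, could follow the same pattern with its own variable name.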




[GitHub] [arrow] lidavidm commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826119248



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the process with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;

Review comment:
       We can add `ostream_stdout` and `ostream_stderr` then




[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r827204151



##########
File path: docs/source/python/env_vars.rst
##########
@@ -0,0 +1,63 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+PyArrow.
+
+.. envvar:: ARROW_HOME
+
+   The base path to the PyArrow installation.  This variable overrides the
+   default computation of library paths in introspection functions such
+   as :func:`get_library_dirs`.
+
+.. envvar:: ARROW_PRE_0_15_IPC_FORMAT
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will default to the pre-0.15 Arrow IPC format.
+   This behavior can also be enabled using :attr:`IpcWriteOptions.use_legacy_format`.
+
+.. envvar:: ARROW_PRE_1_0_METADATA_VERSION
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will write V4 Arrow metadata (corresponding to pre-1.0 Arrow
+   with an incompatible Union data layout).
+   This behavior can also be enabled using :attr:`IpcWriteOptions.metadata_version`.
+
+.. envvar:: PKG_CONFIG
+
+   The path to the ``pkg-config`` executable.  This may be required for
+   proper functioning of introspection functions such as
+   :func:`get_library_dirs` if ``pkg-config`` is not available on the system
+   ``PATH``.
+
+.. envvar:: PYARROW_IGNORE_TIMEZONE
+
+   By default, PyArrow propagates the timezone value when converting
+   Arrow data to/from Python datetime objects. If this environment variable
+   is set to a non-empty value, the timezone is not propagated.

Review comment:
       @jorisvandenbossche Does this seem accurate or do you want to suggest a better wording?







[GitHub] [arrow] ursabot commented on pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#issuecomment-1069325871


   Benchmark runs are scheduled for baseline = 3eaa7dd0e8b3dabc5438203331f05e3e6c011e37 and contender = 40d8e7ebf24540d64fe502de601e5cebcfc89a08. 40d8e7ebf24540d64fe502de601e5cebcfc89a08 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3faee5232cba42e98cf9b14db0ca91a2...e5daa0567b94433c8e63553b9b5cc4e3/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/063b62469e614538b0df38221f85e8f2...8d79bb6452394c5cb3eaa9a60ff6fe1a/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a997da7d595e4b2e945f967115596008...4943bed4d24346a4b91a5efed181c2f6/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ff3cf4c8cb6b45c894cfba3fab4c72f2...a4732ad8194b45e5a4be1b166c8e2c9a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r828924111



##########
File path: docs/source/python/env_vars.rst
##########
@@ -0,0 +1,63 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+PyArrow.
+
+.. envvar:: ARROW_HOME
+
+   The base path to the PyArrow installation.  This variable overrides the
+   default computation of library paths in introspection functions such
+   as :func:`get_library_dirs`.
+
+.. envvar:: ARROW_PRE_0_15_IPC_FORMAT
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will default to the pre-0.15 Arrow IPC format.
+   This behavior can also be enabled using :attr:`IpcWriteOptions.use_legacy_format`.
+
+.. envvar:: ARROW_PRE_1_0_METADATA_VERSION
+
+   If this environment variable is set to a non-zero integer value, the PyArrow
+   IPC writer will write V4 Arrow metadata (corresponding to pre-1.0 Arrow
+   with an incompatible Union data layout).
+   This behavior can also be enabled using :attr:`IpcWriteOptions.metadata_version`.
+
+.. envvar:: PKG_CONFIG
+
+   The path to the ``pkg-config`` executable.  This may be required for
+   proper functioning of introspection functions such as
+   :func:`get_library_dirs` if ``pkg-config`` is not available on the system
+   ``PATH``.
+
+.. envvar:: PYARROW_IGNORE_TIMEZONE
+
+   By default, PyArrow propagates the timezone value when converting
+   Arrow data to/from Python datetime objects. If this environment variable
+   is set to a non-empty value, the timezone is not propagated.

Review comment:
       Hmm, I see. If this is only meant for use by Spark, then I agree we should probably not document it, or perhaps alter the documentation accordingly.







[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826117777



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the processus with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;

Review comment:
       @lidavidm It seems one cannot choose between stdout/stderr?







[GitHub] [arrow] lidavidm commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826124063



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the processus with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to a HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations until SSE4.2 (included);
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).
+
+.. envvar:: GANDIVA_CACHE_SIZE
+
+   The number of entries to keep in the Gandiva JIT compilation cache.
+   The cache is in-memory and does not persist accross processes.
+
+.. envvar:: HADOOP_HOME
+
+   The path to the Hadoop installation.
+
+.. envvar:: JAVA_HOME
+
+   The path to the Java Runtime Environment installation.  This may be
+   required for HDFS support if Java is installed in a non-standard location.
+
+.. envvar:: OMP_NUM_THREADS
+
+   The number of worker threads in the global (process-wide) CPU thread pool.
+   If this environment variable is not defined, the available hardware
+   concurrency is determined using a platform-specific routine.

Review comment:
       https://issues.apache.org/jira/browse/ARROW-15929







[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826122992



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the processus with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to a HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations until SSE4.2 (included);
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).
+
+.. envvar:: GANDIVA_CACHE_SIZE
+
+   The number of entries to keep in the Gandiva JIT compilation cache.
+   The cache is in-memory and does not persist accross processes.
+
+.. envvar:: HADOOP_HOME
+
+   The path to the Hadoop installation.
+
+.. envvar:: JAVA_HOME
+
+   The path to the Java Runtime Environment installation.  This may be
+   required for HDFS support if Java is installed in a non-standard location.
+
+.. envvar:: OMP_NUM_THREADS
+
+   The number of worker threads in the global (process-wide) CPU thread pool.
+   If this environment variable is not defined, the available hardware
+   concurrency is determined using a platform-specific routine.

Review comment:
       :-D. Do you want to open a JIRA for R?







[GitHub] [arrow] lidavidm commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r826125472



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the processus with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;

Review comment:
       https://issues.apache.org/jira/browse/ARROW-15930







[GitHub] [arrow] westonpace commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r827089976



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the processus with a non-zero return value;

Review comment:
       ```suggestion
      - ``abort`` exits the process with a non-zero return value;
   ```

##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the processus with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based

Review comment:
       ```suggestion
      The backend used to export `OpenTelemetry <https://opentelemetry.io/>`_-based
   ```

##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the processus with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend where to export `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to a HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations up to and including SSE4.2;
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).

Review comment:
       I'm not sure this fully explains the interplay between runtime and compile time settings.  Would the user ever specify `ARROW_SIMD_LEVEL` at build time and still use `ARROW_USER_SIMD_LEVEL` at runtime?  Or does `ARROW_USER_SIMD_LEVEL` only make sense if `ARROW_SIMD_LEVEL` was not specified.
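
As an aside for readers of this thread, here is a minimal Python sketch of setting these variables before a process starts, since the quoted text notes that many of them are inspected only once per process. The accepted values are copied from the quoted documentation draft. The helper name `build_child_env` and the exact-match validation are assumptions made for illustration and are not part of Arrow.

```python
import os

# Values copied from the quoted documentation draft; exact-match checking
# is an assumption of this sketch, not a statement about Arrow's parsing.
DOCUMENTED_VALUES = {
    "ARROW_DEBUG_MEMORY_POOL": {"abort", "trap", "warn", ""},
    "ARROW_DEFAULT_MEMORY_POOL": {"jemalloc", "mimalloc", "system"},
    "ARROW_TRACING_BACKEND": {
        "ostream",
        "otlp_http",
        "arrow_otlp_stdout",
        "arrow_otlp_stderr",
    },
    "ARROW_USER_SIMD_LEVEL": {"NONE", "SSE4.2", "AVX", "AVX2", "AVX512"},
}


def build_child_env(overrides, base=None):
    """Return a copy of `base` (default: os.environ) with `overrides` applied.

    Hypothetical helper: rejects values that the documentation does not list.
    Variables without a documented value list are passed through unchecked.
    """
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        allowed = DOCUMENTED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not a documented value")
        env[name] = value
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run`, so the child process sees the values from the start. Changing `os.environ` after Arrow C++ has been loaded may have no effect.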

##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the process with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend to which `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces are exported.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to an HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations up to and including SSE4.2;
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).
+
+.. envvar:: GANDIVA_CACHE_SIZE
+
+   The number of entries to keep in the Gandiva JIT compilation cache.
+   The cache is in-memory and does not persist across processes.

Review comment:
       Do we have any other Gandiva documentation that we can link to for more information?
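
Not a link, but for readers unfamiliar with the idea, here is a minimal Python sketch of an in-memory cache bounded by a number of entries read from `GANDIVA_CACHE_SIZE`. The eviction policy (least recently used), the fallback capacity of 8, and the names `BoundedCache` and `capacity_from_env` are assumptions of this sketch. The quoted text only says that the variable sets the number of entries and that the cache is in-memory.

```python
import os
from collections import OrderedDict


def capacity_from_env(environ=None, default=8):
    """Read the entry count from GANDIVA_CACHE_SIZE; `default` is hypothetical."""
    environ = os.environ if environ is None else environ
    raw = environ.get("GANDIVA_CACHE_SIZE", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


class BoundedCache:
    """In-memory cache holding at most `capacity` entries (LRU eviction assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used

    def __len__(self):
        return len(self._entries)
```

Because the entries live in an ordinary in-process dictionary, nothing survives once the process exits, which matches the "does not persist" remark in the quoted text.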







[GitHub] [arrow] pitrou commented on a change in pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#discussion_r827162822



##########
File path: docs/source/cpp/env_vars.rst
##########
@@ -0,0 +1,128 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=====================
+Environment Variables
+=====================
+
+The following environment variables can be used to affect the behavior of
+Arrow C++ at runtime.  Many of these variables are inspected only once per
+process (for example, when the Arrow C++ DLL is loaded), so you cannot assume
+that changing their value later will have an effect.
+
+.. envvar:: ARROW_DEBUG_MEMORY_POOL
+
+   Enable rudimentary memory checks to guard against buffer overflows.
+   The value of this environment variable selects the behavior when a
+   buffer overflow is detected:
+
+   - ``abort`` exits the process with a non-zero return value;
+   - ``trap`` issues a platform-specific debugger breakpoint / trap instruction;
+   - ``warn`` prints a warning on stderr and continues execution;
+   - an empty value disables memory checks.
+
+   .. note::
+      While this functionality can be useful and has little overhead, it
+      is not a replacement for more sophisticated memory checking utilities
+      such as `Valgrind <https://valgrind.org/>`_ or
+      `Address Sanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_.
+
+.. envvar:: ARROW_DEFAULT_MEMORY_POOL
+
+   The backend to be used for the default :ref:`memory pool <cpp_memory_pool>`.
+   Possible values are among ``jemalloc``, ``mimalloc`` and ``system``,
+   depending on which backends were enabled when
+   :ref:`building Arrow C++ <building-arrow-cpp>`.
+
+.. envvar:: ARROW_LIBHDFS_DIR
+
+   The directory containing the C HDFS library (``hdfs.dll`` on Windows,
+   ``libhdfs.dylib`` on macOS, ``libhdfs.so`` on other platforms).
+   Alternatively, one can set :envvar:`HADOOP_HOME`.
+
+.. envvar:: ARROW_TRACING_BACKEND
+
+   The backend to which `OpenTelemetry <https://opentelemetry.io/>`_-based
+   execution traces are exported.  Possible values are:
+
+   - ``ostream``: emit textual log messages to stdout;
+   - ``otlp_http``: emit JSON traces to an HTTP server (by default, the endpoint
+     URL is "http://localhost:4318/v1/traces");
+   - ``arrow_otlp_stdout``: emit JSON traces to stdout;
+   - ``arrow_otlp_stderr``: emit JSON traces to stderr.
+
+   This environment variable has no effect if Arrow C++ was not built with
+   tracing enabled.
+
+   .. seealso::
+
+      `OpenTelemetry configuration for remote endpoints
+      <https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md>`__
+
+.. envvar:: ARROW_USER_SIMD_LEVEL
+
+   The SIMD optimization level to select.  By default, Arrow C++ detects
+   the capabilities of the current CPU at runtime and chooses the best
+   execution paths based on that information.  One can override the detection
+   by setting this environment variable to a well-defined value.
+   Supported values are:
+
+   - ``NONE`` disables any runtime-selected SIMD optimization;
+   - ``SSE4.2`` enables any SSE2-based optimizations up to and including SSE4.2;
+   - ``AVX`` enables any AVX-based optimizations and earlier;
+   - ``AVX2`` enables any AVX2-based optimizations and earlier;
+   - ``AVX512`` enables any AVX512-based optimizations and earlier.
+
+   This environment variable only has an effect on x86 platforms.  Other
+   platforms currently do not implement any form of runtime dispatch.
+
+   .. note::
+      In addition to runtime dispatch, the compile-time SIMD level can
+      be set using the ``ARROW_SIMD_LEVEL`` CMake configuration variable.
+      Unlike runtime dispatch, compile-time SIMD optimizations cannot be
+      changed at runtime (for example, if you compile Arrow C++ with AVX512
+      enabled, the resulting binary will only run on AVX512-enabled CPUs).
+
+.. envvar:: GANDIVA_CACHE_SIZE
+
+   The number of entries to keep in the Gandiva JIT compilation cache.
+   The cache is in-memory and does not persist across processes.
+
+.. envvar:: HADOOP_HOME
+
+   The path to the Hadoop installation.
+
+.. envvar:: JAVA_HOME
+
+   The path to the Java Runtime Environment installation.  This may be
+   required for HDFS support if Java is installed in a non-standard location.
+
+.. envvar:: OMP_NUM_THREADS
+
+   The number of worker threads in the global (process-wide) CPU thread pool.
+   If this environment variable is not defined, the available hardware
+   concurrency is determined using a platform-specific routine.

Review comment:
       Ok, I created https://issues.apache.org/jira/browse/ARROW-15941 for an IO thread pool environment variable.
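
For context, here is a minimal Python sketch of the lookup order that the quoted `OMP_NUM_THREADS` text describes: use the variable when it is defined, otherwise fall back to the detected hardware concurrency. The function name `resolve_cpu_threads`, the use of `os.cpu_count()` as the platform routine, and the treatment of invalid values are assumptions of this sketch, not Arrow's implementation.

```python
import os


def resolve_cpu_threads(environ=None):
    """Pick a worker-thread count: OMP_NUM_THREADS if usable, else CPU count."""
    environ = os.environ if environ is None else environ
    raw = environ.get("OMP_NUM_THREADS")
    if raw is not None:
        try:
            requested = int(raw)
        except ValueError:
            requested = 0  # assumption: ignore values that are not integers
        if requested > 0:
            return requested
    # os.cpu_count() may return None on some platforms; fall back to 1.
    return os.cpu_count() or 1
```

Calling `resolve_cpu_threads({"OMP_NUM_THREADS": "4"})` yields 4, while an empty mapping yields whatever the current machine reports.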







[GitHub] [arrow] github-actions[bot] commented on pull request #12624: ARROW-15617: [Doc][C++] Document environment variables

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12624:
URL: https://github.com/apache/arrow/pull/12624#issuecomment-1067094428


   https://issues.apache.org/jira/browse/ARROW-15617

