Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/09/21 20:21:00 UTC

[jira] [Comment Edited] (ARROW-14039) [C++] [Docs] Indicate memory required for installation

    [ https://issues.apache.org/jira/browse/ARROW-14039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418306#comment-17418306 ] 

Weston Pace edited comment on ARROW-14039 at 9/21/21, 8:20 PM:
---------------------------------------------------------------

> being able to run in low resource settings enables wider adoption as a standard backend

I may still be misunderstanding, but I think the discussion is about building, not running.  I absolutely agree that Arrow should be able to run with minimal memory, and a runtime limit might be worth defining.

> for example R will download and build Arrow on Linux when one installs the R bindings.

I believe R always compiles the bindings, but it shouldn't compile arrow-cpp if the package is already present, for example if the user has already installed the CentOS 8 Arrow package from EPEL.  The one exception might be Go, which statically compiles everything, but it has strong cross-compilation support.
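As a rough illustration of "already present": on Linux, an install script could ask pkg-config whether an Arrow C++ installation is visible before falling back to a source build. This is a minimal sketch, assuming the installed package ships an `arrow.pc` pkg-config file. The function name `arrow_cpp_installed` is hypothetical, and this is not necessarily how the R package's installer makes that decision.

```python
import shutil
import subprocess


def arrow_cpp_installed(module: str = "arrow") -> bool:
    """Return True if pkg-config reports the given module as installed.

    Assumes the Arrow C++ package provides an `arrow.pc` file. This is a
    generic check, not the R package's actual detection logic.
    """
    if shutil.which("pkg-config") is None:
        # Without pkg-config we cannot tell, so report "not found".
        return False
    result = subprocess.run(
        ["pkg-config", "--exists", module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # `pkg-config --exists` exits with status 0 only when the module is found.
    return result.returncode == 0


if __name__ == "__main__":
    if arrow_cpp_installed():
        print("Arrow C++ found; reuse it instead of building from source.")
    else:
        print("Arrow C++ not found; build from source or install a prebuilt package.")
```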

> The requirement is not meant to be very precise, more a suggestion as to what to expect. It is possible to add memory use monitoring to the CI builds, though again this would need maintenance.  We want someone installing Arrow (at least the debug build, as virtual machines with less than 1 GB are rare) to know that if the build is proceeding very slowly and they have limited RAM, swapping is the likely reason for the slow build.
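For reference, one lightweight way to record build memory in CI (or locally) is to run the build as a child process and read the peak resident set size from `getrusage`. This is a minimal sketch, assuming a Unix-like host (Python's `resource` module is not available on Windows); `peak_child_rss_mib` is a hypothetical helper. It reports the largest single process (typically one compiler or linker invocation), not the system-wide total that SAR shows, so it is most meaningful for a single-job build.

```python
import resource
import subprocess
import sys
from typing import Sequence


def peak_child_rss_mib(cmd: Sequence[str]) -> float:
    """Run cmd to completion and return the peak RSS of its largest process in MiB.

    getrusage(RUSAGE_CHILDREN).ru_maxrss is the largest resident set size of any
    terminated, waited-for descendant. It is not a sum over parallel jobs, and it
    is a running maximum over every child this Python process has waited for.
    """
    subprocess.run(list(cmd), check=True)
    max_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return max_rss * bytes_per_unit / (1024 * 1024)
```

For example, `peak_child_rss_mib(["cmake", "--build", "build", "--parallel", "1"])` would report the heaviest compile or link step of an existing build directory (the directory name `build` is an assumption). Because the value is a running maximum, each measurement is cleanest in a fresh process.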

What if we just add a generic statement:

_Arrow C++ is a complex project that needs to handle many different data types, vectorization architectures, and compiler differences.  Building Arrow C++ requires a considerable amount of CPU and RAM.  When installing Arrow on a system with limited resources, we recommend compiling the binaries on a capable build machine or downloading prebuilt binaries from a package manager._


If you want to replace "considerable amount of CPU and RAM" with "potentially more than 4GB of RAM" (or whatever number you prefer), I wouldn't object.  My concern is with a phrase like "at most 4GB of RAM", because the only way we could back that up is "On these build machines with these configurations it took less than 4GB", and that isn't the same thing.
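If the docs (or an install script) wanted to use the numbers from the issue description without promising an upper bound, a check could warn only when the machine has less RAM than was observed, rather than declaring any amount "enough". A minimal sketch, assuming Linux (`os.sysconf` with `SC_PHYS_PAGES`). The thresholds are the approximate SAR figures from this issue (3 GB debug, 1 GB release), treated here as GiB, and the helper names are hypothetical.

```python
import os
from typing import Optional

# Approximate peak usage observed with SAR in ARROW-14039 (single core).
# These are observations from one setup, not guaranteed upper bounds.
OBSERVED_PEAK_GIB = {"debug": 3.0, "release": 1.0}


def total_ram_gib() -> float:
    """Total physical memory in GiB, via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def build_memory_warning(build_type: str, ram_gib: float) -> Optional[str]:
    """Return a warning if ram_gib is below the observed peak, else None.

    Returning None only means "no known problem"; it does not promise the
    build will fit in memory.
    """
    observed = OBSERVED_PEAK_GIB[build_type.lower()]
    if ram_gib >= observed:
        return None
    return (
        f"{ram_gib:.1f} GiB of RAM is below the ~{observed:.0f} GiB observed for a "
        f"{build_type.lower()} build; expect slow progress from swapping. Consider "
        "building on a more capable machine or using prebuilt binaries."
    )


if __name__ == "__main__":
    message = build_memory_warning("debug", total_ram_gib())
    print(message or "No known memory problem for a debug build.")
```

Warning only below the observed figure keeps the check consistent with the concern above: it states what was seen, not what is guaranteed.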


> [C++] [Docs] Indicate memory required for installation
> ------------------------------------------------------
>
>                 Key: ARROW-14039
>                 URL: https://issues.apache.org/jira/browse/ARROW-14039
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Benson Muite
>            Assignee: Benson Muite
>            Priority: Trivial
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be helpful to document the typical memory required for installation. A single core provides sufficient processing power; monitoring with SAR indicates that about 3 GB of RAM is needed for a debug build and 1 GB for a release build.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)