Posted to issues@arrow.apache.org by "coders1122 (via GitHub)" <gi...@apache.org> on 2023/04/25 06:24:16 UTC

[GitHub] [arrow] coders1122 opened a new issue, #35319: Why is arrow mmap marked MAP_PRIVATE (during read)?

coders1122 opened a new issue, #35319:
URL: https://github.com/apache/arrow/issues/35319

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Why is it marked `MAP_PRIVATE` [here](https://github.com/apache/arrow/blob/9009dd76e4e9a2f4f13340ebf4173e71813b359b/cpp/src/arrow/io/file.cc#L449)?
   Is there a reason you explicitly chose not to use `MAP_SHARED` here as well, as is done for writes?
   
   We are trying to evaluate whether this causes multiple workers (in a distributed training job) to `mmap` the file multiple times instead of reusing the same mmap'ed file to reduce memory usage.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on issue #35319: [C++] Why is arrow mmap marked MAP_PRIVATE (during read)?

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35319:
URL: https://github.com/apache/arrow/issues/35319#issuecomment-1540401538

   Let's just consult the Linux `mmap` man page:
   ```
          MAP_PRIVATE
                 Create  a  private copy-on-write mapping.  Updates to the mapping are not
                 visible to other processes mapping the same file,  and  are  not  carried
                 through  to  the underlying file.  It is unspecified whether changes made
                 to the file after the mmap() call are visible in the mapped region.
   ```
   
   Since it's a copy-on-write mapping, pages would only be duplicated if the user writes into the mapped area.
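   
   A minimal sketch of that copy-on-write behavior, assuming a Linux/POSIX system. The helpers `make_temp_file` and `private_write_demo` are hypothetical and not part of Arrow. A file opened read-only is mapped with `MAP_PRIVATE` and `PROT_READ | PROT_WRITE`, one byte is overwritten through the mapping, and the file itself is then read back with `pread`. The write lands only in this process's private copy of the page; the file keeps its original contents.
   
   ```cpp
   // Sketch (assumes Linux/POSIX): a MAP_PRIVATE write never reaches the file.
   // make_temp_file and private_write_demo are hypothetical illustration helpers.
   #include <cassert>
   #include <cstdlib>
   #include <string>
   #include <utility>
   #include <fcntl.h>
   #include <sys/mman.h>
   #include <sys/stat.h>
   #include <unistd.h>
   
   // Writes `contents` to a fresh temporary file and returns its path.
   std::string make_temp_file(const std::string& contents) {
     char path[] = "/tmp/mmap_demoXXXXXX";
     int fd = mkstemp(path);
     if (fd < 0) std::abort();
     size_t done = 0;
     while (done < contents.size()) {
       ssize_t n = write(fd, contents.data() + done, contents.size() - done);
       if (n <= 0) std::abort();
       done += static_cast<size_t>(n);
     }
     close(fd);
     return path;
   }
   
   // Maps `path` with MAP_PRIVATE, overwrites byte 0 through the mapping and
   // returns {byte seen in the mapping, byte still stored in the file}.
   std::pair<char, char> private_write_demo(const std::string& path, char new_byte) {
     int fd = open(path.c_str(), O_RDONLY);  // a read-only fd suffices for MAP_PRIVATE
     if (fd < 0) std::abort();
     struct stat st;
     if (fstat(fd, &st) != 0 || st.st_size == 0) std::abort();
     size_t len = static_cast<size_t>(st.st_size);
     void* addr = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
     if (addr == MAP_FAILED) std::abort();
     char* bytes = static_cast<char*>(addr);
     bytes[0] = new_byte;  // first write to the page: the kernel copies it for this process only
     char on_disk = 0;
     if (pread(fd, &on_disk, 1, 0) != 1) std::abort();  // reads the file, not the mapping
     char in_mapping = bytes[0];
     munmap(addr, len);
     close(fd);
     return {in_mapping, on_disk};
   }
   ```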




[GitHub] [arrow] westonpace commented on issue #35319: [C++] Why is arrow mmap marked MAP_PRIVATE (during read)?

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35319:
URL: https://github.com/apache/arrow/issues/35319#issuecomment-1524095587

   Since we are not doing any writes there, there *shouldn't* be any difference. I suspect we picked `MAP_PRIVATE` just because it was more foolproof.
   
   This seems unrelated to how many times something might call `mmap`.
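   
   A minimal sketch illustrating that a read-only mapping behaves the same under either flag, assuming Linux/POSIX. `read_via_mmap` and `make_temp_file` are hypothetical helpers, not Arrow functions. The file is mapped with `PROT_READ` only. Because of that, no write can happen through the mapping (an attempt would fault with `SIGSEGV`), and copy-on-write is never triggered. As a result, `MAP_PRIVATE` and `MAP_SHARED` expose the same bytes.
   
   ```cpp
   // Sketch (assumes Linux/POSIX): with PROT_READ only, MAP_PRIVATE and MAP_SHARED
   // mappings of the same file read identical bytes. Helper names are hypothetical.
   #include <cassert>
   #include <cstdlib>
   #include <string>
   #include <fcntl.h>
   #include <sys/mman.h>
   #include <sys/stat.h>
   #include <unistd.h>
   
   // Writes `contents` to a fresh temporary file and returns its path.
   std::string make_temp_file(const std::string& contents) {
     char path[] = "/tmp/mmap_demoXXXXXX";
     int fd = mkstemp(path);
     if (fd < 0) std::abort();
     size_t done = 0;
     while (done < contents.size()) {
       ssize_t n = write(fd, contents.data() + done, contents.size() - done);
       if (n <= 0) std::abort();
       done += static_cast<size_t>(n);
     }
     close(fd);
     return path;
   }
   
   // Maps the whole file read-only with `flags` (MAP_PRIVATE or MAP_SHARED)
   // and copies the mapped bytes into a std::string.
   std::string read_via_mmap(const std::string& path, int flags) {
     int fd = open(path.c_str(), O_RDONLY);
     if (fd < 0) std::abort();
     struct stat st;
     if (fstat(fd, &st) != 0 || st.st_size == 0) std::abort();
     size_t len = static_cast<size_t>(st.st_size);
     void* addr = mmap(nullptr, len, PROT_READ, flags, fd, 0);
     close(fd);  // the mapping stays valid after the descriptor is closed
     if (addr == MAP_FAILED) std::abort();
     std::string out(static_cast<const char*>(addr), len);
     munmap(addr, len);
     return out;
   }
   ```
   
   Each call to `mmap` still creates its own mapping regardless of the flag. Whether several workers end up with one mapping or several depends on how often the caller maps the file, not on `MAP_PRIVATE`.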




[GitHub] [arrow] mapleFU commented on issue #35319: [C++] Why is arrow mmap marked MAP_PRIVATE (during read)?

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35319:
URL: https://github.com/apache/arrow/issues/35319#issuecomment-1522109887

   It seems that RocksDB uses `MAP_PRIVATE` and Badger uses `MAP_SHARED`. Is there any difference between `MAP_PRIVATE` and `MAP_SHARED` when `PROT_WRITE` is not set? Since it's copy-on-write, doesn't it share the underlying memory as long as you don't write to that buffer? And if there is a difference, would you mind providing some benchmark data comparing them?
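   
   As a starting point for the benchmark requested above, here is a minimal harness sketch, assuming Linux/POSIX. `touch_pages`, `TouchResult` and `make_temp_file` are hypothetical names. The harness maps a file read-only with each flag and reads one byte per page so that every page is faulted in. It then times the map, touch and unmap sequence. No measurements are reported here. For meaningful numbers, the file should be much larger, the page cache state (cold vs. warm) should be controlled, and each run should be repeated.
   
   ```cpp
   // Sketch (assumes Linux/POSIX): time faulting in every page of a read-only
   // mapping under MAP_PRIVATE vs MAP_SHARED. All names here are hypothetical.
   #include <cassert>
   #include <chrono>
   #include <cstdint>
   #include <cstdlib>
   #include <string>
   #include <fcntl.h>
   #include <sys/mman.h>
   #include <sys/stat.h>
   #include <unistd.h>
   
   // Writes `contents` to a fresh temporary file and returns its path.
   std::string make_temp_file(const std::string& contents) {
     char path[] = "/tmp/mmap_benchXXXXXX";
     int fd = mkstemp(path);
     if (fd < 0) std::abort();
     size_t done = 0;
     while (done < contents.size()) {
       ssize_t n = write(fd, contents.data() + done, contents.size() - done);
       if (n <= 0) std::abort();
       done += static_cast<size_t>(n);
     }
     close(fd);
     return path;
   }
   
   struct TouchResult {
     std::uint64_t checksum;            // sum of one byte per page, keeps the reads observable
     std::chrono::nanoseconds elapsed;  // wall time for mmap + page touches + munmap
   };
   
   TouchResult touch_pages(const std::string& path, int flags) {
     int fd = open(path.c_str(), O_RDONLY);
     if (fd < 0) std::abort();
     struct stat st;
     if (fstat(fd, &st) != 0 || st.st_size == 0) std::abort();
     size_t len = static_cast<size_t>(st.st_size);
     size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
   
     auto start = std::chrono::steady_clock::now();
     void* addr = mmap(nullptr, len, PROT_READ, flags, fd, 0);
     if (addr == MAP_FAILED) std::abort();
     // volatile stops the compiler from eliding the page-touching reads
     const volatile unsigned char* bytes = static_cast<const volatile unsigned char*>(addr);
     std::uint64_t sum = 0;
     for (size_t off = 0; off < len; off += page) sum += bytes[off];
     munmap(addr, len);
     auto end = std::chrono::steady_clock::now();
   
     close(fd);
     return {sum, std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)};
   }
   ```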

