Posted to issues@arrow.apache.org by "coders1122 (via GitHub)" <gi...@apache.org> on 2023/04/25 06:24:16 UTC
[GitHub] [arrow] coders1122 opened a new issue, #35319: Why is arrow mmap marked MAP_PRIVATE (during read)?
coders1122 opened a new issue, #35319:
URL: https://github.com/apache/arrow/issues/35319
### Describe the usage question you have. Please include as many useful details as possible.
Why is it marked MAP_PRIVATE [here](https://github.com/apache/arrow/blob/9009dd76e4e9a2f4f13340ebf4173e71813b359b/cpp/src/arrow/io/file.cc#L449)?
Is there a reason why you've explicitly chosen not to (also) use `MAP_SHARED`, as the write path does?
We are trying to determine whether this causes multiple workers (in distributed training) to each `mmap` the file separately, rather than sharing a single mapping of the file and thereby reducing memory usage.
### Component(s)
C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on issue #35319: [C++] Why is arrow mmap marked MAP_PRIVATE (during read)?
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35319:
URL: https://github.com/apache/arrow/issues/35319#issuecomment-1540401538
Let's just consult the Linux `mmap` man page:
```
MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the mapping are not
visible to other processes mapping the same file, and are not carried
through to the underlying file. It is unspecified whether changes made
to the file after the mmap() call are visible in the mapped region.
```
Since it's a copy-on-write mapping, there would only be any duplication if the user writes into the mapped area.
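As a rough illustration of the copy-on-write behaviour quoted above, here is a minimal POSIX C++ sketch. It is not Arrow code: the helper names (`Check`, `WriteTempFile`, `ReadFile`, `ReadThroughMapping`, `PokeFirstByte`) and the `/tmp` path template are assumptions made up for this example. It shows that a read-only mapping returns the same bytes under either flag, and that a write through a `MAP_PRIVATE` mapping stays in the process's private copy while a write through a `MAP_SHARED` mapping reaches the file.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <string>

// All helpers below are hypothetical illustrations, not Arrow APIs.

// Abort with a message if a system call failed.
static void Check(bool ok, const char* what) {
  if (!ok) {
    std::perror(what);
    std::exit(EXIT_FAILURE);
  }
}

// Create a temporary file containing `content` and return its path.
std::string WriteTempFile(const std::string& content) {
  char path[] = "/tmp/mmap_cow_XXXXXX";
  int fd = mkstemp(path);
  Check(fd >= 0, "mkstemp");
  ssize_t n = write(fd, content.data(), content.size());
  Check(n == static_cast<ssize_t>(content.size()), "write");
  close(fd);
  return std::string(path);
}

// Read the whole file with plain read() calls.
std::string ReadFile(const std::string& path) {
  int fd = open(path.c_str(), O_RDONLY);
  Check(fd >= 0, "open");
  std::string out;
  char buf[4096];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    out.append(buf, static_cast<size_t>(n));
  }
  Check(n == 0, "read");
  close(fd);
  return out;
}

// Map the file read-only with the given flag (MAP_PRIVATE or MAP_SHARED)
// and return a copy of the bytes seen through the mapping.
std::string ReadThroughMapping(const std::string& path, int map_flag) {
  int fd = open(path.c_str(), O_RDONLY);
  Check(fd >= 0, "open");
  struct stat st;
  Check(fstat(fd, &st) == 0, "fstat");
  size_t len = static_cast<size_t>(st.st_size);
  void* addr = mmap(nullptr, len, PROT_READ, map_flag, fd, 0);
  Check(addr != MAP_FAILED, "mmap");
  close(fd);  // the mapping stays valid after the descriptor is closed
  std::string out(static_cast<const char*>(addr), len);
  munmap(addr, len);
  return out;
}

// Map the file writable with the given flag, overwrite the first byte
// through the mapping, unmap, and return what the file on disk contains.
std::string PokeFirstByte(const std::string& path, int map_flag, char value) {
  int fd = open(path.c_str(), O_RDWR);
  Check(fd >= 0, "open");
  struct stat st;
  Check(fstat(fd, &st) == 0, "fstat");
  size_t len = static_cast<size_t>(st.st_size);
  void* addr = mmap(nullptr, len, PROT_READ | PROT_WRITE, map_flag, fd, 0);
  Check(addr != MAP_FAILED, "mmap");
  close(fd);
  static_cast<char*>(addr)[0] = value;  // with MAP_PRIVATE this copies the page
  Check(msync(addr, len, MS_SYNC) == 0, "msync");  // only matters for MAP_SHARED
  munmap(addr, len);
  return ReadFile(path);
}
```

For a file created with `WriteTempFile("arrow")`, `ReadThroughMapping` returns `"arrow"` with either flag, `PokeFirstByte(path, MAP_PRIVATE, 'X')` leaves the file as `"arrow"`, and `PokeFirstByte(path, MAP_SHARED, 'X')` changes it to `"Xrrow"`. The sketch checks visibility only; it does not measure how many physical pages each process ends up using.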
[GitHub] [arrow] westonpace commented on issue #35319: [C++] Why is arrow mmap marked MAP_PRIVATE (during read)?
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35319:
URL: https://github.com/apache/arrow/issues/35319#issuecomment-1524095587
Since we are not doing any writes, there *shouldn't* be any difference. I suspect we picked `MAP_PRIVATE` just because it was more foolproof.
This seems unrelated to how many times something might call `mmap`.
[GitHub] [arrow] mapleFU commented on issue #35319: [C++] Why is arrow mmap marked MAP_PRIVATE (during read)?
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35319:
URL: https://github.com/apache/arrow/issues/35319#issuecomment-1522109887
It seems that RocksDB uses `MAP_PRIVATE` and badger uses `MAP_SHARED`. Is there any difference between `MAP_PRIVATE` and `MAP_SHARED` when `PROT_WRITE` is not set? Since the mapping is copy-on-write, it seems it would share the underlying memory as long as you don't write to the buffer. If there is a difference, would you mind providing some benchmark data comparing the two?