You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@parquet.apache.org by we...@apache.org on 2016/12/19 23:28:13 UTC

parquet-cpp git commit: PARQUET-799: Fix bug in MemoryMapSource::CloseFile

Repository: parquet-cpp
Updated Branches:
  refs/heads/master 0c0c3ca8c -> 0e195730d


PARQUET-799: Fix bug in MemoryMapSource::CloseFile

The key change here is to ensure that munmap is called at most once for
a given memory-mapped file.

Previously, MemoryMapSource::CloseFile was calling munmap on every
invocation (provided that a file had ever been successfully mapped into
memory for that instance). This is problematic in a multi-threaded
environment, even if each MemoryMapSource instance is being used in only
one thread, as illustrated by the following hypothetical sequence of
operations:

thread 1 @ time 1: munmap(0xf000, 4096)
thread 2 @ time 2: void *addr = mmap(NULL, 4096, ...) // addr <- 0xf000
thread 1 @ time 3: munmap(0xf000, 4096)

After time 3, the mapping for the memory segment beginning at 0xf000 has
been invalidated, so the next attempt by thread 2 to access memory
within that segment will likely cause a segfault (unless yet another
thread has mmap'd that segment in the meantime, in which case the
results could be even more interesting, but certainly no better).

Also, I'm adding/modifying a couple comments in header files to mark
"sample" implementations accordingly. This is intended to give API
consumers a heads up as to the intent and level of maturity of those
sections of the codebase.

Author: William Forson <wi...@gluent.com>

Closes #203 from wdforson/master and squashes the following commits:

782713a [William Forson] Adjust 'sample' code comments
7337551 [William Forson] PARQUET-799: Fix bug in MemoryMapSource::CloseFile


Project: http://git-wip-us.apache.org/repos/asf/parquet-cpp/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-cpp/commit/0e195730
Tree: http://git-wip-us.apache.org/repos/asf/parquet-cpp/tree/0e195730
Diff: http://git-wip-us.apache.org/repos/asf/parquet-cpp/diff/0e195730

Branch: refs/heads/master
Commit: 0e195730d413661d99a75d1d285f2e8593909e8c
Parents: 0c0c3ca
Author: William Forson <wi...@gluent.com>
Authored: Mon Dec 19 18:28:07 2016 -0500
Committer: Wes McKinney <we...@twosigma.com>
Committed: Mon Dec 19 18:28:07 2016 -0500

----------------------------------------------------------------------
 src/parquet/file/reader.h | 4 +++-
 src/parquet/util/input.cc | 5 ++++-
 src/parquet/util/input.h  | 4 ++++
 3 files changed, 11 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/0e195730/src/parquet/file/reader.h
----------------------------------------------------------------------
diff --git a/src/parquet/file/reader.h b/src/parquet/file/reader.h
index 89dbbc8..ca28f67 100644
--- a/src/parquet/file/reader.h
+++ b/src/parquet/file/reader.h
@@ -79,7 +79,9 @@ class PARQUET_EXPORT ParquetFileReader {
   ParquetFileReader();
   ~ParquetFileReader();
 
-  // API Convenience to open a serialized Parquet file on disk
+  // API Convenience to open a serialized Parquet file on disk, using built-in IO
+  // interface implementations that were created for testing, and may not be robust for
+  // all use cases.
   static std::unique_ptr<ParquetFileReader> OpenFile(const std::string& path,
       bool memory_map = true, ReaderProperties props = default_reader_properties());
 

http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/0e195730/src/parquet/util/input.cc
----------------------------------------------------------------------
diff --git a/src/parquet/util/input.cc b/src/parquet/util/input.cc
index 10e1573..127b90c 100644
--- a/src/parquet/util/input.cc
+++ b/src/parquet/util/input.cc
@@ -134,7 +134,10 @@ void MemoryMapSource::Close() {
 }
 
 void MemoryMapSource::CloseFile() {
-  if (data_ != nullptr) { munmap(data_, size_); }
+  if (data_ != nullptr) {
+    munmap(data_, size_);
+    data_ = nullptr;
+  }
 
   LocalFileSource::CloseFile();
 }

http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/0e195730/src/parquet/util/input.h
----------------------------------------------------------------------
diff --git a/src/parquet/util/input.h b/src/parquet/util/input.h
index d6d6f34..1bb41e3 100644
--- a/src/parquet/util/input.h
+++ b/src/parquet/util/input.h
@@ -55,6 +55,10 @@ class PARQUET_EXPORT RandomAccessSource {
   int64_t size_;
 };
 
+// ----------------------------------------------------------------------
+// Implementations of RandomAccessSource used for testing and internal CLI tools.
+// May not be sufficiently robust for general production use.
+
 class PARQUET_EXPORT LocalFileSource : public RandomAccessSource {
  public:
   explicit LocalFileSource(MemoryAllocator* allocator = default_allocator())