You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/15 13:50:30 UTC

[GitHub] [doris] platoneko opened a new pull request, #10896: Shared file reader

platoneko opened a new pull request, #10896:
URL: https://github.com/apache/doris/pull/10896

   # Proposed changes
   
   Issue Number: close #10895 
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922674060


##########
be/src/io/fs/local_file_system.cpp:
##########
@@ -56,28 +47,16 @@ Status LocalFileSystem::create_file(const Path& path, std::unique_ptr<FileWriter
     return Status::OK();
 }
 
-Status LocalFileSystem::open_file(const Path& path, std::unique_ptr<FileReader>* reader) {
+Status LocalFileSystem::open_file(const Path& path, FileReaderPtr* reader) {
     auto fs_path = absolute_path(path);
-    std::shared_ptr<OpenedFileHandle<int>> file_handle(new OpenedFileHandle<int>());
-    bool found = _file_cache->lookup(fs_path.native(), file_handle.get());
-    if (!found) {
-        int fd = -1;
-        RETRY_ON_EINTR(fd, open(fs_path.c_str(), O_RDONLY));
-        if (fd < 0) {
-            return Status::IOError(
-                    fmt::format("cannot open {}: {}", fs_path.native(), std::strerror(errno)));
-        }
-        int* p_fd = new int(fd);
-        _file_cache->insert(fs_path.native(), p_fd, file_handle.get(),
-                            [](const CacheKey& key, void* value) {
-                                auto fd = reinterpret_cast<int*>(value);
-                                ::close(*fd);
-                                delete fd;
-                            });
-    }
     size_t fsize = 0;
     RETURN_IF_ERROR(file_size(fs_path, &fsize));
-    *reader = std::make_unique<LocalFileReader>(std::move(fs_path), fsize, std::move(file_handle));
+    int fd = -1;
+    RETRY_ON_EINTR(fd, open(fs_path.c_str(), O_RDONLY));
+    if (fd < 0) {
+        return Status::IOError("cannot open {}: {}", fs_path.native(), std::strerror(errno));
+    }
+    *reader = std::make_unique<LocalFileReader>(std::move(fs_path), fsize, fd);

Review Comment:
   reader is shared ptr? why call make unique here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] platoneko commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
platoneko commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922675498


##########
be/src/io/fs/local_file_system.cpp:
##########
@@ -56,28 +47,16 @@ Status LocalFileSystem::create_file(const Path& path, std::unique_ptr<FileWriter
     return Status::OK();
 }
 
-Status LocalFileSystem::open_file(const Path& path, std::unique_ptr<FileReader>* reader) {
+Status LocalFileSystem::open_file(const Path& path, FileReaderPtr* reader) {
     auto fs_path = absolute_path(path);
-    std::shared_ptr<OpenedFileHandle<int>> file_handle(new OpenedFileHandle<int>());
-    bool found = _file_cache->lookup(fs_path.native(), file_handle.get());
-    if (!found) {
-        int fd = -1;
-        RETRY_ON_EINTR(fd, open(fs_path.c_str(), O_RDONLY));
-        if (fd < 0) {
-            return Status::IOError(
-                    fmt::format("cannot open {}: {}", fs_path.native(), std::strerror(errno)));
-        }
-        int* p_fd = new int(fd);
-        _file_cache->insert(fs_path.native(), p_fd, file_handle.get(),
-                            [](const CacheKey& key, void* value) {
-                                auto fd = reinterpret_cast<int*>(value);
-                                ::close(*fd);
-                                delete fd;
-                            });
-    }
     size_t fsize = 0;
     RETURN_IF_ERROR(file_size(fs_path, &fsize));
-    *reader = std::make_unique<LocalFileReader>(std::move(fs_path), fsize, std::move(file_handle));
+    int fd = -1;
+    RETRY_ON_EINTR(fd, open(fs_path.c_str(), O_RDONLY));
+    if (fd < 0) {
+        return Status::IOError("cannot open {}: {}", fs_path.native(), std::strerror(errno));
+    }
+    *reader = std::make_unique<LocalFileReader>(std::move(fs_path), fsize, fd);

Review Comment:
   I will fix it soon.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922673493


##########
be/src/io/fs/local_file_system.cpp:
##########
@@ -56,28 +47,16 @@ Status LocalFileSystem::create_file(const Path& path, std::unique_ptr<FileWriter
     return Status::OK();
 }
 
-Status LocalFileSystem::open_file(const Path& path, std::unique_ptr<FileReader>* reader) {
+Status LocalFileSystem::open_file(const Path& path, FileReaderPtr* reader) {
     auto fs_path = absolute_path(path);
-    std::shared_ptr<OpenedFileHandle<int>> file_handle(new OpenedFileHandle<int>());
-    bool found = _file_cache->lookup(fs_path.native(), file_handle.get());

Review Comment:
   file cache is useless?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922675020


##########
be/src/olap/rowset/segment_v2/page_io.h:
##########
@@ -46,7 +46,7 @@ namespace segment_v2 {
 
 struct PageReadOptions {
     // block to read page
-    doris::io::FileReader* file_reader = nullptr;
+    io::FileReader* file_reader = nullptr;

Review Comment:
   why not use shared pointer here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922670897


##########
be/src/olap/rowset/beta_rowset_writer.h:
##########
@@ -86,7 +86,7 @@ class BetaRowsetWriter : public RowsetWriter {
     std::unique_ptr<segment_v2::SegmentWriter> _segment_writer;
     mutable SpinLock _lock; // lock to protect _wblocks.
     // TODO(lingbin): it is better to wrapper in a Batch?

Review Comment:
   remove this comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922674424


##########
be/src/io/fs/local_file_reader.cpp:
##########
@@ -39,16 +34,18 @@ LocalFileReader::~LocalFileReader() {
 }
 
 Status LocalFileReader::close() {
-    bool expected = false;
-    if (_closed.compare_exchange_strong(expected, true)) {
-        _file_handle.reset();
-        DorisMetrics::instance()->local_file_open_reading->increment(-1);
+    if (!closed()) {

Review Comment:
   Close method maybe called concurrently, the _fd maybe set to -1 in another thread. I think we'd better use lock here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922671115


##########
be/src/io/fs/file_reader.h:
##########
@@ -38,7 +40,11 @@ class FileReader {
     virtual const Path& path() const = 0;
 
     virtual size_t size() const = 0;
+
+    virtual bool closed() const = 0;
 };
 
+using FileReaderPtr = std::shared_ptr<FileReader>;

Review Comment:
   Have you test that there is no performance lost when change to use shared ptr?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] platoneko commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
platoneko commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922677079


##########
be/src/olap/rowset/beta_rowset_writer.h:
##########
@@ -86,7 +86,7 @@ class BetaRowsetWriter : public RowsetWriter {
     std::unique_ptr<segment_v2::SegmentWriter> _segment_writer;
     mutable SpinLock _lock; // lock to protect _wblocks.
     // TODO(lingbin): it is better to wrapper in a Batch?

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922671321


##########
be/src/olap/rowset/segment_v2/column_reader.h:
##########
@@ -64,7 +64,7 @@ struct ColumnReaderOptions {
 };
 
 struct ColumnIteratorOptions {
-    io::FileReader* file_reader = nullptr;
+    io::FileReaderPtr file_reader = nullptr;

Review Comment:
   the pointer is called too many times, shared ptr may have performance issues



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] platoneko commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
platoneko commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922676762


##########
be/src/olap/rowset/segment_v2/page_io.h:
##########
@@ -46,7 +46,7 @@ namespace segment_v2 {
 
 struct PageReadOptions {
     // block to read page
-    doris::io::FileReader* file_reader = nullptr;
+    io::FileReader* file_reader = nullptr;

Review Comment:
   read_and_decompress_page is a synchronized method, this FileReader won't be released during reading page now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922674893


##########
be/src/olap/rowset/segment_v2/column_reader.cpp:
##########
@@ -149,7 +149,7 @@ Status ColumnReader::read_page(const ColumnIteratorOptions& iter_opts, const Pag
                                BlockCompressionCodec* codec) const {
     iter_opts.sanity_check();
     PageReadOptions opts;
-    opts.file_reader = iter_opts.file_reader;
+    opts.file_reader = iter_opts.file_reader.get();

Review Comment:
   opts.file_reader is raw pointer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922670096


##########
be/src/io/fs/file_reader.h:
##########
@@ -38,7 +40,11 @@ class FileReader {
     virtual const Path& path() const = 0;
 
     virtual size_t size() const = 0;
+
+    virtual bool closed() const = 0;
 };
 
+using FileReaderPtr = std::shared_ptr<FileReader>;

Review Comment:
   if it is a shared ptr, FileReaderSPtr is better. If it is a unique ptr, FileReaderUPtr is better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922671115


##########
be/src/io/fs/file_reader.h:
##########
@@ -38,7 +40,11 @@ class FileReader {
     virtual const Path& path() const = 0;
 
     virtual size_t size() const = 0;
+
+    virtual bool closed() const = 0;
 };
 
+using FileReaderPtr = std::shared_ptr<FileReader>;

Review Comment:
   Have you test that there is no performance lost when change to use shared ptr?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922672207


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -349,7 +348,7 @@ Status SegmentIterator::_init_return_column_iterators() {
             ColumnIteratorOptions iter_opts;
             iter_opts.stats = _opts.stats;
             iter_opts.use_page_cache = _opts.use_page_cache;
-            iter_opts.file_reader = _file_reader.get();

Review Comment:
   I think the previous code maybe want to improve the performance by using raw pointer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #10896:
URL: https://github.com/apache/doris/pull/10896


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10896:
URL: https://github.com/apache/doris/pull/10896#discussion_r922674820


##########
be/src/olap/rowset/segment_v2/column_reader.cpp:
##########
@@ -17,7 +17,8 @@
 
 #include "olap/rowset/segment_v2/column_reader.h"
 
-#include "gutil/strings/substitute.h"                // for Substitute
+#include "gutil/strings/substitute.h" // for Substitute

Review Comment:
   Substitue should not be used. Could remove this include.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10896:
URL: https://github.com/apache/doris/pull/10896#issuecomment-1186341751

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10896: [Enhancement] Use shared file reader when read a segment

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10896:
URL: https://github.com/apache/doris/pull/10896#issuecomment-1186341755

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org