Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/07 06:50:41 UTC

[GitHub] [doris] BePPPower opened a new pull request, #14875: [feature](file reader) Merge hdfs reader to the new file reader

BePPPower opened a new pull request, #14875:
URL: https://github.com/apache/doris/pull/14875

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [x] No
       - [ ] I don't know
   2. Have unit tests been added:
       - [ ] Yes
       - [ ] No
       - [x] No Need
   3. Has documentation been added or modified:
       - [ ] Yes
       - [ ] No
       - [x] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [x] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [x] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #14875:
URL: https://github.com/apache/doris/pull/14875#discussion_r1041822077


##########
be/src/vec/exec/format/file_reader/new_plain_text_line_reader.h:
##########
@@ -0,0 +1,100 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "exec/line_reader.h"
+#include "util/runtime_profile.h"
+
+namespace doris {
+namespace io {
+class FileReader;
+}
+
+class Decompressor;
+class Status;
+
+class NewPlainTextLineReader : public LineReader {
+public:
+    NewPlainTextLineReader(RuntimeProfile* profile, io::FileReader* file_reader,
+                           Decompressor* decompressor, size_t length,
+                           const std::string& line_delimiter, size_t line_delimiter_length,
+                           size_t current_offset);
+
+    ~NewPlainTextLineReader() override;
+
+    Status read_line(const uint8_t** ptr, size_t* size, bool* eof) override;
+
+    void close() override;
+
+private:
+    bool update_eof();
+
+    size_t output_buf_read_remaining() const { return _output_buf_limit - _output_buf_pos; }
+
+    size_t input_buf_read_remaining() const { return _input_buf_limit - _input_buf_pos; }
+
+    bool done() { return _file_eof && output_buf_read_remaining() == 0; }
+
+    // find line delimiter from 'start' to 'start' + len,
+    // return line delimiter pos if found, otherwise return nullptr.
+    // TODO:
+    //  save to positions of field separator
+    uint8_t* update_field_pos_and_find_line_delimiter(const uint8_t* start, size_t len);
+
+    void extend_input_buf();
+    void extend_output_buf();
+
+private:

Review Comment:
   warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]
   
   ```suggestion
   
   ```
   **be/src/vec/exec/format/file_reader/new_plain_text_line_reader.h:43:** previously declared here
   ```cpp
   private:
   ^
   ```
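
To illustrate the warning above: a second `private:` label that directly follows private members changes nothing, so the check suggests dropping it. The following is only a minimal sketch using a hypothetical `LineBuffer` class (not the real `NewPlainTextLineReader`), with helpers and data members sharing one `private:` section:

```cpp
#include <cstddef>

// Hypothetical class for illustration only; it is not part of Doris.
class LineBuffer {
public:
    explicit LineBuffer(std::size_t limit) : _limit(limit) {}

    // Bytes that can still be consumed.
    std::size_t remaining() const { return _limit - _pos; }

    // Consume up to n bytes, never moving past the limit.
    void advance(std::size_t n) { _pos += clamp(n); }

private:
    // Private helpers and private data live under a single label.
    // Repeating `private:` before the data members would be legal but redundant.
    std::size_t clamp(std::size_t n) const { return n < remaining() ? n : remaining(); }

    std::size_t _limit;
    std::size_t _pos = 0;
};
```

Some code bases repeat the label on purpose to separate functions from data; whether to keep it is a style decision, and the check only reports that it has no effect.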
   



##########
be/src/io/fs/hdfs_file_system.h:
##########
@@ -0,0 +1,159 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <gen_cpp/PlanNodes_types.h>
+#include <hdfs/hdfs.h>
+
+#include "common/status.h"
+#include "io/fs/remote_file_system.h"
+#include "io/hdfs_file_reader.h"
+namespace doris {
+
+namespace io {
+
+class HdfsFileSystemHandle {
+public:
+    HdfsFileSystemHandle(hdfsFS fs, bool cached)
+            : hdfs_fs(fs), from_cache(cached), _ref_cnt(0), _last_access_time(0), _invalid(false) {}
+
+    ~HdfsFileSystemHandle() {
+        DCHECK(_ref_cnt == 0);
+        if (hdfs_fs != nullptr) {
+            // Even if there is an error, the resources associated with the hdfsFS will be freed.
+            hdfsDisconnect(hdfs_fs);
+        }
+        hdfs_fs = nullptr;
+    }
+
+    int64_t last_access_time() { return _last_access_time; }
+
+    void inc_ref() {
+        _ref_cnt++;
+        _last_access_time = _now();
+    }
+
+    void dec_ref() {
+        _ref_cnt--;
+        _last_access_time = _now();
+    }
+
+    int ref_cnt() { return _ref_cnt; }
+
+    bool invalid() { return _invalid; }
+
+    void set_invalid() { _invalid = true; }
+
+    hdfsFS hdfs_fs;
+    // When cache is full, and all handlers are in use, HdfsFileSystemCache will return an uncached handler.
+    // Client should delete the handler in such case.
+    const bool from_cache;
+
+private:
+    // the number of referenced client
+    std::atomic<int> _ref_cnt;
+    // HdfsFileSystemCache try to remove the oldest handler when the cache is full
+    std::atomic<uint64_t> _last_access_time;
+    // Client will set invalid if error thrown, and HdfsFileSystemCache will not reuse this handler
+    std::atomic<bool> _invalid;
+
+    uint64_t _now() {
+        return std::chrono::duration_cast<std::chrono::milliseconds>(
+                       std::chrono::system_clock::now().time_since_epoch())
+                .count();
+    }
+};
+
+// Cache for HdfsFileSystemHandle
+class HdfsFileSystemCache {
+public:
+    static int MAX_CACHE_HANDLE;
+
+    static HdfsFileSystemCache* instance() {
+        static HdfsFileSystemCache s_instance;
+        return &s_instance;
+    }
+
+    HdfsFileSystemCache(const HdfsFileSystemCache&) = delete;
+    const HdfsFileSystemCache& operator=(const HdfsFileSystemCache&) = delete;
+
+    // This function is thread-safe
+    Status get_connection(THdfsParams& hdfs_params, HdfsFileSystemHandle** fs_handle);
+
+private:
+    std::mutex _lock;
+    std::unordered_map<uint64, std::unique_ptr<HdfsFileSystemHandle>> _cache;
+
+    HdfsFileSystemCache() = default;
+
+    uint64 _hdfs_hash_code(THdfsParams& hdfs_params);
+    Status _create_fs(THdfsParams& hdfs_params, hdfsFS* fs);
+    void _clean_invalid();
+    void _clean_oldest();
+};
+
+class HdfsFileSystem final : public RemoteFileSystem {
+public:
+    HdfsFileSystem(THdfsParams hdfs_params, const std::string& path);
+    ~HdfsFileSystem() override;
+
+    Status create_file(const Path& path, FileWriterPtr* writer) override;
+
+    Status open_file(const Path& path, FileReaderSPtr* reader) override;
+
+    Status delete_file(const Path& path) override;
+
+    Status create_directory(const Path& path) override;
+
+    // Delete all files under path.
+    Status delete_directory(const Path& path) override;
+
+    Status link_file(const Path& src, const Path& dest) override {

Review Comment:
   warning: parameter 'src' is unused [misc-unused-parameters]
   
   ```suggestion
       Status link_file(const Path&  /*src*/, const Path& dest) override {
   ```
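
The suggestion above comments out the parameter name so that the override keeps the base-class signature without naming a value it never reads. A minimal sketch of that pattern follows; `Code`, `FileSystemBase` and `ReadOnlyFileSystem` are hypothetical stand-ins, and the "not supported" body is an assumption, since the real `link_file` body is not shown in this hunk:

```cpp
#include <string>

// Hypothetical stand-ins for illustration; the real Doris types differ.
enum class Code { OK, NotSupported };

struct FileSystemBase {
    virtual ~FileSystemBase() = default;
    virtual Code link_file(const std::string& src, const std::string& dest) = 0;
};

struct ReadOnlyFileSystem final : FileSystemBase {
    // The names are commented out: the signature still matches the base class,
    // but there is no longer a named parameter that goes unused.
    Code link_file(const std::string& /*src*/, const std::string& /*dest*/) override {
        return Code::NotSupported;
    }
};
```

Keeping the name inside a comment preserves the documentation value of the parameter name; `[[maybe_unused]]` (C++17) is an alternative when the name must stay visible to the compiler.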
   



##########
be/src/vec/exec/format/file_reader/new_file_factory.cpp:
##########
@@ -0,0 +1,144 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "vec/exec/format/file_reader/new_file_factory.h"
+
+#include "io/broker_reader.h"
+#include "io/broker_writer.h"
+#include "io/buffered_reader.h"
+#include "io/fs/file_system.h"
+#include "io/hdfs_reader_writer.h"
+#include "io/local_file_reader.h"
+#include "io/local_file_writer.h"
+#include "io/s3_reader.h"
+#include "io/s3_writer.h"
+#include "runtime/exec_env.h"
+#include "runtime/stream_load/load_stream_mgr.h"
+
+namespace doris {
+
+Status NewFileFactory::create_file_writer(TFileType::type type, ExecEnv* env,
+                                          const std::vector<TNetworkAddress>& broker_addresses,
+                                          const std::map<std::string, std::string>& properties,
+                                          const std::string& path, int64_t start_offset,
+                                          std::unique_ptr<FileWriter>& file_writer) {
+    switch (type) {
+    case TFileType::FILE_LOCAL: {
+        file_writer.reset(new LocalFileWriter(path, start_offset));
+        break;
+    }
+    case TFileType::FILE_BROKER: {
+        file_writer.reset(new BrokerWriter(env, broker_addresses, properties, path, start_offset));
+        break;
+    }
+    case TFileType::FILE_S3: {
+        file_writer.reset(new S3Writer(properties, path, start_offset));
+        break;
+    }
+    case TFileType::FILE_HDFS: {
+        RETURN_IF_ERROR(HdfsReaderWriter::create_writer(
+                const_cast<std::map<std::string, std::string>&>(properties), path, file_writer));
+        break;
+    }
+    default:
+        return Status::InternalError("unsupported file writer type: {}", std::to_string(type));
+    }
+
+    return Status::OK();
+}
+
+// ============================
+// broker scan node/unique ptr
+Status NewFileFactory::create_file_reader(TFileType::type type, ExecEnv* env,
+                                          RuntimeProfile* profile,
+                                          const std::vector<TNetworkAddress>& broker_addresses,
+                                          const std::map<std::string, std::string>& properties,
+                                          const TBrokerRangeDesc& range, int64_t start_offset,
+                                          std::unique_ptr<FileReader>& file_reader) {
+    FileReader* file_reader_ptr;
+    switch (type) {
+    case TFileType::FILE_LOCAL: {
+        file_reader_ptr = new LocalFileReader(range.path, start_offset);
+        break;
+    }
+    case TFileType::FILE_BROKER: {
+        file_reader_ptr = new BufferedReader(
+                profile,
+                new BrokerReader(env, broker_addresses, properties, range.path, start_offset,
+                                 range.__isset.file_size ? range.file_size : 0));
+        break;
+    }
+    case TFileType::FILE_S3: {
+        file_reader_ptr =
+                new BufferedReader(profile, new S3Reader(properties, range.path, start_offset));
+        break;
+    }
+    case TFileType::FILE_HDFS: {
+        FileReader* hdfs_reader = nullptr;
+        RETURN_IF_ERROR(HdfsReaderWriter::create_reader(range.hdfs_params, range.path, start_offset,
+                                                        &hdfs_reader));
+        file_reader_ptr = new BufferedReader(profile, hdfs_reader);
+        break;
+    }
+    default:
+        return Status::InternalError("unsupported file reader type: " + std::to_string(type));
+    }
+    file_reader.reset(file_reader_ptr);
+
+    return Status::OK();
+}
+
+// ============================
+// file scan node/unique ptr
+Status NewFileFactory::create_file_reader(RuntimeProfile* profile,

Review Comment:
   warning: parameter 'profile' is unused [misc-unused-parameters]
   
   ```suggestion
   Status NewFileFactory::create_file_reader(RuntimeProfile*  /*profile*/,
   ```
   



##########
be/src/io/fs/hdfs_file_system.h:
##########
@@ -0,0 +1,159 @@
+    Status link_file(const Path& src, const Path& dest) override {

Review Comment:
   warning: parameter 'dest' is unused [misc-unused-parameters]
   
   ```suggestion
       Status link_file(const Path& src, const Path&  /*dest*/) override {
   ```
   





[GitHub] [doris] hello-stephen commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1340514192

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.38 seconds
    load time: 426 seconds
    storage size: 17123356275 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221207072948_clickbench_pr_59162.html




[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1341014360

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] platoneko commented on a diff in pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
platoneko commented on code in PR #14875:
URL: https://github.com/apache/doris/pull/14875#discussion_r1042348698


##########
be/src/io/fs/hdfs_file_system.h:
##########
@@ -0,0 +1,159 @@
+class HdfsFileSystem final : public RemoteFileSystem {
+public:
+    HdfsFileSystem(THdfsParams hdfs_params, const std::string& path);

Review Comment:
   Passing `const THdfsParams&` is better here: it avoids copying the struct on every call.
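
A rough way to see the difference the reviewer points at is to count copies. The sketch below uses a hypothetical `Params` struct with an instrumented copy constructor (it is not `THdfsParams`) and two hypothetical holder types that store the argument in a member:

```cpp
#include <string>
#include <utility>

// Global copy counter, for illustration only.
int g_copies = 0;

// Hypothetical parameter struct; its copy constructor records each copy.
struct Params {
    std::string fs_name;

    explicit Params(std::string name) : fs_name(std::move(name)) {}
    Params(const Params& other) : fs_name(other.fs_name) { ++g_copies; }
};

// By value: an lvalue argument is copied into the parameter,
// then copied again into the member.
struct ByValue {
    explicit ByValue(Params p) : params(p) {}
    Params params;
};

// By const reference: only the copy into the member remains.
struct ByConstRef {
    explicit ByConstRef(const Params& p) : params(p) {}
    Params params;
};
```

Taking the argument by value and moving it into the member is another common idiom, but it only pays off when the type has a cheap move; the const-reference form is the one requested in this review.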





[GitHub] [doris] BePPPower commented on a diff in pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
BePPPower commented on code in PR #14875:
URL: https://github.com/apache/doris/pull/14875#discussion_r1043046468


##########
be/src/io/fs/hdfs_file_system.h:
##########
@@ -0,0 +1,159 @@
+class HdfsFileSystem final : public RemoteFileSystem {
+public:
+    HdfsFileSystem(THdfsParams hdfs_params, const std::string& path);

Review Comment:
   Done.





[GitHub] [doris] platoneko commented on a diff in pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
platoneko commented on code in PR #14875:
URL: https://github.com/apache/doris/pull/14875#discussion_r1042353355


##########
be/src/io/fs/hdfs_file_system.h:
##########
@@ -0,0 +1,159 @@
+// Cache for HdfsFileSystemHandle
+class HdfsFileSystemCache {
+public:

Review Comment:
   If `HdfsFileSystemCache` won't be used outside of the HDFS file system, it would be better to define this class in the .cpp file.
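
A minimal sketch of that suggestion, not the code from this PR: the header would expose only the public class, and the cache would sit in an anonymous namespace in the .cpp so that no other translation unit can name it. `FileSystem`, `Handle` and `HandleCache` are hypothetical stand-ins, both halves are shown in one file for brevity, and the cache here is deliberately not thread-safe.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// --- What would live in the header: only the public interface. ---
class FileSystem {
public:
    explicit FileSystem(std::string name_node);

    // Number of distinct handles currently held by the process-wide cache.
    static std::size_t cached_handle_count();

private:
    std::string _name_node;
};

// --- What would live in the .cpp: the cache is an implementation detail. ---
namespace {

// Hypothetical stand-in for a connection handle.
struct Handle {
    std::string name_node;
};

// Internal linkage: other translation units cannot refer to this class.
class HandleCache {
public:
    static HandleCache& instance() {
        static HandleCache cache;
        return cache;
    }

    // Returns the cached handle for name_node, creating it on first use.
    std::shared_ptr<Handle> get(const std::string& name_node) {
        auto it = _handles.find(name_node);
        if (it != _handles.end()) {
            return it->second;
        }
        auto handle = std::make_shared<Handle>(Handle{name_node});
        _handles.emplace(name_node, handle);
        return handle;
    }

    std::size_t size() const { return _handles.size(); }

private:
    std::unordered_map<std::string, std::shared_ptr<Handle>> _handles;
};

} // namespace

FileSystem::FileSystem(std::string name_node) : _name_node(std::move(name_node)) {
    HandleCache::instance().get(_name_node);
}

std::size_t FileSystem::cached_handle_count() {
    return HandleCache::instance().size();
}
```

With this layout, changing the cache does not force a rebuild of every file that includes the header.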




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BePPPower commented on a diff in pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
BePPPower commented on code in PR #14875:
URL: https://github.com/apache/doris/pull/14875#discussion_r1043046620


##########
be/src/io/hdfs_reader_writer.h:
##########
@@ -39,6 +44,8 @@ class HdfsReaderWriter {
 
     static Status create_writer(const std::map<std::string, std::string>& properties,
                                 const std::string& path, std::unique_ptr<FileWriter>& writer);
-};
 
+    static Status create_new_reader(const THdfsParams& hdfs_params, const std::string& path,

Review Comment:
   done





[GitHub] [doris] platoneko commented on a diff in pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
platoneko commented on code in PR #14875:
URL: https://github.com/apache/doris/pull/14875#discussion_r1042403237


##########
be/src/io/fs/hdfs_file_system.cpp:
##########
@@ -0,0 +1,287 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "io/fs/hdfs_file_system.h"
+
+#include <fcntl.h>
+#include <gen_cpp/PlanNodes_types.h>
+#include <hdfs/hdfs.h>
+
+#include "gutil/hash/hash.h"
+#include "io/fs/hdfs_file_reader.h"
+#include "io/hdfs_builder.h"
+#include "service/backend_options.h"
+
+namespace doris {
+namespace io {
+
+#ifndef CHECK_HDFS_HANDLE
+#define CHECK_HDFS_HANDLE(handle)                               \
+    if (!handle) {                                              \
+        return Status::InternalError("init Hdfs handle error"); \
+    }
+#endif
+
+HdfsFileSystem::HdfsFileSystem(THdfsParams hdfs_params, const std::string& path)
+        : RemoteFileSystem(path, "", FileSystemType::HDFS),
+          _hdfs_params(hdfs_params),
+          _path(path),
+          _fs_handle(nullptr) {
+    _namenode = _hdfs_params.fs_name;
+    // if the format of _path is hdfs://ip:port/path, replace it to /path.
+    // path like hdfs://ip:port/path can't be used by libhdfs3.
+    if (_path.find(_namenode) != std::string::npos) {
+        _path = _path.substr(_namenode.size());
+    }
+}
+
+HdfsFileSystem::~HdfsFileSystem() {
+    if (_fs_handle && _fs_handle->from_cache) {
+        _fs_handle->dec_ref();
+    }
+}
+
+Status HdfsFileSystem::connect() {
+    std::lock_guard lock(_handle_mu);

Review Comment:
   If we never reconnect `HdfsFileSystem`, there is no need to use a mutex here.
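
One way to read this comment, sketched with hypothetical names (`RemoteFs`, `Connection`) rather than the classes in this PR: if the connection is established exactly once, before the object is shared, the handle never changes afterwards and concurrent readers need no lock. The factory function below is an assumption about how that could be arranged, not what the PR does.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an hdfsFS connection.
struct Connection {
    std::string name_node;
};

class RemoteFs {
public:
    // Connects exactly once, before the object is visible to any other thread.
    // Returns nullptr if the (simulated) connection fails.
    static std::shared_ptr<const RemoteFs> create(const std::string& name_node) {
        if (name_node.empty()) {
            return nullptr;
        }
        auto conn = std::make_unique<Connection>(Connection{name_node});
        return std::shared_ptr<const RemoteFs>(new RemoteFs(std::move(conn)));
    }

    const std::string& name_node() const { return _conn->name_node; }

private:
    explicit RemoteFs(std::unique_ptr<Connection> conn) : _conn(std::move(conn)) {}

    // Set in the constructor and never reassigned, so readers need no mutex.
    const std::unique_ptr<Connection> _conn;
};
```

If reconnection is ever needed, this guarantee no longer holds and the handle would need synchronization again.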





[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1342430439

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] morningman commented on a diff in pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #14875:
URL: https://github.com/apache/doris/pull/14875#discussion_r1042238674


##########
be/src/vec/exec/format/file_reader/new_file_factory.h:
##########
@@ -0,0 +1,96 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   Move this to `src/io/fs/`



##########
be/src/io/hdfs_reader_writer.h:
##########
@@ -39,6 +44,8 @@ class HdfsReaderWriter {
 
     static Status create_writer(const std::map<std::string, std::string>& properties,
                                 const std::string& path, std::unique_ptr<FileWriter>& writer);
-};
 
+    static Status create_new_reader(const THdfsParams& hdfs_params, const std::string& path,

Review Comment:
   Maybe we should move these methods to `file_factory` too?
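
A rough sketch of what gathering reader creation into one factory could look like. Every name here (`Reader`, `LocalReader`, `HdfsReader`, `StorageType`, `ReaderFactory`) is hypothetical and does not reflect the actual `file_factory` interface in Doris.

```cpp
#include <memory>
#include <string>

// Hypothetical reader interface; not the Doris FileReader API.
class Reader {
public:
    virtual ~Reader() = default;
    virtual std::string kind() const = 0;
};

class LocalReader : public Reader {
public:
    std::string kind() const override { return "local"; }
};

class HdfsReader : public Reader {
public:
    std::string kind() const override { return "hdfs"; }
};

enum class StorageType { LOCAL, HDFS };

// Single entry point: callers name the storage type and never touch
// backend-specific helper classes directly.
struct ReaderFactory {
    static std::unique_ptr<Reader> create(StorageType type) {
        switch (type) {
        case StorageType::LOCAL:
            return std::make_unique<LocalReader>();
        case StorageType::HDFS:
            return std::make_unique<HdfsReader>();
        }
        return nullptr; // not reached for valid enum values
    }
};
```

Adding another backend then means one new enum value and one new `case`, with no change at the call sites.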





[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1342399717

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1342622951

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1340506958

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1343774860

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] morningman merged pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
morningman merged PR #14875:
URL: https://github.com/apache/doris/pull/14875




[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1343866987

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1343867011

   PR approved by anyone and no changes requested.




[GitHub] [doris] github-actions[bot] commented on pull request #14875: [feature](file reader) Merge hdfs reader to the new file reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14875:
URL: https://github.com/apache/doris/pull/14875#issuecomment-1342251076

   clang-tidy review says "All clean, LGTM! :+1:"

