Posted to dev@orc.apache.org by AnatoliShein <gi...@git.apache.org> on 2017/06/20 20:49:30 UTC

[GitHub] orc pull request #134: Orc 17

GitHub user AnatoliShein opened a pull request:

    https://github.com/apache/orc/pull/134

    Orc 17

    In this pull request I added the **LIBHDFS++** library to the **ORC** project for reading files from HDFS.
    
    Libhdfs++ is located in orc/c++/lib/libhdfspp and by default builds as a lightweight library without examples, tests, and tools, thereby avoiding dependencies on the JDK, valgrind, and gmock. However, if the flag **-DHDFSPP_LIBRARY_ONLY=FALSE** is passed to cmake, it will build the examples, tests, and tools as well.
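    
    For example, from a build directory (mirroring the cmake invocation style used later in this thread):
    
        cmake -DHDFSPP_LIBRARY_ONLY=FALSE ..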
    
    Libhdfs++ depends on the protobuf libraries in orc/c++/libs/protobuf-2.6.0 and searches the system for the Doxygen, OpenSSL, CyrusSASL, GSasl, and Threads packages at build configuration time (only OpenSSL and Threads are required).
    
    The libhdfspp folder also includes a script, pull_hdfs.sh, which pulls the latest changes from the Libhdfs++ Hadoop branch into ORC and generates a file, 'imported_timestamp', containing the timestamp and information about the latest commit.
    
    I also updated all the ORC tools to automatically use Libhdfs++ to read ORC files on HDFS when the path begins with 'hdfs://'.
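    
    For illustration, here is a minimal sketch of opening such a file through the public API; the host, port, and path below are placeholders:
    
        #include "orc/OrcFile.hh"
        
        std::unique_ptr<orc::Reader> reader = orc::createReader(
            orc::readLocalFile("hdfs://namenode:8020/tmp/example.orc"),
            orc::ReaderOptions());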
    
    Please review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AnatoliShein/orc ORC-17

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #134
    
---

[GitHub] orc issue #134: Orc 17

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/134
  
    I am adding a Travis test for OS X. It should make catching OS X issues easier.


---

[GitHub] orc pull request #134: Orc 17

Posted by jamesclampffer <gi...@git.apache.org>.
Github user jamesclampffer commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125994365
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -34,15 +34,13 @@
     #include "common/hdfs_configuration.h"
     #include "common/configuration_loader.h"
     
    -#define BUF_SIZE 1048576 //1 MB
    -
     namespace orc {
     
       class HdfsFileInputStream : public InputStream {
    --- End diff --
    
    For a mock that does real network I/O, libhdfs++ uses the C++/JVM bindings over the minidfscluster (a Java namenode and datanode in a single process) for its unit tests. That seems like a very large dependency to bring into this project. It would be possible to write a mock that goes over TCP in C/C++; I can think of two ways:
    1) Capture the network traffic of a read with tcpdump/wireshark for a given request and replay the response. This would be really brittle if I/O request sizes changed, and storing a bunch of binary data is pretty ugly.
    2) Implement at least some of the server-side HDFS RPC protocol and DataTransferProtocol to make a mock namenode and datanode. This is possible but would be a considerable amount of work, and it wouldn't be as useful for libhdfs++ since it already has the minidfscluster.


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r138067160
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,172 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::unique_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::URI uri;
    +      try {
    +        uri = hdfs::URI::parse_from_string(filename);
    +      } catch (const hdfs::uri_parse_error&) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      //This sets conf path to default "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      //and loads configs core-site.xml and hdfs-site.xml from the conf path
    +      hdfs::ConfigParser parser;
    +      if(!parser.LoadDefaultResources()){
    +        throw ParseError("Could not load default resources. ");
    +      }
    +      auto stats = parser.ValidateResources();
    +      //validating core-site.xml
    +      if(!stats[0].second.ok()){
    +        throw ParseError(stats[0].first + " is invalid: " + stats[0].second.ToString());
    +      }
    +      //validating hdfs-site.xml
    +      if(!stats[1].second.ok()){
    +        throw ParseError(stats[1].first + " is invalid: " + stats[1].second.ToString());
    +      }
    +      hdfs::Options options;
    +      if(!parser.get_options(options)){
    +        throw ParseError("Could not load Options object. ");
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(
    +          hdfs::FileSystem::New(io_service, "", options));
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Checking if the user supplied the host
    +      if(!uri.get_host().empty()){
    +        //Using port if supplied, otherwise using "" to look up port in configs
    +        std::string port = uri.has_port() ?
    +            std::to_string(uri.get_port()) : "";
    +        status = file_system->Connect(uri.get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.get_host()
    +              + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = file_system->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " +
    +                options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError(
    +                "Error connecting to the cluster: defaultFS is empty. "
    +                + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = file_system->Open(uri.get_path(), &file_raw);
    --- End diff --
    
    I am not quite sure why we would want to cache "status", since it just holds the return values of HDFS operations (success or an error message). Could you please explain?


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125674828
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -121,39 +116,24 @@ namespace orc {
         }
     
         uint64_t getNaturalReadSize() const override {
    -      return BUF_SIZE;
    +      return 1048576; //1 MB
         }
     
         void read(void* buf,
                   uint64_t length,
                   uint64_t offset) override {
     
    -      ssize_t total_bytes_read = 0;
           size_t last_bytes_read = 0;
     
           if (!buf) {
             throw ParseError("Buffer is null");
           }
     
           hdfs::Status status;
    -      char input_buffer[BUF_SIZE];
    -      do{
    -        //Reading file chunks
    -        status = file->PositionRead(input_buffer, sizeof(input_buffer), offset, &last_bytes_read);
    -        if(status.ok()) {
    -          //Writing file chunks to buf
    -          if(total_bytes_read + last_bytes_read >= length){
    -            memcpy((char*) buf + total_bytes_read, input_buffer, length - total_bytes_read);
    -            break;
    -          } else {
    -            memcpy((char*) buf + total_bytes_read, input_buffer, last_bytes_read);
    -            total_bytes_read += last_bytes_read;
    -            offset += last_bytes_read;
    -          }
    -        } else {
    -          throw ParseError("Error reading the file: " + status.ToString());
    -        }
    -      } while (true);
    +      status = file->PositionRead(buf, static_cast<size_t>(length), static_cast<off_t>(offset), &last_bytes_read);
    --- End diff --
    
    This makes sense. I just added a loop here.
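    
    For reference, a minimal sketch of that loop shape (the EOF guard is an addition here, not part of the diff; PositionRead may return fewer bytes than requested, so the buffer pointer and offset advance on each pass):
    
        size_t total_bytes_read = 0;
        size_t last_bytes_read = 0;
        do {
          hdfs::Status status = file->PositionRead(
              static_cast<char*>(buf) + total_bytes_read,
              static_cast<size_t>(length) - total_bytes_read,
              static_cast<off_t>(offset + total_bytes_read),
              &last_bytes_read);
          if (!status.ok()) {
            throw ParseError("Error reading the file: " + status.ToString());
          }
          if (last_bytes_read == 0) {
            throw ParseError("Premature EOF while reading " + filename);
          }
          total_bytes_read += last_bytes_read;
        } while (total_bytes_read < length);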


---

[GitHub] orc issue #134: Orc 17

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/134
  
    
    
    .. Owen
    
    > On Aug 18, 2017, at 07:14, jamesclampffer <no...@github.com> wrote:
    > 
    > @majetideepak Is there a way to get the OSX CI to use a generic LLVM/Clang install rather than the one that gets shipped with OSX?
    > 
    
    Don't do that. If it doesn't compile with the "real" Mac OS compiler, you'll break my dev platform.
    > Apparently they apply a patch to Clang to break thread_local; my understanding is they don't want to break the ABI if they ever get around to making builtin thread_local specialized for their OS.
    > 
    >         .Case("cxx_thread_local",
    > -                 LangOpts.CPlusPlus11 && PP.getTargetInfo().isTLSSupported() &&
    > -                 !PP.getTargetInfo().getTriple().isOSDarwin())
    > +                 LangOpts.CPlusPlus11 && PP.getTargetInfo().isTLSSupported())
    > Otherwise this won't get coverage on OSX CI builds.



---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125990380
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -34,15 +34,13 @@
     #include "common/hdfs_configuration.h"
     #include "common/configuration_loader.h"
     
    -#define BUF_SIZE 1048576 //1 MB
    -
     namespace orc {
     
       class HdfsFileInputStream : public InputStream {
    --- End diff --
    
    Currently we don't actually have a mocked service interface in libhdfs++.


---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r138019713
  
    --- Diff: c++/include/orc/OrcFile.hh ---
    @@ -103,12 +103,18 @@ namespace orc {
       };
     
       /**
    -   * Create a stream to a local file.
    +   * Create a stream to a local file or HDFS file if path begins with "hdfs://"
    --- End diff --
    
    Why do you want readLocalFile() to handle HDFS files? You already have readHdfsFile(), which does exactly that.


---

[GitHub] orc issue #134: Orc 17

Posted by jamesclampffer <gi...@git.apache.org>.
Github user jamesclampffer commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Those errors should go away after HDFS-10787 is finished (or applied to the PR temporarily) since it exposes an API to get at the configuration without using std::tr2::optional.  We had to suppress a bunch of errors coming from optional when it was added to libhdfs++; eventually I'd like to get rid of the remaining uses of it in the implementation as well.


---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Hi guys, I have now fixed all the build errors and warnings for OS X (with Xcode 8+), and I would appreciate any additional feedback on this pull request. Thanks!
    (@omalley , @majetideepak , @xndai, @jamesclampffer )


---

[GitHub] orc issue #134: Orc 17

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Unfortunately, this doesn't build on MacOS. :(
    
    I'll take some time this week to figure out what needs to be fixed. I've gone ahead and rebased it onto the current master and squashed the commits in https://github.com/omalley/orc/tree/pr/134


---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Hi @omalley,
    What error messages do you get when trying to build on MacOS?


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123842164
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,170 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +#include "common/uri.h"
    +#include "common/hdfs_configuration.h"
    +#include "common/configuration_loader.h"
    +
    +#define BUF_SIZE 1048576 //1 MB
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::shared_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::optional<hdfs::URI> uri = hdfs::URI::parse_from_string(filename);
    +      if (!uri) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      hdfs::Options options;
    +      //Setting the config path to the default: "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      hdfs::ConfigurationLoader loader;
    +      //Loading default config files core-site.xml and hdfs-site.xml from the config path
    +      hdfs::optional<hdfs::HdfsConfiguration> config = loader.LoadDefaultResources<hdfs::HdfsConfiguration>();
    +      //TODO: HDFS-9539 - after this is resolved, valid config will always be returned.
    +      if(config){
    +        //Loading options from the config
    +        options = config->GetOptions();
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping fs into a shared pointer to guarantee deletion
    +      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    +      if (!fs) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Check if the user supplied the host
    +      if(!uri.value().get_host().empty()){
    +        //If port is supplied we use it, otherwise we use the empty string so that it will be looked up in configs.
    +        std::string port = (uri.value().get_port()) ? std::to_string(uri.value().get_port().value()) : "";
    +        status = fs->Connect(uri.value().get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.value().get_host() + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = fs->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " + options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError("Error connecting to the cluster: defaultFS is empty. " + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (!fs) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = fs->Open(uri->get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open " + uri->get_path() + ". " + status.ToString());
    +      }
    +      //wrapping file_raw into a unique pointer to guarantee deletion
    +      std::unique_ptr<hdfs::FileHandle> file_handle(file_raw);
    +      file = std::move(file_handle);
    +      //transferring the ownership of FileSystem to HdfsFileInputStream to avoid premature deallocation
    +      file_system = fs;
    +
    +      hdfs::StatInfo stat_info;
    +      status = fs->GetFileInfo(uri->get_path(), stat_info);
    +      if (!status.ok()) {
    +        throw ParseError("Can't stat " + uri->get_path() + ". " + status.ToString());
    +      }
    +      totalLength = stat_info.length;
    +    }
    +
    +    ~HdfsFileInputStream();
    +
    +    uint64_t getLength() const override {
    +      return totalLength;
    +    }
    +
    +    uint64_t getNaturalReadSize() const override {
    +      return BUF_SIZE;
    +    }
    +
    +    void read(void* buf,
    +              uint64_t length,
    +              uint64_t offset) override {
    +
    +      ssize_t total_bytes_read = 0;
    +      size_t last_bytes_read = 0;
    +
    +      if (!buf) {
    +        throw ParseError("Buffer is null");
    +      }
    +
    +      hdfs::Status status;
    +      char input_buffer[BUF_SIZE];
    +      do{
    +        //Reading file chunks
    +        status = file->PositionRead(input_buffer, sizeof(input_buffer), offset, &last_bytes_read);
    --- End diff --
    
    I used an intermediate buffer because one was used for PositionRead in a similar case. I just tried it without the buffer and it worked.


---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125728966
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -123,17 +123,21 @@ namespace orc {
                   uint64_t length,
                   uint64_t offset) override {
     
    -      size_t last_bytes_read = 0;
    -
           if (!buf) {
             throw ParseError("Buffer is null");
           }
     
           hdfs::Status status;
    -      status = file->PositionRead(buf, static_cast<size_t>(length), static_cast<off_t>(offset), &last_bytes_read);
    -      if(!status.ok()) {
    -        throw ParseError("Error reading the file: " + status.ToString());
    -      }
    +      size_t total_bytes_read = 0;
    +      size_t last_bytes_read = 0;
    +      
    +      do {
    +        status = file->PositionRead(static_cast<char*>(buf) + total_bytes_read, static_cast<size_t>(length) - total_bytes_read, static_cast<off_t>(offset + total_bytes_read), &last_bytes_read);
    --- End diff --
    
    Make sure to wrap at 80 characters.


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123841444
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,170 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +#include "common/uri.h"
    +#include "common/hdfs_configuration.h"
    +#include "common/configuration_loader.h"
    +
    +#define BUF_SIZE 1048576 //1 MB
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::shared_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::optional<hdfs::URI> uri = hdfs::URI::parse_from_string(filename);
    +      if (!uri) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      hdfs::Options options;
    +      //Setting the config path to the default: "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      hdfs::ConfigurationLoader loader;
    +      //Loading default config files core-site.xml and hdfs-site.xml from the config path
    +      hdfs::optional<hdfs::HdfsConfiguration> config = loader.LoadDefaultResources<hdfs::HdfsConfiguration>();
    +      //TODO: HDFS-9539 - after this is resolved, valid config will always be returned.
    +      if(config){
    +        //Loading options from the config
    +        options = config->GetOptions();
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping fs into a shared pointer to guarantee deletion
    +      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    +      if (!fs) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Check if the user supplied the host
    +      if(!uri.value().get_host().empty()){
    +        //If port is supplied we use it, otherwise we use the empty string so that it will be looked up in configs.
    +        std::string port = (uri.value().get_port()) ? std::to_string(uri.value().get_port().value()) : "";
    +        status = fs->Connect(uri.value().get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.value().get_host() + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = fs->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " + options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError("Error connecting to the cluster: defaultFS is empty. " + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (!fs) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = fs->Open(uri->get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open " + uri->get_path() + ". " + status.ToString());
    +      }
    +      //wrapping file_raw into a unique pointer to guarantee deletion
    +      std::unique_ptr<hdfs::FileHandle> file_handle(file_raw);
    +      file = std::move(file_handle);
    --- End diff --
    
    Yep, good catch.


---

[GitHub] orc issue #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on the issue:

    https://github.com/apache/orc/pull/134
  
    LGTM


---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r138018196
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,172 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::unique_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::URI uri;
    +      try {
    +        uri = hdfs::URI::parse_from_string(filename);
    +      } catch (const hdfs::uri_parse_error&) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      //This sets conf path to default "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      //and loads configs core-site.xml and hdfs-site.xml from the conf path
    +      hdfs::ConfigParser parser;
    +      if(!parser.LoadDefaultResources()){
    +        throw ParseError("Could not load default resources. ");
    +      }
    +      auto stats = parser.ValidateResources();
    +      //validating core-site.xml
    +      if(!stats[0].second.ok()){
    +        throw ParseError(stats[0].first + " is invalid: " + stats[0].second.ToString());
    +      }
    +      //validating hdfs-site.xml
    +      if(!stats[1].second.ok()){
    +        throw ParseError(stats[1].first + " is invalid: " + stats[1].second.ToString());
    +      }
    +      hdfs::Options options;
    +      if(!parser.get_options(options)){
    +        throw ParseError("Could not load Options object. ");
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(
    +          hdfs::FileSystem::New(io_service, "", options));
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Checking if the user supplied the host
    +      if(!uri.get_host().empty()){
    +        //Using port if supplied, otherwise using "" to look up port in configs
    +        std::string port = uri.has_port() ?
    +            std::to_string(uri.get_port()) : "";
    +        status = file_system->Connect(uri.get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.get_host()
    +              + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = file_system->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " +
    +                options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError(
    +                "Error connecting to the cluster: defaultFS is empty. "
    +                + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = file_system->Open(uri.get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open "
    +            + uri.get_path() + ". " + status.ToString());
    +      }
    +      //Wrapping file_raw into a unique pointer to guarantee deletion
    +      file.reset(file_raw);
    +
    +      hdfs::StatInfo stat_info;
    +      status = file_system->GetFileInfo(uri.get_path(), stat_info);
    +      if (!status.ok()) {
    +        throw ParseError("Can't stat "
    +            + uri.get_path() + ". " + status.ToString());
    +      }
    +      totalLength = stat_info.length;
    +    }
    +
    +    uint64_t getLength() const override {
    +      return totalLength;
    +    }
    +
    +    uint64_t getNaturalReadSize() const override {
    +      return 1048576; //1 MB
    --- End diff --
    
    nit: Use a const variable?
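    
    That is, something along these lines (the constant name is assumed for illustration):
    
        static const uint64_t NATURAL_READ_SIZE = 1048576; // 1 MB
        
        uint64_t getNaturalReadSize() const override {
          return NATURAL_READ_SIZE;
        }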


---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Hi all! It looks like the thread_local feature is supported by default in Xcode 8.0 (and above), which runs on OS X El Capitan (10.11.5) or newer. Is it possible to update the osx_image of the CI run from "xcode6.4" to at least "xcode8.0"? For older versions we can just display a warning message and not build libhdfspp.


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125668140
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -66,22 +64,22 @@ namespace orc {
             options = config->GetOptions();
           }
           hdfs::IoService * io_service = hdfs::IoService::New();
    -      //Wrapping fs into a shared pointer to guarantee deletion
    -      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    -      if (!fs) {
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(hdfs::FileSystem::New(io_service, "", options));
    +      if (!file_system) {
    --- End diff --
    
    Yes, a unique pointer actually supports this notation (thanks to [operator bool](http://en.cppreference.com/w/cpp/memory/unique_ptr/operator_bool))
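    
    A self-contained illustration of that operator, separate from the PR code:
    
        #include <memory>
        
        int main() {
          std::unique_ptr<int> p;   // empty, so operator bool yields false
          if (!p) {
            p.reset(new int(42));   // now p owns an object and converts to true
          }
          return p ? 0 : 1;         // returns 0
        }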


---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123853053
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -66,22 +64,22 @@ namespace orc {
             options = config->GetOptions();
           }
           hdfs::IoService * io_service = hdfs::IoService::New();
    -      //Wrapping fs into a shared pointer to guarantee deletion
    -      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    -      if (!fs) {
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(hdfs::FileSystem::New(io_service, "", options));
    +      if (!file_system) {
    --- End diff --
    
    Are you sure this would work? Should it be if (file_system.get() != nullptr)?


---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Hi @omalley, the linking should be fixed with the new commit, and we will fix the warnings shortly.


---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r138018152
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,172 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::unique_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::URI uri;
    +      try {
    +        uri = hdfs::URI::parse_from_string(filename);
    +      } catch (const hdfs::uri_parse_error&) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      //This sets conf path to default "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      //and loads configs core-site.xml and hdfs-site.xml from the conf path
    +      hdfs::ConfigParser parser;
    +      if(!parser.LoadDefaultResources()){
    +        throw ParseError("Could not load default resources. ");
    +      }
    +      auto stats = parser.ValidateResources();
    +      //validating core-site.xml
    +      if(!stats[0].second.ok()){
    +        throw ParseError(stats[0].first + " is invalid: " + stats[0].second.ToString());
    +      }
    +      //validating hdfs-site.xml
    +      if(!stats[1].second.ok()){
    +        throw ParseError(stats[1].first + " is invalid: " + stats[1].second.ToString());
    +      }
    +      hdfs::Options options;
    +      if(!parser.get_options(options)){
    +        throw ParseError("Could not load Options object. ");
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(
    +          hdfs::FileSystem::New(io_service, "", options));
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Checking if the user supplied the host
    +      if(!uri.get_host().empty()){
    +        //Using port if supplied, otherwise using "" to look up port in configs
    +        std::string port = uri.has_port() ?
    +            std::to_string(uri.get_port()) : "";
    +        status = file_system->Connect(uri.get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.get_host()
    +              + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = file_system->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " +
    +                options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError(
    +                "Error connecting to the cluster: defaultFS is empty. "
    +                + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = file_system->Open(uri.get_path(), &file_raw);
    --- End diff --
    
    nit: you may want to cache "status" within the class.


---

[GitHub] orc issue #134: Orc 17

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Can you check whether `thread_local` is supported by the platform, and disable libhdfspp on platforms where it is not?
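    
    A minimal try-compile probe along those lines (a sketch; the real check would wrap a snippet like this in CMake's CHECK_CXX_SOURCE_COMPILES):
    
        // Fails to compile on toolchains without C++11 thread_local support,
        // e.g. Apple Clang before Xcode 8.
        thread_local int per_thread_counter = 0;
        
        int main() {
          ++per_thread_counter;
          return 0;
        }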


---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Now ORC can be built without LIBHDFSPP like this:
    cmake -DBUILD_LIBHDFSPP=off ..


---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Hi @omalley,
    Yes, I am working on rebasing it right now. I am running into some protobuf problems, but it should not be too bad.


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r138066140
  
    --- Diff: c++/include/orc/OrcFile.hh ---
    @@ -103,12 +103,18 @@ namespace orc {
       };
     
       /**
    -   * Create a stream to a local file.
    +   * Create a stream to a local file or HDFS file if path begins with "hdfs://"
    --- End diff --
    
    Yes, I guess this is a little confusing. Currently I call readHdfsFile from readLocalFile when the path starts with "hdfs://". To fix this I will add a new function, readFile, which will call either readHdfsFile or readLocalFile, as sketched below.
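    
    Roughly, the dispatch would look like this (a sketch of the proposal, not final code):
    
        std::unique_ptr<InputStream> readFile(const std::string& path) {
          if (path.compare(0, 7, "hdfs://") == 0) {
            return readHdfsFile(path);
          }
          return readLocalFile(path);
        }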


---

[GitHub] orc issue #134: Orc 17

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Sorry, this patch is really out of date with the master. Can you rebase it?


---

[GitHub] orc pull request #134: Orc 17

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/orc/pull/134


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r138066200
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,172 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::unique_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::URI uri;
    +      try {
    +        uri = hdfs::URI::parse_from_string(filename);
    +      } catch (const hdfs::uri_parse_error&) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      //This sets conf path to default "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      //and loads configs core-site.xml and hdfs-site.xml from the conf path
    +      hdfs::ConfigParser parser;
    +      if(!parser.LoadDefaultResources()){
    +        throw ParseError("Could not load default resources. ");
    +      }
    +      auto stats = parser.ValidateResources();
    +      //validating core-site.xml
    +      if(!stats[0].second.ok()){
    +        throw ParseError(stats[0].first + " is invalid: " + stats[0].second.ToString());
    +      }
    +      //validating hdfs-site.xml
    +      if(!stats[1].second.ok()){
    +        throw ParseError(stats[1].first + " is invalid: " + stats[1].second.ToString());
    +      }
    +      hdfs::Options options;
    +      if(!parser.get_options(options)){
    +        throw ParseError("Could not load Options object. ");
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(
    +          hdfs::FileSystem::New(io_service, "", options));
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Checking if the user supplied the host
    +      if(!uri.get_host().empty()){
    +        //Using port if supplied, otherwise using "" to look up port in configs
    +        std::string port = uri.has_port() ?
    +            std::to_string(uri.get_port()) : "";
    +        status = file_system->Connect(uri.get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.get_host()
    +              + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = file_system->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " +
    +                options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError(
    +                "Error connecting to the cluster: defaultFS is empty. "
    +                + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = file_system->Open(uri.get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open "
    +            + uri.get_path() + ". " + status.ToString());
    +      }
    +      //Wrapping file_raw into a unique pointer to guarantee deletion
    +      file.reset(file_raw);
    +
    +      hdfs::StatInfo stat_info;
    +      status = file_system->GetFileInfo(uri.get_path(), stat_info);
    +      if (!status.ok()) {
    +        throw ParseError("Can't stat "
    +            + uri.get_path() + ". " + status.ToString());
    +      }
    +      totalLength = stat_info.length;
    +    }
    +
    +    uint64_t getLength() const override {
    +      return totalLength;
    +    }
    +
    +    uint64_t getNaturalReadSize() const override {
    +      return 1048576; //1 MB
    --- End diff --
    
    This sounds good; I will update this.


---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123841451
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,170 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +#include "common/uri.h"
    +#include "common/hdfs_configuration.h"
    +#include "common/configuration_loader.h"
    +
    +#define BUF_SIZE 1048576 //1 MB
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::shared_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::optional<hdfs::URI> uri = hdfs::URI::parse_from_string(filename);
    +      if (!uri) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      hdfs::Options options;
    +      //Setting the config path to the default: "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      hdfs::ConfigurationLoader loader;
    +      //Loading default config files core-site.xml and hdfs-site.xml from the config path
    +      hdfs::optional<hdfs::HdfsConfiguration> config = loader.LoadDefaultResources<hdfs::HdfsConfiguration>();
    +      //TODO: HDFS-9539 - after this is resolved, valid config will always be returned.
    +      if(config){
    +        //Loading options from the config
    +        options = config->GetOptions();
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping fs into a shared pointer to guarantee deletion
    +      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    +      if (!fs) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Check if the user supplied the host
    +      if(!uri.value().get_host().empty()){
    +        //If port is supplied we use it, otherwise we use the empty string so that it will be looked up in configs.
    +        std::string port = (uri.value().get_port()) ? std::to_string(uri.value().get_port().value()) : "";
    +        status = fs->Connect(uri.value().get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.value().get_host() + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = fs->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " + options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError("Error connecting to the cluster: defaultFS is empty. " + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (!fs) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = fs->Open(uri->get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open " + uri->get_path() + ". " + status.ToString());
    +      }
    +      //wrapping file_raw into a unique pointer to guarantee deletion
    +      std::unique_ptr<hdfs::FileHandle> file_handle(file_raw);
    +      file = std::move(file_handle);
    +      //transferring the ownership of FileSystem to HdfsFileInputStream to avoid premature deallocation
    +      file_system = fs;
    --- End diff --
    
    This makes sense; we don't really need a shared pointer here. Just fixed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123646956
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,170 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +#include "common/uri.h"
    +#include "common/hdfs_configuration.h"
    +#include "common/configuration_loader.h"
    +
    +#define BUF_SIZE 1048576 //1 MB
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::shared_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::optional<hdfs::URI> uri = hdfs::URI::parse_from_string(filename);
    +      if (!uri) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      hdfs::Options options;
    +      //Setting the config path to the default: "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      hdfs::ConfigurationLoader loader;
    +      //Loading default config files core-site.xml and hdfs-site.xml from the config path
    +      hdfs::optional<hdfs::HdfsConfiguration> config = loader.LoadDefaultResources<hdfs::HdfsConfiguration>();
    +      //TODO: HDFS-9539 - after this is resolved, valid config will always be returned.
    +      if(config){
    +        //Loading options from the config
    +        options = config->GetOptions();
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping fs into a shared pointer to guarantee deletion
    +      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    +      if (!fs) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Check if the user supplied the host
    +      if(!uri.value().get_host().empty()){
    +        //If port is supplied we use it, otherwise we use the empty string so that it will be looked up in configs.
    +        std::string port = (uri.value().get_port()) ? std::to_string(uri.value().get_port().value()) : "";
    +        status = fs->Connect(uri.value().get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.value().get_host() + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = fs->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " + options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError("Error connecting to the cluster: defaultFS is empty. " + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (!fs) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = fs->Open(uri->get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open " + uri->get_path() + ". " + status.ToString());
    +      }
    +      //wrapping file_raw into a unique pointer to guarantee deletion
    +      std::unique_ptr<hdfs::FileHandle> file_handle(file_raw);
    +      file = std::move(file_handle);
    +      //transferring the ownership of FileSystem to HdfsFileInputStream to avoid premature deallocation
    +      file_system = fs;
    +
    +      hdfs::StatInfo stat_info;
    +      status = fs->GetFileInfo(uri->get_path(), stat_info);
    +      if (!status.ok()) {
    +        throw ParseError("Can't stat " + uri->get_path() + ". " + status.ToString());
    +      }
    +      totalLength = stat_info.length;
    +    }
    +
    +    ~HdfsFileInputStream();
    +
    +    uint64_t getLength() const override {
    +      return totalLength;
    +    }
    +
    +    uint64_t getNaturalReadSize() const override {
    +      return BUF_SIZE;
    +    }
    +
    +    void read(void* buf,
    +              uint64_t length,
    +              uint64_t offset) override {
    +
    +      ssize_t total_bytes_read = 0;
    +      size_t last_bytes_read = 0;
    +
    +      if (!buf) {
    +        throw ParseError("Buffer is null");
    +      }
    +
    +      hdfs::Status status;
    +      char input_buffer[BUF_SIZE];
    +      do{
    +        //Reading file chunks
    +        status = file->PositionRead(input_buffer, sizeof(input_buffer), offset, &last_bytes_read);
    --- End diff --
    
    Why do we need an intermediate buffer here? Can we just pass "buf" and "length" into PositionRead()? That way we don't need the following memcpy() and can also simplify the logic here.
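    
    For illustration, a minimal sketch of that direct read, using the same names as the diff above (PositionRead()'s signature is assumed from how the diff calls it, and as a later comment in this thread points out, a loop is still needed to handle short reads):
    
    ```cpp
    // Read straight into the caller's buffer -- no intermediate buffer, no memcpy.
    size_t bytes_read = 0;
    hdfs::Status status = file->PositionRead(buf, static_cast<size_t>(length),
                                             static_cast<off_t>(offset), &bytes_read);
    if (!status.ok()) {
      throw ParseError("Error reading the file: " + status.ToString());
    }
    ```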


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125674525
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -34,15 +34,13 @@
     #include "common/hdfs_configuration.h"
     #include "common/configuration_loader.h"
     
    -#define BUF_SIZE 1048576 //1 MB
    -
     namespace orc {
     
       class HdfsFileInputStream : public InputStream {
    --- End diff --
    
    I don't currently have UTs for this class because testing it requires a Hadoop cluster, and firing one up would mean bringing the entire Hadoop tree into ORC (which is actually much larger than ORC itself), making things very heavy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #134: Orc 17

Posted by jamesclampffer <gi...@git.apache.org>.
Github user jamesclampffer commented on the issue:

    https://github.com/apache/orc/pull/134
  
    @majetideepak Is there a way to get the OSX CI to use a generic LLVM/Clang install rather than the one that gets shipped with OSX?
    
    Apparently they apply a patch to Clang that breaks thread_local; my understanding is they don't want to break the ABI if they ever get around to making built-in thread-local storage specialized for their OS.
    ```
            .Case("cxx_thread_local",
    -                 LangOpts.CPlusPlus11 && PP.getTargetInfo().isTLSSupported() &&
    -                 !PP.getTargetInfo().getTriple().isOSDarwin())
    +                 LangOpts.CPlusPlus11 && PP.getTargetInfo().isTLSSupported())
    ```
    
    Otherwise this won't get coverage on OSX CI builds.
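    
    For anyone reproducing this locally, a sketch of how the breakage can be probed at compile time (the fallback macro name here is made up for illustration):
    
    ```cpp
    // Clang exposes feature probes via __has_feature; define it away on
    // compilers that lack it so the check below still preprocesses.
    #ifndef __has_feature
    #define __has_feature(x) 0
    #endif
    
    // Apple's patched Clang reports cxx_thread_local as unsupported, so any
    // translation unit using 'thread_local' fails to compile on stock OSX.
    #if !__has_feature(cxx_thread_local)
    #define HDFSPP_NO_THREAD_LOCAL 1  // hypothetical switch for a fallback path
    #endif
    ```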
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #134: Orc 17

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/134
  
    It looks like it is a "long" vs "long long" problem.
    
    ````
    /Users/owen/work/code/orc/build/libhdfspp_ep-prefix/src/libhdfspp_ep/lib/common/configuration.cc:85:12: error: no viable conversion from returned value of type 'optional<long>' to function return type 'optional<long long>'
        return result;
               ^~~~~~
    /Users/owen/work/code/orc/build/libhdfspp_ep-prefix/src/libhdfspp_ep/third_party/tr2/optional.hpp:427:13: note: candidate constructor not viable: no known conversion from 'std::experimental::optional<long>' to 'std::experimental::nullopt_t' for 1st argument
      constexpr optional(nullopt_t) noexcept : OptionalBase<T>() {};
                ^
    /Users/owen/work/code/orc/build/libhdfspp_ep-prefix/src/libhdfspp_ep/third_party/tr2/optional.hpp:429:3: note: candidate constructor not viable: no known conversion from 'std::experimental::optional<long>' to 'const std::experimental::optional<long long> &' for 1st argument
      optional(const optional& rhs)
      ^
    /Users/owen/work/code/orc/build/libhdfspp_ep-prefix/src/libhdfspp_ep/third_party/tr2/optional.hpp:438:3: note: candidate constructor not viable: no known conversion from 'std::experimental::optional<long>' to 'std::experimental::optional<long long> &&' for 1st argument
      optional(optional&& rhs) noexcept(is_nothrow_move_constructible<T>::value)
      ^
    /Users/owen/work/code/orc/build/libhdfspp_ep-prefix/src/libhdfspp_ep/third_party/tr2/optional.hpp:447:13: note: candidate constructor not viable: no known conversion from 'std::experimental::optional<long>' to 'const long long &' for 1st argument
      constexpr optional(const T& v) : OptionalBase<T>(v) {}
                ^
    /Users/owen/work/code/orc/build/libhdfspp_ep-prefix/src/libhdfspp_ep/third_party/tr2/optional.hpp:449:13: note: candidate constructor not viable: no known conversion from 'std::experimental::optional<long>' to 'long long &&' for 1st argument
      constexpr optional(T&& v) : OptionalBase<T>(constexpr_move(v)) {}
                ^
    1 error generated.
    make[5]: *** [lib/common/CMakeFiles/common_obj.dir/configuration.cc.o] Error 1
    make[4]: *** [lib/common/CMakeFiles/common_obj.dir/all] Error 2
    make[3]: *** [all] Error 2
    ````
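    
    For context, a minimal reproduction of the mismatch (a sketch only; the real code is in libhdfspp's configuration.cc, and std::experimental::optional stands in here for the bundled tr2 one):
    
    ````cpp
    #include <cstdint>
    #include <cstdlib>
    #include <experimental/optional>
    
    // On LP64 Linux, int64_t aliases 'long', so returning optional<long> works.
    // On OSX, int64_t aliases 'long long', and optional<long> does not convert
    // implicitly to optional<long long>. Widening the raw value first fixes both.
    std::experimental::optional<int64_t> parseLong(const char* s) {
      long raw = std::strtol(s, nullptr, 10);
      return std::experimental::optional<int64_t>(static_cast<int64_t>(raw));
    }
    ````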
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125901693
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -123,17 +123,21 @@ namespace orc {
                   uint64_t length,
                   uint64_t offset) override {
     
    -      size_t last_bytes_read = 0;
    -
           if (!buf) {
             throw ParseError("Buffer is null");
           }
     
           hdfs::Status status;
    -      status = file->PositionRead(buf, static_cast<size_t>(length), static_cast<off_t>(offset), &last_bytes_read);
    -      if(!status.ok()) {
    -        throw ParseError("Error reading the file: " + status.ToString());
    -      }
    +      size_t total_bytes_read = 0;
    +      size_t last_bytes_read = 0;
    +      
    +      do {
    +        status = file->PositionRead(buf, static_cast<size_t>(length) - total_bytes_read, static_cast<off_t>(offset + total_bytes_read), &last_bytes_read);
    --- End diff --
    
    Just fixed the whole file to wrap on 80 chars.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123647405
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,170 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +#include "common/uri.h"
    +#include "common/hdfs_configuration.h"
    +#include "common/configuration_loader.h"
    +
    +#define BUF_SIZE 1048576 //1 MB
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::shared_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::optional<hdfs::URI> uri = hdfs::URI::parse_from_string(filename);
    +      if (!uri) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      hdfs::Options options;
    +      //Setting the config path to the default: "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      hdfs::ConfigurationLoader loader;
    +      //Loading default config files core-site.xml and hdfs-site.xml from the config path
    +      hdfs::optional<hdfs::HdfsConfiguration> config = loader.LoadDefaultResources<hdfs::HdfsConfiguration>();
    +      //TODO: HDFS-9539 - after this is resolved, valid config will always be returned.
    +      if(config){
    +        //Loading options from the config
    +        options = config->GetOptions();
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping fs into a shared pointer to guarantee deletion
    +      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    +      if (!fs) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Check if the user supplied the host
    +      if(!uri.value().get_host().empty()){
    +        //If port is supplied we use it, otherwise we use the empty string so that it will be looked up in configs.
    +        std::string port = (uri.value().get_port()) ? std::to_string(uri.value().get_port().value()) : "";
    +        status = fs->Connect(uri.value().get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.value().get_host() + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = fs->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " + options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError("Error connecting to the cluster: defaultFS is empty. " + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (!fs) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = fs->Open(uri->get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open " + uri->get_path() + ". " + status.ToString());
    +      }
    +      //wrapping file_raw into a unique pointer to guarantee deletion
    +      std::unique_ptr<hdfs::FileHandle> file_handle(file_raw);
    +      file = std::move(file_handle);
    +      //transferring the ownership of FileSystem to HdfsFileInputStream to avoid premature deallocation
    +      file_system = fs;
    --- End diff --
    
    Not quite understand why we need a local variable "fs". You could just initialize this->file_system. Also "fs" should have been a unique_ptr as it doesn't share ownership with anyone. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #134: Orc 17

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Can we add libhdfspp as a tarball and add a line to untar it?
    I am going to make these changes for other libraries in the lib folder as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by jamesclampffer <gi...@git.apache.org>.
Github user jamesclampffer commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r126017188
  
    --- Diff: c++/libs/libhdfspp/CMakeLists.txt ---
    @@ -0,0 +1,248 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# If cmake variable HDFSPP_LIBRARY_ONLY is set, then tests, examples, and
    +# tools will not be built. This allows for faster builds of the libhdfspp
    +# library alone, avoids looking for a JDK, valgrind, and gmock, and
    +# prevents the generation of multiple binaries that might not be relevant
    +# to other projects during normal use.
    +# Example of cmake invocation with HDFSPP_LIBRARY_ONLY enabled:
    +# cmake -DHDFSPP_LIBRARY_ONLY=1
    +
    +project (libhdfspp)
    +
    +cmake_minimum_required(VERSION 2.8)
    +
    +enable_testing()
    +include (CTest)
    +SET(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/CMake" ${CMAKE_MODULE_PATH})
    +
    +# If there's a better way to inform FindCyrusSASL.cmake, let's make this cleaner:
    +SET(CMAKE_PREFIX_PATH "${CMAKE_PREFIX_PATH};${CYRUS_SASL_DIR};${GSASL_DIR}")
    +
    +find_package(Doxygen)
    +find_package(OpenSSL REQUIRED)
    +find_package(Protobuf REQUIRED)
    +find_package(CyrusSASL)
    +find_package(GSasl)
    --- End diff --
    
    I think you're going to need a flag to make sure it doesn't attempt to find and link against GSasl since the Apache and GPL licenses aren't compatible.  Eventually GSasl will be removed from libhdfs++ for the same reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125728350
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -66,22 +64,22 @@ namespace orc {
             options = config->GetOptions();
           }
           hdfs::IoService * io_service = hdfs::IoService::New();
    -      //Wrapping fs into a shared pointer to guarantee deletion
    -      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    -      if (!fs) {
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(hdfs::FileSystem::New(io_service, "", options));
    +      if (!file_system) {
    --- End diff --
    
    Unfortunately, unique_ptr can be redefined as auto_ptr if the platform doesn't support unique_ptr (see orc_config.hh). We talked about removing this redefinition, but it hasn't been done yet, so I'd suggest we stick with file_system.get() != nullptr for now.
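    
    A minimal illustration of why the explicit check is safer here (assuming the alias in orc_config.hh can fall back to auto_ptr on old toolchains):
    
    ```cpp
    // std::auto_ptr has no explicit operator bool, so '!file_system' fails to
    // compile when the alias resolves to it; .get() works for both smart pointers.
    if (file_system.get() == nullptr) {
      throw ParseError("Can't create FileSystem object. ");
    }
    ```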


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123647562
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,170 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +#include "common/uri.h"
    +#include "common/hdfs_configuration.h"
    +#include "common/configuration_loader.h"
    +
    +#define BUF_SIZE 1048576 //1 MB
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::shared_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::optional<hdfs::URI> uri = hdfs::URI::parse_from_string(filename);
    +      if (!uri) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      hdfs::Options options;
    +      //Setting the config path to the default: "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      hdfs::ConfigurationLoader loader;
    +      //Loading default config files core-site.xml and hdfs-site.xml from the config path
    +      hdfs::optional<hdfs::HdfsConfiguration> config = loader.LoadDefaultResources<hdfs::HdfsConfiguration>();
    +      //TODO: HDFS-9539 - after this is resolved, valid config will always be returned.
    +      if(config){
    +        //Loading options from the config
    +        options = config->GetOptions();
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping fs into a shared pointer to guarantee deletion
    +      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    +      if (!fs) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Check if the user supplied the host
    +      if(!uri.value().get_host().empty()){
    +        //If port is supplied we use it, otherwise we use the empty string so that it will be looked up in configs.
    +        std::string port = (uri.value().get_port()) ? std::to_string(uri.value().get_port().value()) : "";
    +        status = fs->Connect(uri.value().get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.value().get_host() + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = fs->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " + options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError("Error connecting to the cluster: defaultFS is empty. " + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (!fs) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = fs->Open(uri->get_path(), &file_raw);
    +      if (!status.ok()) {
    +        throw ParseError("Can't open " + uri->get_path() + ". " + status.ToString());
    +      }
    +      //wrapping file_raw into a unique pointer to guarantee deletion
    +      std::unique_ptr<hdfs::FileHandle> file_handle(file_raw);
    +      file = std::move(file_handle);
    --- End diff --
    
    You could combine these two lines into file.reset(file_raw);
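    
    For clarity, the two forms side by side (behavior is identical; reset() just skips the named temporary):
    
    ```cpp
    // Before: a named temporary plus a move.
    std::unique_ptr<hdfs::FileHandle> file_handle(file_raw);
    file = std::move(file_handle);
    
    // After: reset() takes ownership of the raw pointer directly.
    file.reset(file_raw);
    ```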


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Rebased the branch and resolved the protobuf issues, but I still need to move Libhdfs++ out of the libs folder and fix other small issues. Testing whether it passes the CI tests now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #134: Orc 17

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/134
  
    There are a lot of warnings in libhdfs++ that clang reports.
    
    It is currently failing in:
    [ 91%] Linking CXX shared library libhdfspp.dylib
    
    with link errors about sasl and protobuf.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Looks like [THIS](https://docs.travis-ci.com/user/reference/osx/#OS-X-Version) is the list of available CI images for OS X. I will try osx_image: xcode8.3 now and see what happens.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r138531251
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -0,0 +1,172 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +#include "orc/OrcFile.hh"
    +
    +#include "Adaptor.hh"
    +#include "Exceptions.hh"
    +
    +#include <errno.h>
    +#include <fcntl.h>
    +#include <stdio.h>
    +#include <sys/mman.h>
    +#include <sys/stat.h>
    +#include <sys/types.h>
    +#include <unistd.h>
    +
    +#include "hdfspp/hdfspp.h"
    +
    +namespace orc {
    +
    +  class HdfsFileInputStream : public InputStream {
    +  private:
    +    std::string filename;
    +    std::unique_ptr<hdfs::FileHandle> file;
    +    std::unique_ptr<hdfs::FileSystem> file_system;
    +    uint64_t totalLength;
    +
    +  public:
    +    HdfsFileInputStream(std::string _filename) {
    +      filename = _filename ;
    +
    +      //Building a URI object from the given uri_path
    +      hdfs::URI uri;
    +      try {
    +        uri = hdfs::URI::parse_from_string(filename);
    +      } catch (const hdfs::uri_parse_error&) {
    +        throw ParseError("Malformed URI: " + filename);
    +      }
    +
    +      //This sets conf path to default "$HADOOP_CONF_DIR" or "/etc/hadoop/conf"
    +      //and loads configs core-site.xml and hdfs-site.xml from the conf path
    +      hdfs::ConfigParser parser;
    +      if(!parser.LoadDefaultResources()){
    +        throw ParseError("Could not load default resources. ");
    +      }
    +      auto stats = parser.ValidateResources();
    +      //validating core-site.xml
    +      if(!stats[0].second.ok()){
    +        throw ParseError(stats[0].first + " is invalid: " + stats[0].second.ToString());
    +      }
    +      //validating hdfs-site.xml
    +      if(!stats[1].second.ok()){
    +        throw ParseError(stats[1].first + " is invalid: " + stats[1].second.ToString());
    +      }
    +      hdfs::Options options;
    +      if(!parser.get_options(options)){
    +        throw ParseError("Could not load Options object. ");
    +      }
    +      hdfs::IoService * io_service = hdfs::IoService::New();
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(
    +          hdfs::FileSystem::New(io_service, "", options));
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't create FileSystem object. ");
    +      }
    +      hdfs::Status status;
    +      //Checking if the user supplied the host
    +      if(!uri.get_host().empty()){
    +        //Using port if supplied, otherwise using "" to look up port in configs
    +        std::string port = uri.has_port() ?
    +            std::to_string(uri.get_port()) : "";
    +        status = file_system->Connect(uri.get_host(), port);
    +        if (!status.ok()) {
    +          throw ParseError("Can't connect to " + uri.get_host()
    +              + ":" + port + ". " + status.ToString());
    +        }
    +      } else {
    +        status = file_system->ConnectToDefaultFs();
    +        if (!status.ok()) {
    +          if(!options.defaultFS.get_host().empty()){
    +            throw ParseError("Error connecting to " +
    +                options.defaultFS.str() + ". " + status.ToString());
    +          } else {
    +            throw ParseError(
    +                "Error connecting to the cluster: defaultFS is empty. "
    +                + status.ToString());
    +          }
    +        }
    +      }
    +
    +      if (file_system.get() == nullptr) {
    +        throw ParseError("Can't connect the file system. ");
    +      }
    +
    +      hdfs::FileHandle *file_raw = nullptr;
    +      status = file_system->Open(uri.get_path(), &file_raw);
    --- End diff --
    
    My feeling is that you may want to return other meta info in the future. But it's your call.


---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123853204
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -34,15 +34,13 @@
     #include "common/hdfs_configuration.h"
     #include "common/configuration_loader.h"
     
    -#define BUF_SIZE 1048576 //1 MB
    -
     namespace orc {
     
       class HdfsFileInputStream : public InputStream {
    --- End diff --
    
    Do you have corresponding UTs for this new class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125729396
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -34,15 +34,13 @@
     #include "common/hdfs_configuration.h"
     #include "common/configuration_loader.h"
     
    -#define BUF_SIZE 1048576 //1 MB
    -
     namespace orc {
     
       class HdfsFileInputStream : public InputStream {
    --- End diff --
    
    Does HDFS C++ provide a mocked service interface? My concern is the testability of HdfsFileInputStream.
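    
    As a sketch of what a cluster-free test double could look like -- this is hypothetical, not something the PR provides, and it assumes the orc::InputStream interface shown in the diffs above (plus a getName() accessor):
    
    ```cpp
    #include <cstring>
    #include <string>
    #include "orc/OrcFile.hh"
    #include "orc/Exceptions.hh"
    
    // Serves reads out of an in-memory string so reader logic can be unit
    // tested without any HDFS cluster.
    class MockInputStream : public orc::InputStream {
      std::string contents;
      std::string name;
    public:
      MockInputStream(std::string data, std::string fileName)
          : contents(std::move(data)), name(std::move(fileName)) {}
      uint64_t getLength() const override { return contents.size(); }
      uint64_t getNaturalReadSize() const override { return 1048576; }
      void read(void* buf, uint64_t length, uint64_t offset) override {
        if (buf == nullptr || offset + length > contents.size()) {
          throw orc::ParseError("Invalid read on mock stream");
        }
        memcpy(buf, contents.data() + offset, length);
      }
      const std::string& getName() const override { return name; }
    };
    ```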


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by AnatoliShein <gi...@git.apache.org>.
Github user AnatoliShein commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r125901512
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -66,22 +64,22 @@ namespace orc {
             options = config->GetOptions();
           }
           hdfs::IoService * io_service = hdfs::IoService::New();
    -      //Wrapping fs into a shared pointer to guarantee deletion
    -      std::shared_ptr<hdfs::FileSystem> fs(hdfs::FileSystem::New(io_service, "", options));
    -      if (!fs) {
    +      //Wrapping file_system into a unique pointer to guarantee deletion
    +      file_system = std::unique_ptr<hdfs::FileSystem>(hdfs::FileSystem::New(io_service, "", options));
    +      if (!file_system) {
    --- End diff --
    
    I changed it here; however, the '!fs' notation is also used in many other parts of the libhdfs++ library, such as the examples, tests, tools, and C bindings, so hopefully this does not create too much trouble.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #134: Orc 17

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/134#discussion_r123852932
  
    --- Diff: c++/src/OrcHdfsFile.cc ---
    @@ -121,39 +116,24 @@ namespace orc {
         }
     
         uint64_t getNaturalReadSize() const override {
    -      return BUF_SIZE;
    +      return 1048576; //1 MB
         }
     
         void read(void* buf,
                   uint64_t length,
                   uint64_t offset) override {
     
    -      ssize_t total_bytes_read = 0;
           size_t last_bytes_read = 0;
     
           if (!buf) {
             throw ParseError("Buffer is null");
           }
     
           hdfs::Status status;
    -      char input_buffer[BUF_SIZE];
    -      do{
    -        //Reading file chunks
    -        status = file->PositionRead(input_buffer, sizeof(input_buffer), offset, &last_bytes_read);
    -        if(status.ok()) {
    -          //Writing file chunks to buf
    -          if(total_bytes_read + last_bytes_read >= length){
    -            memcpy((char*) buf + total_bytes_read, input_buffer, length - total_bytes_read);
    -            break;
    -          } else {
    -            memcpy((char*) buf + total_bytes_read, input_buffer, last_bytes_read);
    -            total_bytes_read += last_bytes_read;
    -            offset += last_bytes_read;
    -          }
    -        } else {
    -          throw ParseError("Error reading the file: " + status.ToString());
    -        }
    -      } while (true);
    +      status = file->PositionRead(buf, static_cast<size_t>(length), static_cast<off_t>(offset), &last_bytes_read);
    --- End diff --
    
    You still need to put this in a loop in case last_bytes_read is less than length.
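    
    A sketch of the looped version, under the same assumptions about PositionRead()'s signature as the diff above (it mirrors the fix that eventually went into the file, not a verbatim copy of it):
    
    ```cpp
    // PositionRead() may return fewer bytes than requested (e.g. at a block
    // boundary), so keep issuing reads until the caller's buffer is full.
    size_t total_bytes_read = 0;
    size_t last_bytes_read = 0;
    do {
      hdfs::Status status = file->PositionRead(
          static_cast<char*>(buf) + total_bytes_read,
          static_cast<size_t>(length) - total_bytes_read,
          static_cast<off_t>(offset + total_bytes_read),
          &last_bytes_read);
      if (!status.ok()) {
        throw ParseError("Error reading the file: " + status.ToString());
      }
      if (last_bytes_read == 0) {
        throw ParseError("Premature EOF while reading the file");
      }
      total_bytes_read += last_bytes_read;
    } while (total_bytes_read < length);
    ```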


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #134: Orc 17

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/134
  
    Ok, this patch breaks the docker scripts on at least centos7, debian8, and ubuntu16 by adding new dependencies for the C++ build. I'm going through and fixing them.


---