Posted to issues@tajo.apache.org by eminency <gi...@git.apache.org> on 2016/01/12 08:35:16 UTC

[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

GitHub user eminency opened a pull request:

    https://github.com/apache/tajo/pull/937

    TAJO-2052: Upgrading ORC reader version

    The code has been refined and is now based on presto-orc-0.132.
    
    * The base data structure is changed from Vector to Block.
    * Several bugs are fixed.
    * Compatibility with Hive is improved.
    * Unfortunately, there is no speed improvement.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eminency/tajo orc_upgrade

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/937.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #937
    
----
commit 4b9b992fcdbfba0901145a6df494a12a96d0a974
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-12T03:26:46Z

    ORC version upgraded

commit 3c562d3eea0b421604167695e8151c9f1adc7085
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-12T05:32:12Z

    Types added

commit 696a8162056314c7975c2961917f080ed9943cf9
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-12T07:19:34Z

    comment modified

commit 9a2b54b381f35317ad71b9f5102f48279878edc8
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-12T07:28:13Z

    The document is refined

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/937#discussion_r51085103
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/thirdparty/orc/HdfsOrcDataSource.java ---
    @@ -114,9 +117,9 @@ public void readFully(long position, byte[] buffer, int bufferOffset, int buffer
           buffers.put(mergedRange, buffer);
         }
     
    -    ImmutableMap.Builder<K, Slice> slices = ImmutableMap.builder();
    +    ImmutableMap.Builder<K, FixedLengthSliceInput> slices = ImmutableMap.builder();
         for (Entry<K, DiskRange> entry : diskRanges.entrySet()) {
    --- End diff --
    
    If you use a lambda expression, the following code would be simpler.
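
As a rough illustration of the suggestion above, the sketch below shows a map-building loop next to an equivalent lambda/stream form. It is not the code from the pull request: the diff is truncated, so `Range`, `slice`, `sliceWithLoop`, and `sliceWithLambda` are hypothetical stand-ins, and plain `byte[]` and `LinkedHashMap` replace the `FixedLengthSliceInput` and `ImmutableMap` types that appear in the diff.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class LambdaLoopSketch {
    /** Hypothetical stand-in for a disk range: an offset and a length within a buffer. */
    static final class Range {
        final int offset;
        final int length;

        Range(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Copies the bytes covered by the range out of the shared buffer. */
    static byte[] slice(Range range, byte[] buffer) {
        return Arrays.copyOfRange(buffer, range.offset, range.offset + range.length);
    }

    /** Loop form: iterate over the entries and put each slice into a new map. */
    static <K> Map<K, byte[]> sliceWithLoop(Map<K, Range> ranges, byte[] buffer) {
        Map<K, byte[]> slices = new LinkedHashMap<>();
        for (Map.Entry<K, Range> entry : ranges.entrySet()) {
            slices.put(entry.getKey(), slice(entry.getValue(), buffer));
        }
        return slices;
    }

    /** Lambda form: the same mapping expressed as a stream collected into a map. */
    static <K> Map<K, byte[]> sliceWithLambda(Map<K, Range> ranges, byte[] buffer) {
        return ranges.entrySet().stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        entry -> slice(entry.getValue(), buffer),
                        (first, second) -> {
                            // Keys come from a map, so duplicates are not expected.
                            throw new IllegalStateException("duplicate key");
                        },
                        LinkedHashMap::new)); // keeps the iteration order of the input map
    }
}
```

Whether the stream form is actually simpler is a judgment call; it removes the mutable builder variable but needs the four-argument `Collectors.toMap` to preserve entry order.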



[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/937#issuecomment-175999530
  
    @blrunner 
    This issue has already been reported. Please refer to:
    
    https://issues.apache.org/jira/browse/TAJO-1929
    
    I will fix it after this PR, because the fix needs the ORC upgrade.



[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/937



[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/937#issuecomment-176010907
  
    OK, I understand your comments. Could you check my trivial comment?



[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/937#issuecomment-175968378
  
    @eminency 
    
    Thank you for your contribution.
    When using MySQLStore, this PR runs successfully. But when using HiveCatalogStore, it throws a NoClassDefFoundError as follows:
    
    ```
    2016-01-28 13:25:53,025 ERROR org.apache.tajo.master.GlobalEngine: 
    Stack Trace:
    java.lang.NoClassDefFoundError: com/facebook/presto/hive/protobuf/CodedInputStream
    	at com.facebook.presto.orc.metadata.OrcMetadataReader.readPostScript(OrcMetadataReader.java:48)
    	at com.facebook.presto.orc.OrcReader.<init>(OrcReader.java:99)
    	at org.apache.tajo.storage.orc.ORCScanner.init(ORCScanner.java:136)
    	at org.apache.tajo.engine.planner.physical.SeqScanExec.initScanner(SeqScanExec.java:286)
    	at org.apache.tajo.engine.planner.physical.SeqScanExec.init(SeqScanExec.java:191)
    	at org.apache.tajo.engine.planner.physical.PartitionMergeScanExec.initScanExecutors(PartitionMergeScanExec.java:80)
    	at org.apache.tajo.engine.planner.physical.PartitionMergeScanExec.init(PartitionMergeScanExec.java:67)
    ```
    
    For reference, I added the installation directory of Apache Hive 1.2.1 to the tajo-env.sh file.
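
A `NoClassDefFoundError` like the one in the stack trace above generally means the named class was not found on the runtime classpath. The sketch below is one way to check whether a class is visible to the current class loader. `ClasspathCheck` and `isLoadable` are hypothetical names, not part of Tajo; only the default class name is taken from the stack trace.

```java
final class ClasspathCheck {
    /**
     * Returns true if the named class can be located by this class's loader.
     * The class is looked up without being initialized.
     */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that is present but cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace (slashes written as dots).
        String name = args.length > 0
                ? args[0]
                : "com.facebook.presto.hive.protobuf.CodedInputStream";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

The result is only meaningful when the check runs with the same classpath as the failing process.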



[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/937#issuecomment-176605183
  
    @blrunner 
    I fixed it. Please verify that it matches what you intended.



[GitHub] tajo pull request: TAJO-2052: Upgrading ORC reader version

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/937#issuecomment-177064844
  
    +1
    
    LGTM! I'll commit this PR soon. :)

