Posted to issues@tajo.apache.org by eminency <gi...@git.apache.org> on 2015/09/17 04:38:08 UTC

[GitHub] tajo pull request: TAJO-1682: Write ORC document

GitHub user eminency opened a pull request:

    https://github.com/apache/tajo/pull/764

    TAJO-1682: Write ORC document

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eminency/tajo TAJO-1682

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/764.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #764
    
----
commit 962dd359c3959cda5c9513567cab94511e1b800a
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-07-17T03:29:58Z

    Initial ORC document

commit a8766828bb0cc359c236cad84486e7aaefc6fe4f
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-07-17T03:30:56Z

    adjust title length

commit 75e3ac4b03cc398fd3794f8e2f8cc3d8d1f26833
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-07-17T05:42:58Z

    file_formats.rst is modified for orc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1682: Write ORC document

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/764#discussion_r39709431
  
    --- Diff: tajo-docs/src/main/sphinx/table_management/orc.rst ---
    @@ -0,0 +1,48 @@
    +***
    +ORC
    +***
    +
    +**ORC(Optimized Row Columnar)** is a columnar storage format from Hive. ORC improves performance for reading,
    +writing, and processing data.
    +For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
    +
    +==========================
    +How to Create a ORC Table?
    +==========================
    +
    +If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
    +
    +In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
    +statement. Below is an example statement for creating a table using orc files.
    +
    +.. code-block:: sql
    +
    +  CREATE TABLE table1 (
    +    id int,
    +    name text,
    +    score float,
    +    type text
    +  ) USING orc;
    +
    +===================
    +Physical Properties
    +===================
    +
    +Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
    +The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
    +
    +Now, ORC file provides the following physical properties.
    +
    +* ``orc.max.merge.distance``: Reading property. When stripes are too closer and the distance is lower than this value, they are merged and read at once. Default is 1MB.
    +* ``orc.stripe.size``: Writing property. It decides size of each stripe. Default is 64MB.
    +* ``orc.compression.kind``: Writing property. The compression algorithm used to compress data. It should be one of ``none``, ``snappy``, ``zlib``. Default is ``none``.
    +* ``orc.buffer.size``: Writing property. It decides size of writing buffer. Default is 256KB.
    +* ``orc.rowindex.stride``: Writing property. Define the default ORC index stride in number of rows. (Stride is the number of rows an index entry represents.) Default is 10000.
    +
    +======================================
    +Compatibility Issues with Apache Hive™
    +======================================
    +
    +At the moment, Tajo only supports flat relational tables.
    +As a result, Tajo's ORC storage type does not support nested schemas.
    +However, we are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_).
    --- End diff --
    
    I think that this sentence is redundant. It would be enough to say that Tajo currently supports only flat schemas.



[GitHub] tajo pull request: TAJO-1682: Write ORC document

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/764#discussion_r39709356
  
    --- Diff: tajo-docs/src/main/sphinx/table_management/orc.rst ---
    @@ -0,0 +1,48 @@
    +***
    +ORC
    +***
    +
    +**ORC(Optimized Row Columnar)** is a columnar storage format from Hive. ORC improves performance for reading,
    +writing, and processing data.
    +For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
    +
    +==========================
    +How to Create a ORC Table?
    --- End diff --
    
    Should be changed to ```create an orc```



[GitHub] tajo pull request: TAJO-1682: Write ORC document

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/764#issuecomment-140980974
  
    +1, thank you for the quick update.



[GitHub] tajo pull request: TAJO-1682: Write ORC document

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/764#issuecomment-140977064
  
    Hi, @jihoonson.
    
    I applied your suggestions.



[GitHub] tajo pull request: TAJO-1682: Write ORC document

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/764



[GitHub] tajo pull request: TAJO-1682: Write ORC document

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/764#discussion_r39709603
  
    --- Diff: tajo-docs/src/main/sphinx/table_management/orc.rst ---
    @@ -0,0 +1,48 @@
    +***
    +ORC
    +***
    +
    +**ORC(Optimized Row Columnar)** is a columnar storage format from Hive. ORC improves performance for reading,
    +writing, and processing data.
    +For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
    +
    +==========================
    +How to Create a ORC Table?
    +==========================
    +
    +If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
    +
    +In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
    +statement. Below is an example statement for creating a table using orc files.
    +
    +.. code-block:: sql
    +
    +  CREATE TABLE table1 (
    +    id int,
    +    name text,
    +    score float,
    +    type text
    +  ) USING orc;
    +
    +===================
    +Physical Properties
    +===================
    +
    +Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
    +The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
    +
    +Now, ORC file provides the following physical properties.
    +
    +* ``orc.max.merge.distance``: Reading property. When stripes are too closer and the distance is lower than this value, they are merged and read at once. Default is 1MB.
    --- End diff --
    
    ```Reading property``` and ```Writing property``` look odd to me. The sentences are clear enough without them.

