Posted to issues@tajo.apache.org by eminency <gi...@git.apache.org> on 2015/07/24 10:02:24 UTC

[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

GitHub user eminency opened a pull request:

    https://github.com/apache/tajo/pull/652

    TAJO-1465: Add ORCFileAppender to write into ORCFile table

    Hi all.
    
    This patch adds writer code for ORC files.
    
    Many files are added, but most of them come from Hive. See the following notes:
    
    1. The core logic is in ORCAppender.java and WriterImpl.java.
    2. The files in org.apache.tajo.storage.orc.objectinspector are trivial; they are used for checking data types.
    3. The files in org.apache.tajo.storage.thirdparty come from Hive. WriterImpl.java is also from Hive, but it has been modified to support Tajo types and Datum.
    
    Please review and advise.
    Thanks.
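    
    For orientation, here is a minimal sketch of how such an appender typically wraps the borrowed writer; the FileAppender base class and the method names are simplified assumptions for illustration, not the exact Tajo API:
    
    ```java
    // Hypothetical outline of the appender side (names are assumptions):
    public class ORCAppender /* extends FileAppender */ {
      private Writer writer;   // the thirdparty WriterImpl, opened via OrcFile
    
      public void init() throws IOException {
        // build an ObjectInspector from the Tajo schema, then open the writer
      }
    
      public void addTuple(Tuple tuple) throws IOException {
        writer.addRow(tuple);  // WriterImpl is modified to consume Tuple/Datum
      }
    
      public void close() throws IOException {
        writer.close();        // flushes the last stripe and writes the footer
      }
    }
    ```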

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eminency/tajo TAJO-1465

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #652
    
----
commit aacc3b7e259c52e2f8cd078c6915bace4afc32c1
Author: Jihoon Son <ji...@apache.org>
Date:   2015-03-21T09:38:19Z

    TAJO-1439: Some method name is written wrongly. (Contributed by Jongyoung Park. Committed by jihoon)

commit e563db223ec6a00dfc9ccf386adb6995a8e65fac
Author: navis.ryu <na...@apache.org>
Date:   2015-03-12T13:15:49Z

    TAJO-1396 Unexpected IllegalMonitorStateException can be thrown in QueryInProgress
    
    Closes #416
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit e854f830f2f9ff67a23c751cbbd7afa48d6f0500
Author: Dongjoon Hyun <do...@apache.org>
Date:   2015-03-25T03:51:41Z

    TAJO-1434: Fix supporting version of Hadoop.
    
    Closes #443
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit a5350257d70366de1297877492111afe89336913
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-03-24T10:05:24Z

    TAJO-1147: Simple query doesn't work in Web UI
    
    Signed-off-by: JaeHwa Jung <bl...@apache.org>

commit becf85b0ac83e764e431f4c30706cca0c16d836f
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-03-27T09:03:53Z

    TAJO-1440: Some tests fail in parallel test environment in TestKillQuery.
    
    Closes #472
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit 8b00e41f56eea0ef54011cfab822506d10b34c14
Author: Dongjoon Hyun <do...@apache.org>
Date:   2015-03-29T12:47:07Z

    TAJO-1437: Resolve findbug warnings on Tajo JDBC Module.
    
    Closes #447
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit c3b7d1354a5c50cd7a6d2950b671a04b30310c26
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-04-03T02:11:10Z

    TAJO-1501: Too many log message of HashShuffleAppenderManager.
    
    Signed-off-by: JaeHwa Jung <bl...@apache.org>

commit b729a49c821a874db8f432342c2c9fc83d537793
Author: DaeMyung Kang <ch...@naver.com>
Date:   2015-04-03T02:38:57Z

    TAJO-1360: VALUES_ field in OPTIONS table of catalog store should be longer.
    
    Signed-off-by: JaeHwa Jung <bl...@apache.org>

commit 8173bc1f491f45050b3611868d1d8d0099056d0c
Author: Dongjoon Hyun <do...@apache.org>
Date:   2015-04-04T10:00:31Z

    TAJO-1492: Replace CSV examples into TEXT examples in docs.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit cb6965977e0939e97e87efaf667be75c14d70d49
Author: YeonSu Han <ha...@gmail.com>
Date:   2015-04-05T14:27:05Z

    TAJO-1400: Add TajoStatement::setMaxRows method support.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit 7f7b132884d2d1dede71428fb68f44c08e3fbb05
Author: Jinho Kim <jh...@apache.org>
Date:   2015-04-07T07:13:23Z

    TAJO-1529: Implement json_extract_path_text(string, string) function.

commit e75b928fe92b00c4cf635abfc877fecd3b7e31e1
Author: navis.ryu <na...@apache.org>
Date:   2015-04-07T13:24:34Z

    TAJO-1538: TajoWorkerResourceManager.allocatedResourceMap is increasing forever.
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit 75e985eaca8146020b2b9d79e609ba1ed041998f
Author: Jinho Kim <jh...@apache.org>
Date:   2015-04-18T05:53:05Z

    TAJO-1564: TestFetcher fails occasionally. (jinho)

commit db3bbc9efa13c76cb47bd8067926e2f7e3920a12
Author: Dongjoon Hyun <do...@apache.org>
Date:   2015-04-19T08:39:49Z

    TAJO-1567: Update old license in some pom.xml files.
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit fbe24e89c129c7f25551ad3fcde2bae3f86d0a4d
Author: Jinho Kim <jh...@apache.org>
Date:   2015-04-19T09:41:41Z

    TAJO-1560: HashShuffle report should be ignored when a succeed tasks are not included. (jinho)

commit 4a02456d3e93a3630931066af60d17a61e28638a
Author: navis.ryu <na...@apache.org>
Date:   2015-04-20T01:34:25Z

    TAJO-1522: NPE making stage history before task scheduler is initialized.
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit 5e1fa93b53cc4b575996a4aceaeb781567dc47d6
Author: Jinho Kim <jh...@apache.org>
Date:   2015-04-20T02:13:14Z

    TAJO-1568: Apply UnpooledByteBufAllocator when a tajo.test.enabled is set to enable.

commit 47008c58ea866a9609f56405b09f968665c66d47
Author: Jinho Kim <jh...@apache.org>
Date:   2015-04-20T05:12:09Z

    TAJO-1571: Merge TAJO-1497 and TAJO-1569 to 0.10.1. (jinho)
    
    Closes #544

commit 3722599d72c012d508cf5201cc101111affa764e
Author: navis.ryu <na...@apache.org>
Date:   2015-04-20T11:37:59Z

    TAJO-1481: Numeric conversion of Inet4 type should be considered as unsigned.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit 8d7988438dafd35068c52031393a9fb1c040bdf5
Author: DaeMyung Kang <ch...@naver.com>
Date:   2015-04-20T21:18:37Z

    TAJO-1419: Tsql session command doesn't work.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit c33b8630ff92eb6e0b123735332c82c2c262bbee
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-04-24T02:13:08Z

    TAJO-1575: HBASE_HOME guidance is duplicated in tajo-env.sh.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit b488a5e09e19fd0400147ba530b070495186c7f4
Author: Dongjoon Hyun <do...@apache.org>
Date:   2015-04-24T02:17:00Z

    TAJO-1559: Fix data model description (tinyint, smallint).
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit 9ca7688f648d433d0a0eaf5dedb1919b67f106d8
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-04-27T05:16:54Z

    TAJO-1580: Error line number is incorrect.
    
    Signed-off-by: JaeHwa Jung <bl...@apache.org>

commit c9b5e11347c4e380a3dd97671758a841941ae75b
Author: Jinho Kim <jh...@apache.org>
Date:   2015-04-27T05:45:47Z

    TAJO-1581: Does not update last state of query stage in non-hash shuffle. (jinho)

commit 7ccd8341ab45db79202a6c35ada56533107ee2aa
Author: Dongjoon Hyun <do...@apache.org>
Date:   2015-04-27T13:35:27Z

    TAJO-1574: Fix NPE on natural join.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit 4db8ca055897ba712cb2bdc40594d63905a11e64
Author: navis.ryu <na...@apache.org>
Date:   2015-04-30T10:15:00Z

    TAJO-1374: Support multi-bytes delimiter for CSV file.
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit b0ab7eafd35a51bfeddb3812bff8718446029f56
Author: navis.ryu <na...@apache.org>
Date:   2015-04-30T10:17:45Z

    TAJO-1381: Support multi-bytes delimiter for Text file.
    
    Signed-off-by: Jinho Kim <jh...@apache.org>

commit f9a531c0ff9ac4fadb5562eeb6b0b48da44cf547
Author: Jinho Kim <jh...@apache.org>
Date:   2015-05-06T03:09:39Z

    TAJO-1534: DelimitedTextFile return null instead of a NullDatum. (jinho)

commit 9d331f9af11d77f16d332029ad7debe932689cf2
Author: Yongjin Choi <su...@gmail.com>
Date:   2015-05-06T09:47:38Z

    TAJO-1556: "insert into select" with reordered column list does not work.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

commit 556498f58451c9f17c1e734390c159fdf41b88b4
Author: Jongyoung Park <em...@gmail.com>
Date:   2015-05-08T06:12:15Z

    TAJO-1576: Sometimes DefaultTajoCliOutputFormatter.parseErrorMessage() eliminates an important kind of information.
    
    Signed-off-by: Jihoon Son <ji...@apache.org>

----



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r38281312
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/orc/ORCScanner.java ---
    @@ -152,11 +155,13 @@ public void init() throws IOException {
     
         orcReader = new OrcReader(orcDataSource, new OrcMetadataReader());
     
    +    TimeZone timezone = TimeZone.getTimeZone(meta.getOption(StorageConstants.TIMEZONE,
    +      TajoConstants.DEFAULT_SYSTEM_TIMEZONE));
    +
         // TODO: make OrcPredicate useful
    -    // TODO: TimeZone should be from conf
    -    // TODO: it might be splittable
    +    // presto-orc uses joda timezone, so it needs to be converted.
    --- End diff --
    
    Could you explain what a Joda timezone is? It seems to be just a Java timezone.
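    
    For context, Joda-Time has its own timezone type, org.joda.time.DateTimeZone, which is distinct from java.util.TimeZone; presto-orc works with the former while meta.getOption() resolves to the latter. The conversion is a single call (the wrapper class below is just for illustration):
    
    ```java
    import java.util.TimeZone;
    
    import org.joda.time.DateTimeZone;
    
    public class TimeZoneBridge {
      // Convert the table option's java.util.TimeZone into the Joda
      // DateTimeZone that the presto-orc reader expects.
      public static DateTimeZone toJoda(TimeZone tz) {
        return DateTimeZone.forTimeZone(tz);
      }
    }
    ```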



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255689
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -333,7 +330,7 @@ public static DateDatum createDate(Datum datum) {
         case DATE:
           return (DateDatum) datum;
         default:
    -      throw new InvalidCastException(datum.type(), Type.DATE);
    +      throw new TajoRuntimeException(Errors.ResultCode.INVALID_CAST, datum.type().toString(), Type.DATE.name());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new InvalidCastException(...))```.
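    
    Applied to this hunk, the wrapped form reuses the arguments the pre-change code already passed to InvalidCastException:
    
    ```java
    default:
      throw new TajoRuntimeException(new InvalidCastException(datum.type(), Type.DATE));
    ```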



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255382
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -107,53 +107,50 @@ public static Datum createFromString(DataType dataType, String value) {
         case INET4:
           return createInet4(value);
         default:
    -      throw new UnsupportedOperationException(dataType.toString());
    +      throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, dataType.toString());
         }
       }
     
       public static Datum createFromBytes(DataType dataType, byte[] bytes) {
         switch (dataType.getType()) {
     
    -    case BOOLEAN:
    -      return createBool(bytes[0]);
    -    case INT2:
    -      return createInt2(NumberUtil.toShort(bytes));
    -    case INT4:
    -      return createInt4(NumberUtil.toInt(bytes));
    -    case INT8:
    -      return createInt8(NumberUtil.toLong(bytes));
    -    case FLOAT4:
    -      return createFloat4(NumberUtil.toFloat(bytes));
    -    case FLOAT8:
    -      return createFloat8(NumberUtil.toDouble(bytes));
    -    case CHAR:
    -      return createChar(bytes);
    -    case TEXT:
    -      return createText(bytes);
    -    case DATE:
    -      return new DateDatum(NumberUtil.toInt(bytes));
    -    case TIME:
    -      return new TimeDatum(NumberUtil.toLong(bytes));
    -    case TIMESTAMP:
    -      return new TimestampDatum(NumberUtil.toLong(bytes));
    -    case BIT:
    -      return createBit(bytes[0]);
    -    case BLOB:
    -      return createBlob(bytes);
    -    case INET4:
    -      return createInet4(bytes);
    -    case PROTOBUF:
    -      ProtobufDatumFactory factory = ProtobufDatumFactory.get(dataType);
    -      Message.Builder builder = factory.newBuilder();
    -      try {
    -        builder.mergeFrom(bytes);
    -        return factory.createDatum(builder.build());
    -      } catch (IOException e) {
    -        e.printStackTrace();
    -        throw new RuntimeException(e);
    -      }
    -    default:
    -        throw new UnsupportedOperationException(dataType.toString());
    +      case BOOLEAN:
    --- End diff --
    
    ```switch``` and ```case``` should be at the same column position.
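    
    That is, keep the file's pre-change layout, for example:
    
    ```java
    switch (dataType.getType()) {
    case BOOLEAN:
      return createBool(bytes[0]);
    case INT2:
      return createInt2(NumberUtil.toShort(bytes));
    // ... remaining cases at the same level ...
    default:
      throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, dataType.toString());
    }
    ```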



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-125183236
  
    The patch looks good to me. I left one comment about the timezone.
    
    Also, this patch borrows part of the ORC implementation, so we will need to keep upgrading the borrowed ORC code. Could you add some comments that point out which files are borrowed and which are our own implementation?
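    
    One lightweight way to do that (a sketch; the exact wording and the upstream class reference are placeholders) is a provenance note at the top of each borrowed file:
    
    ```java
    /**
     * Copied from Apache Hive's ORC implementation
     * (org.apache.hadoop.hive.ql.io.orc). Modified to write Tajo
     * Tuple/Datum values; keep other diffs against upstream minimal
     * so that future ORC upgrades remain easy to merge.
     */
    ```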



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255608
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -177,7 +174,7 @@ public static Datum createFromInt8(DataType type, long val) {
         case TIME:
           return createTime(val); 
         default:
    -      throw new UnsupportedOperationException("Cannot create " + type.getType().name() + " datum from INT8");
    +      throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, type.getType().name());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new UnsupportedDataType(...))```.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-136634809
  
    I refined the exception-throwing code and added a list of the modified files to package-info.java.
    Please check them out.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r38386740
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -114,45 +114,42 @@ public static Datum createFromString(DataType dataType, String value) {
       public static Datum createFromBytes(DataType dataType, byte[] bytes) {
         switch (dataType.getType()) {
     
    -    case BOOLEAN:
    -      return createBool(bytes[0]);
    -    case INT2:
    -      return createInt2(NumberUtil.toShort(bytes));
    -    case INT4:
    -      return createInt4(NumberUtil.toInt(bytes));
    -    case INT8:
    -      return createInt8(NumberUtil.toLong(bytes));
    -    case FLOAT4:
    -      return createFloat4(NumberUtil.toFloat(bytes));
    -    case FLOAT8:
    -      return createFloat8(NumberUtil.toDouble(bytes));
    -    case CHAR:
    -      return createChar(bytes);
    -    case TEXT:
    -      return createText(bytes);
    -    case DATE:
    -      return new DateDatum(NumberUtil.toInt(bytes));
    -    case TIME:
    -      return new TimeDatum(NumberUtil.toLong(bytes));
    -    case TIMESTAMP:
    -      return new TimestampDatum(NumberUtil.toLong(bytes));
    -    case BIT:
    -      return createBit(bytes[0]);
    -    case BLOB:
    -      return createBlob(bytes);
    -    case INET4:
    -      return createInet4(bytes);
    -    case PROTOBUF:
    -      ProtobufDatumFactory factory = ProtobufDatumFactory.get(dataType);
    -      Message.Builder builder = factory.newBuilder();
    -      try {
    -        builder.mergeFrom(bytes);
    -        return factory.createDatum(builder.build());
    -      } catch (IOException e) {
    -        e.printStackTrace();
    -        throw new RuntimeException(e);
    -      }
    -    default:
    +      case BOOLEAN:
    +        return createBool(bytes[0]);
    +      case INT2:
    +        return createInt2(NumberUtil.toShort(bytes));
    +      case INT4:
    +        return createInt4(NumberUtil.toInt(bytes));
    +      case INT8:
    +        return createInt8(NumberUtil.toLong(bytes));
    +      case FLOAT4:
    +        return createFloat4(NumberUtil.toFloat(bytes));
    +      case FLOAT8:
    +        return createFloat8(NumberUtil.toDouble(bytes));
    +      case CHAR:
    +        return createChar(bytes);
    +      case TEXT:
    +        return createText(bytes);
    +      case DATE:
    +        return new DateDatum(NumberUtil.toInt(bytes));
    +      case TIME:
    +        return new TimeDatum(NumberUtil.toLong(bytes));
    +      case TIMESTAMP:
    +        return new TimestampDatum(NumberUtil.toLong(bytes));
    +      case BIT:
    +        return createBit(bytes[0]);
    +      case BLOB:
    +        return createBlob(bytes);
    +      case INET4:
    +        return createInet4(bytes);
    +      case PROTOBUF:
    +        try {
    +          return ProtobufDatumFactory.createDatum(dataType, bytes);
    +        } catch (IOException e) {
    +          e.printStackTrace();
    +          throw new RuntimeException(e);
    +        }
    +      default:
             throw new UnsupportedOperationException(dataType.toString());
    --- End diff --
    
    I see. I'm working on it.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-140944200
  
    Thank you very much for your work.
    I am closing this issue now.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-136239691
  
    It borrows a lot of code from ORC, and that looks like the best approach for now. But we need to make maintenance easy. What do you think about separating the borrowed code into a separate Maven module while keeping the same Java package? That would let us easily identify the copied and modified files.
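    
    For example, a layout along these lines (the module name is hypothetical):
    
        tajo-storage/
          tajo-storage-hdfs/            <- our own ORCAppender, ORCScanner
          tajo-storage-orc-thirdparty/  <- borrowed code, still packaged as
                                           org.apache.tajo.storage.thirdparty.orc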



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255600
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -164,7 +161,7 @@ public static Datum createFromInt4(DataType type, int val) {
         case DATE:
           return new DateDatum(val);
         default:
    -      throw new UnsupportedOperationException("Cannot create " + type.getType().name() + " datum from INT4");
    +      throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, type.getType().name());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new UnsupportedDataType(...))```.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39339177
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -333,7 +330,7 @@ public static DateDatum createDate(Datum datum) {
         case DATE:
           return (DateDatum) datum;
         default:
    -      throw new InvalidCastException(datum.type(), Type.DATE);
    +      throw new TajoRuntimeException(Errors.ResultCode.INVALID_CAST, datum.type().toString(), Type.DATE.name());
    --- End diff --
    
    @hyunsik 
    InvalidCastException is not a subclass of TajoException, so it cannot be used that way.
    Should it be changed?



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255553
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -107,53 +107,50 @@ public static Datum createFromString(DataType dataType, String value) {
         case INET4:
           return createInet4(value);
         default:
    -      throw new UnsupportedOperationException(dataType.toString());
    +      throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, dataType.toString());
         }
       }
     
       public static Datum createFromBytes(DataType dataType, byte[] bytes) {
         switch (dataType.getType()) {
     
    -    case BOOLEAN:
    -      return createBool(bytes[0]);
    -    case INT2:
    -      return createInt2(NumberUtil.toShort(bytes));
    -    case INT4:
    -      return createInt4(NumberUtil.toInt(bytes));
    -    case INT8:
    -      return createInt8(NumberUtil.toLong(bytes));
    -    case FLOAT4:
    -      return createFloat4(NumberUtil.toFloat(bytes));
    -    case FLOAT8:
    -      return createFloat8(NumberUtil.toDouble(bytes));
    -    case CHAR:
    -      return createChar(bytes);
    -    case TEXT:
    -      return createText(bytes);
    -    case DATE:
    -      return new DateDatum(NumberUtil.toInt(bytes));
    -    case TIME:
    -      return new TimeDatum(NumberUtil.toLong(bytes));
    -    case TIMESTAMP:
    -      return new TimestampDatum(NumberUtil.toLong(bytes));
    -    case BIT:
    -      return createBit(bytes[0]);
    -    case BLOB:
    -      return createBlob(bytes);
    -    case INET4:
    -      return createInet4(bytes);
    -    case PROTOBUF:
    -      ProtobufDatumFactory factory = ProtobufDatumFactory.get(dataType);
    -      Message.Builder builder = factory.newBuilder();
    -      try {
    -        builder.mergeFrom(bytes);
    -        return factory.createDatum(builder.build());
    -      } catch (IOException e) {
    -        e.printStackTrace();
    -        throw new RuntimeException(e);
    -      }
    -    default:
    -        throw new UnsupportedOperationException(dataType.toString());
    +      case BOOLEAN:
    +        return createBool(bytes[0]);
    +      case INT2:
    +        return createInt2(NumberUtil.toShort(bytes));
    +      case INT4:
    +        return createInt4(NumberUtil.toInt(bytes));
    +      case INT8:
    +        return createInt8(NumberUtil.toLong(bytes));
    +      case FLOAT4:
    +        return createFloat4(NumberUtil.toFloat(bytes));
    +      case FLOAT8:
    +        return createFloat8(NumberUtil.toDouble(bytes));
    +      case CHAR:
    +        return createChar(bytes);
    +      case TEXT:
    +        return createText(bytes);
    +      case DATE:
    +        return new DateDatum(NumberUtil.toInt(bytes));
    +      case TIME:
    +        return new TimeDatum(NumberUtil.toLong(bytes));
    +      case TIMESTAMP:
    +        return new TimestampDatum(NumberUtil.toLong(bytes));
    +      case BIT:
    +        return createBit(bytes[0]);
    +      case BLOB:
    +        return createBlob(bytes);
    +      case INET4:
    +        return createInet4(bytes);
    +      case PROTOBUF:
    +        try {
    +          return ProtobufDatumFactory.createDatum(dataType, bytes);
    +        } catch (IOException e) {
    +          e.printStackTrace();
    --- End diff --
    
    Don't use e.printStackTrace(). Exceptions will be logged with a proper message elsewhere, and this is not a normal exception. So, it should be ```throw new TajoInternalError(e)```.
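    
    Applied to this hunk:
    
    ```java
    case PROTOBUF:
      try {
        return ProtobufDatumFactory.createDatum(dataType, bytes);
      } catch (IOException e) {
        // no printStackTrace(); the error is logged where it is handled
        throw new TajoInternalError(e);
      }
    ```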



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255905
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -352,7 +349,7 @@ public static TimeDatum createTime(Datum datum, @Nullable TimeZone tz) {
         case TIME:
           return (TimeDatum) datum;
         default:
    -      throw new InvalidCastException(datum.type(), Type.TIME);
    +      throw new TajoRuntimeException(Errors.ResultCode.INVALID_CAST, datum.type().name(), Type.TIME.name());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new InvalidCastException(...))```.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39270206
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/orc/TestOrc.java ---
    @@ -39,18 +47,29 @@
     
     import java.io.IOException;
     import java.net.URL;
    +import java.util.List;
     
    -public class TestORCScanner {
    +public class TestOrc {
    --- End diff --
    
    Does this class test anything that TestStorages cannot cover?



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-125929650
  
    This seems to need query tests with timezones. Actually, I cannot be sure that this change and the ORC format guarantee correct timezone support the way the Text format does; that can be proved with some unit tests. Could you add unit tests for ORC? You can see five kinds of such tests in the TestSelectQuery class. For now, you can add the tests to TestSelectQuery; those timezone tests should be moved to a better place later.
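    
    A rough sketch of such a test, modeled on the existing testTimezonedTable() tests (the table definition, the 'timezone' option on an ORC table, and the stored-result comparison are assumptions about how the final patch wires things up):
    
    ```java
    @Test
    public void testTimezonedOrcTable() throws Exception {
      executeString("CREATE TABLE t_tz (t TIMESTAMP) USING ORC " +
          "WITH ('timezone' = 'GMT+9')").close();
      executeString("INSERT OVERWRITE INTO t_tz " +
          "SELECT TIMESTAMP '1980-04-01 01:50:01'").close();
      ResultSet res = executeString("SELECT t FROM t_tz");
      assertResultSet(res);  // compare against the stored .result file
      res.close();
    }
    ```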



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-138199280
  
    Thanks. Could you rebase the patch against the latest revision?



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-132961895
  
    Hi, @hyunsik.
    
    I modified the ORC timezone code and added a corresponding unit test, similar to the existing testTimezonedTable() tests.
    
    Please review it and leave comments.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255139
  
    --- Diff: tajo-cluster-tests/src/test/java/org/apache/tajo/QueryTestCaseBase.java ---
    @@ -913,7 +913,8 @@ private Path getDataSetFile(String fileName) throws IOException {
               throw new IOException("Cannot find " + fileName + " at " + currentQueryPath + " and " + namedQueryPath);
             }
           } else {
    -        throw new IOException("Cannot find " + fileName + " at " + currentQueryPath + " and " + namedQueryPath);
    +        // to make empty table (used to insert test)
    --- End diff --
    
    This code will create a directory inside a resource directory, which does not seem good. Could you remove this change?



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r35528682
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/thirdparty/orc/WriterImpl.java ---
    @@ -0,0 +1,2251 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.tajo.storage.thirdparty.orc;
    +
    +import com.google.common.annotations.VisibleForTesting;
    +import com.google.common.base.Joiner;
    +import com.google.common.collect.Lists;
    +import com.google.common.primitives.Longs;
    +import com.google.protobuf.ByteString;
    +import com.google.protobuf.CodedOutputStream;
    +import org.apache.commons.logging.Log;
    +import org.apache.commons.logging.LogFactory;
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.hive.ql.io.IOConstants;
    +import org.apache.hadoop.hive.serde2.io.DateWritable;
    +import org.apache.hadoop.hive.shims.ShimLoader;
    +import org.apache.tajo.datum.*;
    +import org.apache.tajo.storage.Tuple;
    +import org.apache.tajo.storage.thirdparty.orc.CompressionCodec.Modifier;
    +import org.apache.tajo.storage.thirdparty.orc.OrcProto.RowIndexEntry;
    +import org.apache.tajo.storage.thirdparty.orc.OrcProto.StripeStatistics;
    +import org.apache.tajo.storage.thirdparty.orc.OrcProto.Type;
    +import org.apache.tajo.storage.thirdparty.orc.OrcProto.UserMetadataItem;
    +import org.apache.hadoop.hive.ql.util.JavaDataModel;
    +import org.apache.hadoop.hive.serde2.objectinspector.*;
    +import org.apache.hadoop.hive.serde2.objectinspector.primitive.*;
    +import org.apache.hadoop.hive.serde2.typeinfo.CharTypeInfo;
    +import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
    +import org.apache.hadoop.hive.serde2.typeinfo.VarcharTypeInfo;
    +import org.apache.hadoop.io.Text;
    +import org.apache.tajo.util.datetime.DateTimeUtil;
    +
    +import java.io.IOException;
    +import java.io.OutputStream;
    +import java.lang.management.ManagementFactory;
    +import java.nio.ByteBuffer;
    +import java.sql.Timestamp;
    +import java.util.*;
    +
    +import static com.google.common.base.Preconditions.checkArgument;
    +
    +/**
    + * An ORC file writer. The file is divided into stripes, which is the natural
    + * unit of work when reading. Each stripe is buffered in memory until the
    + * memory reaches the stripe size and then it is written out broken down by
    + * columns. Each column is written by a TreeWriter that is specific to that
    + * type of column. TreeWriters may have children TreeWriters that handle the
    + * sub-types. Each of the TreeWriters writes the column's data as a set of
    + * streams.
    + *
    + * This class is unsynchronized like most Stream objects, so from the creation of an OrcFile and all
    + * access to a single instance has to be from a single thread.
    + *
    + * There are no known cases where these happen between different threads today.
    + *
    + * Caveat: the MemoryManager is created during WriterOptions create, that has to be confined to a single
    + * thread as well.
    + *
    + */
    +public class WriterImpl implements Writer, MemoryManager.Callback {
    +
    +  private static final Log LOG = LogFactory.getLog(WriterImpl.class);
    +
    +  private static final int HDFS_BUFFER_SIZE = 256 * 1024;
    +  private static final int MIN_ROW_INDEX_STRIDE = 1000;
    +
    +  // threshold above which buffer size will be automatically resized
    +  private static final int COLUMN_COUNT_THRESHOLD = 1000;
    +
    +  private final FileSystem fs;
    +  private final Path path;
    +  private final long defaultStripeSize;
    +  private long adjustedStripeSize;
    +  private final int rowIndexStride;
    +  private final CompressionKind compress;
    +  private final CompressionCodec codec;
    +  private final boolean addBlockPadding;
    +  private final int bufferSize;
    +  private final long blockSize;
    +  private final float paddingTolerance;
    +  // the streams that make up the current stripe
    +  private final Map<StreamName, BufferedStream> streams =
    +    new TreeMap<StreamName, BufferedStream>();
    +
    +  private FSDataOutputStream rawWriter = null;
    +  // the compressed metadata information outStream
    +  private OutStream writer = null;
    +  // a protobuf outStream around streamFactory
    +  private CodedOutputStream protobufWriter = null;
    +  private long headerLength;
    +  private int columnCount;
    +  private long rowCount = 0;
    +  private long rowsInStripe = 0;
    +  private long rawDataSize = 0;
    +  private int rowsInIndex = 0;
    +  private int stripesAtLastFlush = -1;
    +  private final List<OrcProto.StripeInformation> stripes =
    +    new ArrayList<OrcProto.StripeInformation>();
    +  private final Map<String, ByteString> userMetadata =
    +    new TreeMap<String, ByteString>();
    +  private final TreeWriter treeWriter;
    +  private final boolean buildIndex;
    +  private final MemoryManager memoryManager;
    +  private final OrcFile.Version version;
    +  private final Configuration conf;
    +  private final OrcFile.WriterCallback callback;
    +  private final OrcFile.WriterContext callbackContext;
    +  private final OrcFile.EncodingStrategy encodingStrategy;
    +  private final OrcFile.CompressionStrategy compressionStrategy;
    +  private final boolean[] bloomFilterColumns;
    +  private final double bloomFilterFpp;
    +  private boolean writeTimeZone;
    +
    +  WriterImpl(FileSystem fs,
    +      Path path,
    +      Configuration conf,
    +      ObjectInspector inspector,
    +      long stripeSize,
    +      CompressionKind compress,
    +      int bufferSize,
    +      int rowIndexStride,
    +      MemoryManager memoryManager,
    +      boolean addBlockPadding,
    +      OrcFile.Version version,
    +      OrcFile.WriterCallback callback,
    +      OrcFile.EncodingStrategy encodingStrategy,
    +      OrcFile.CompressionStrategy compressionStrategy,
    +      float paddingTolerance,
    +      long blockSizeValue,
    +      String bloomFilterColumnNames,
    +      double bloomFilterFpp) throws IOException {
    +    this.fs = fs;
    +    this.path = path;
    +    this.conf = conf;
    +    this.callback = callback;
    +    if (callback != null) {
    +      callbackContext = new OrcFile.WriterContext(){
    +
    +        @Override
    +        public Writer getWriter() {
    +          return WriterImpl.this;
    +        }
    +      };
    +    } else {
    +      callbackContext = null;
    +    }
    +    this.adjustedStripeSize = stripeSize;
    +    this.defaultStripeSize = stripeSize;
    +    this.version = version;
    +    this.encodingStrategy = encodingStrategy;
    +    this.compressionStrategy = compressionStrategy;
    +    this.addBlockPadding = addBlockPadding;
    +    this.blockSize = blockSizeValue;
    +    this.paddingTolerance = paddingTolerance;
    +    this.compress = compress;
    +    this.rowIndexStride = rowIndexStride;
    +    this.memoryManager = memoryManager;
    +    buildIndex = rowIndexStride > 0;
    +    codec = createCodec(compress);
    +    String allColumns = conf.get(IOConstants.COLUMNS);
    +    if (allColumns == null) {
    +      allColumns = getColumnNamesFromInspector(inspector);
    +    }
    +    this.bufferSize = getEstimatedBufferSize(allColumns, bufferSize);
    +    if (version == OrcFile.Version.V_0_11) {
    +      /* do not write bloom filters for ORC v11 */
    +      this.bloomFilterColumns =
    +          OrcUtils.includeColumns(null, allColumns, inspector);
    +    } else {
    +      this.bloomFilterColumns =
    +          OrcUtils.includeColumns(bloomFilterColumnNames, allColumns, inspector);
    +    }
    +    this.bloomFilterFpp = bloomFilterFpp;
    +    treeWriter = createTreeWriter(inspector, new StreamFactory(), false);
    +    if (buildIndex && rowIndexStride < MIN_ROW_INDEX_STRIDE) {
    +      throw new IllegalArgumentException("Row stride must be at least " +
    +          MIN_ROW_INDEX_STRIDE);
    +    }
    +
    +    // ensure that we are able to handle callbacks before we register ourselves
    +    memoryManager.addWriter(path, stripeSize, this);
    +  }
    +
    +  private String getColumnNamesFromInspector(ObjectInspector inspector) {
    +    List<String> fieldNames = Lists.newArrayList();
    +    Joiner joiner = Joiner.on(",");
    +    if (inspector instanceof StructObjectInspector) {
    +      StructObjectInspector soi = (StructObjectInspector) inspector;
    +      List<? extends StructField> fields = soi.getAllStructFieldRefs();
    +      for(StructField sf : fields) {
    +        fieldNames.add(sf.getFieldName());
    +      }
    +    }
    +    return joiner.join(fieldNames);
    +  }
    +
    +  @VisibleForTesting
    +  int getEstimatedBufferSize(int bs) {
    +      return getEstimatedBufferSize(conf.get(IOConstants.COLUMNS), bs);
    +  }
    +
    +  int getEstimatedBufferSize(String colNames, int bs) {
    +    long availableMem = getMemoryAvailableForORC();
    +    if (colNames != null) {
    +      final int numCols = colNames.split(",").length;
    +      if (numCols > COLUMN_COUNT_THRESHOLD) {
    +        // In BufferedStream, there are 3 outstream buffers (compressed,
    +        // uncompressed and overflow) and list of previously compressed buffers.
    +        // Since overflow buffer is rarely used, lets consider only 2 allocation.
    +        // Also, initially, the list of compression buffers will be empty.
    +        final int outStreamBuffers = codec == null ? 1 : 2;
    +
    +        // max possible streams per column is 5. For string columns, there is
    +        // ROW_INDEX, PRESENT, DATA, LENGTH, DICTIONARY_DATA streams.
    +        final int maxStreams = 5;
    +
    +        // Lets assume 10% memory for holding dictionary in memory and other
    +        // object allocations
    +        final long miscAllocation = (long) (0.1f * availableMem);
    +
    +        // compute the available memory
    +        final long remainingMem = availableMem - miscAllocation;
    +
    +        int estBufferSize = (int) (remainingMem /
    +            (maxStreams * outStreamBuffers * numCols));
    +        estBufferSize = getClosestBufferSize(estBufferSize, bs);
    +        if (estBufferSize > bs) {
    +          estBufferSize = bs;
    +        }
    +
    +        LOG.info("WIDE TABLE - Number of columns: " + numCols +
    +            " Chosen compression buffer size: " + estBufferSize);
    +        return estBufferSize;
    +      }
    +    }
    +    return bs;
    +  }
    +
    +  private int getClosestBufferSize(int estBufferSize, int bs) {
    +    final int kb4 = 4 * 1024;
    +    final int kb8 = 8 * 1024;
    +    final int kb16 = 16 * 1024;
    +    final int kb32 = 32 * 1024;
    +    final int kb64 = 64 * 1024;
    +    final int kb128 = 128 * 1024;
    +    final int kb256 = 256 * 1024;
    +    if (estBufferSize <= kb4) {
    +      return kb4;
    +    } else if (estBufferSize > kb4 && estBufferSize <= kb8) {
    +      return kb8;
    +    } else if (estBufferSize > kb8 && estBufferSize <= kb16) {
    +      return kb16;
    +    } else if (estBufferSize > kb16 && estBufferSize <= kb32) {
    +      return kb32;
    +    } else if (estBufferSize > kb32 && estBufferSize <= kb64) {
    +      return kb64;
    +    } else if (estBufferSize > kb64 && estBufferSize <= kb128) {
    +      return kb128;
    +    } else {
    +      return kb256;
    +    }
    +  }
    +
    +  // the assumption is only one ORC writer open at a time, which holds true for
    +  // most of the cases. HIVE-6455 forces single writer case.
    +  private long getMemoryAvailableForORC() {
    +    OrcConf.ConfVars poolVar = OrcConf.ConfVars.HIVE_ORC_FILE_MEMORY_POOL;
    +    double maxLoad = conf.getFloat(poolVar.varname, poolVar.defaultFloatVal);
    +    long totalMemoryPool = Math.round(ManagementFactory.getMemoryMXBean().
    +        getHeapMemoryUsage().getMax() * maxLoad);
    +    return totalMemoryPool;
    +  }
    +
    +  public static CompressionCodec createCodec(CompressionKind kind) {
    +    switch (kind) {
    +      case NONE:
    +        return null;
    +      case ZLIB:
    +        return new ZlibCodec();
    +      case SNAPPY:
    +        return new SnappyCodec();
    +      case LZO:
    +        try {
    +          Class<? extends CompressionCodec> lzo =
    +              (Class<? extends CompressionCodec>)
    +                  Class.forName("org.apache.hadoop.hive.ql.io.orc.LzoCodec");
    +          return lzo.newInstance();
    +        } catch (ClassNotFoundException e) {
    +          throw new IllegalArgumentException("LZO is not available.", e);
    +        } catch (InstantiationException e) {
    +          throw new IllegalArgumentException("Problem initializing LZO", e);
    +        } catch (IllegalAccessException e) {
    +          throw new IllegalArgumentException("Insufficient access to LZO", e);
    +        }
    +      default:
    +        throw new IllegalArgumentException("Unknown compression codec: " +
    +            kind);
    +    }
    +  }
    +
    +  @Override
    +  public boolean checkMemory(double newScale) throws IOException {
    +    long limit = (long) Math.round(adjustedStripeSize * newScale);
    +    long size = estimateStripeSize();
    +    if (LOG.isDebugEnabled()) {
    +      LOG.debug("ORC writer " + path + " size = " + size + " limit = " +
    +                limit);
    +    }
    +    if (size > limit) {
    +      flushStripe();
    +      return true;
    +    }
    +    return false;
    +  }
    +
    +  /**
    +   * This class is used to hold the contents of streams as they are buffered.
    +   * The TreeWriters write to the outStream and the codec compresses the
    +   * data as buffers fill up and stores them in the output list. When the
    +   * stripe is being written, the whole stream is written to the file.
    +   */
    +  private class BufferedStream implements OutStream.OutputReceiver {
    +    private final OutStream outStream;
    +    private final List<ByteBuffer> output = new ArrayList<ByteBuffer>();
    +
    +    BufferedStream(String name, int bufferSize,
    +                   CompressionCodec codec) throws IOException {
    +      outStream = new OutStream(name, bufferSize, codec, this);
    +    }
    +
    +    /**
    +     * Receive a buffer from the compression codec.
    +     * @param buffer the buffer to save
    +     * @throws IOException
    +     */
    +    @Override
    +    public void output(ByteBuffer buffer) {
    +      output.add(buffer);
    +    }
    +
    +    /**
    +     * Get the number of bytes in buffers that are allocated to this stream.
    +     * @return number of bytes in buffers
    +     */
    +    public long getBufferSize() {
    +      long result = 0;
    +      for(ByteBuffer buf: output) {
    +        result += buf.capacity();
    +      }
    +      return outStream.getBufferSize() + result;
    +    }
    +
    +    /**
    +     * Flush the stream to the codec.
    +     * @throws IOException
    +     */
    +    public void flush() throws IOException {
    +      outStream.flush();
    +    }
    +
    +    /**
    +     * Clear all of the buffers.
    +     * @throws IOException
    +     */
    +    public void clear() throws IOException {
    +      outStream.clear();
    +      output.clear();
    +    }
    +
    +    /**
    +     * Check the state of suppress flag in output stream
    +     * @return value of suppress flag
    +     */
    +    public boolean isSuppressed() {
    +      return outStream.isSuppressed();
    +    }
    +
    +    /**
    +     * Get the number of bytes that will be written to the output. Assumes
    +     * the stream has already been flushed.
    +     * @return the number of bytes
    +     */
    +    public long getOutputSize() {
    +      long result = 0;
    +      for(ByteBuffer buffer: output) {
    +        result += buffer.remaining();
    +      }
    +      return result;
    +    }
    +
    +    /**
    +     * Write the saved compressed buffers to the OutputStream.
    +     * @param out the stream to write to
    +     * @throws IOException
    +     */
    +    void spillTo(OutputStream out) throws IOException {
    +      for(ByteBuffer buffer: output) {
    +        out.write(buffer.array(), buffer.arrayOffset() + buffer.position(),
    +          buffer.remaining());
    +      }
    +    }
    +
    +    @Override
    +    public String toString() {
    +      return outStream.toString();
    +    }
    +  }
    +
    +  /**
    +   * An output receiver that writes the ByteBuffers to the output stream
    +   * as they are received.
    +   */
    +  private class DirectStream implements OutStream.OutputReceiver {
    +    private final FSDataOutputStream output;
    +
    +    DirectStream(FSDataOutputStream output) {
    +      this.output = output;
    +    }
    +
    +    @Override
    +    public void output(ByteBuffer buffer) throws IOException {
    +      output.write(buffer.array(), buffer.arrayOffset() + buffer.position(),
    +        buffer.remaining());
    +    }
    +  }
    +
    +  private static class RowIndexPositionRecorder implements PositionRecorder {
    +    private final OrcProto.RowIndexEntry.Builder builder;
    +
    +    RowIndexPositionRecorder(OrcProto.RowIndexEntry.Builder builder) {
    +      this.builder = builder;
    +    }
    +
    +    @Override
    +    public void addPosition(long position) {
    +      builder.addPositions(position);
    +    }
    +  }
    +
    +  /**
    +   * Interface from the Writer to the TreeWriters. This limits the visibility
    +   * that the TreeWriters have into the Writer.
    +   */
    +  private class StreamFactory {
    +    /**
    +     * Create a stream to store part of a column.
    +     * @param column the column id for the stream
    +     * @param kind the kind of stream
    +     * @return The output outStream that the section needs to be written to.
    +     * @throws IOException
    +     */
    +    public OutStream createStream(int column,
    +                                  OrcProto.Stream.Kind kind
    +                                  ) throws IOException {
    +      final StreamName name = new StreamName(column, kind);
    +      final EnumSet<CompressionCodec.Modifier> modifiers;
    +
    +      switch (kind) {
    +        case BLOOM_FILTER:
    +        case DATA:
    +        case DICTIONARY_DATA:
    +          if (getCompressionStrategy() == OrcFile.CompressionStrategy.SPEED) {
    +            modifiers = EnumSet.of(Modifier.FAST, Modifier.TEXT);
    +          } else {
    +            modifiers = EnumSet.of(Modifier.DEFAULT, Modifier.TEXT);
    +          }
    +          break;
    +        case LENGTH:
    +        case DICTIONARY_COUNT:
    +        case PRESENT:
    +        case ROW_INDEX:
    +        case SECONDARY:
    +          // easily compressed using the fastest modes
    +          modifiers = EnumSet.of(Modifier.FASTEST, Modifier.BINARY);
    +          break;
    +        default:
    +          LOG.warn("Missing ORC compression modifiers for " + kind);
    +          modifiers = null;
    +          break;
    +      }
    +
    +      BufferedStream result = streams.get(name);
    +      if (result == null) {
    +        result = new BufferedStream(name.toString(), bufferSize,
    +            codec == null ? codec : codec.modify(modifiers));
    +        streams.put(name, result);
    +      }
    +      return result.outStream;
    +    }
    +
    +    /**
    +     * Get the next column id.
    +     * @return a number from 0 to the number of columns - 1
    +     */
    +    public int getNextColumnId() {
    +      return columnCount++;
    +    }
    +
    +    /**
    +     * Get the current column id. After creating all tree writers this count should tell how many
    +     * columns (including columns within nested complex objects) are created in total.
    +     * @return current column id
    +     */
    +    public int getCurrentColumnId() {
    +      return columnCount;
    +    }
    +
    +    /**
    +     * Get the stride rate of the row index.
    +     */
    +    public int getRowIndexStride() {
    +      return rowIndexStride;
    +    }
    +
    +    /**
    +     * Should be building the row index.
    +     * @return true if we are building the index
    +     */
    +    public boolean buildIndex() {
    +      return buildIndex;
    +    }
    +
    +    /**
    +     * Is the ORC file compressed?
    +     * @return are the streams compressed
    +     */
    +    public boolean isCompressed() {
    +      return codec != null;
    +    }
    +
    +    /**
    +     * Get the encoding strategy to use.
    +     * @return encoding strategy
    +     */
    +    public OrcFile.EncodingStrategy getEncodingStrategy() {
    +      return encodingStrategy;
    +    }
    +
    +    /**
    +     * Get the compression strategy to use.
    +     * @return compression strategy
    +     */
    +    public OrcFile.CompressionStrategy getCompressionStrategy() {
    +      return compressionStrategy;
    +    }
    +
    +    /**
    +     * Get the bloom filter columns
    +     * @return bloom filter columns
    +     */
    +    public boolean[] getBloomFilterColumns() {
    +      return bloomFilterColumns;
    +    }
    +
    +    /**
    +     * Get bloom filter false positive percentage.
    +     * @return fpp
    +     */
    +    public double getBloomFilterFPP() {
    +      return bloomFilterFpp;
    +    }
    +
    +    /**
    +     * Get the writer's configuration.
    +     * @return configuration
    +     */
    +    public Configuration getConfiguration() {
    +      return conf;
    +    }
    +
    +    /**
    +     * Get the version of the file to write.
    +     */
    +    public OrcFile.Version getVersion() {
    +      return version;
    +    }
    +
    +    /**
    +     * Set whether the writer's time zone should be recorded in the stripe footer.
    +     */
    +    public void useWriterTimeZone(boolean val) {
    +      writeTimeZone = val;
    +    }
    +
    +    /**
    +     * Should the writer's time zone be recorded in the stripe footer?
    +     */
    +    public boolean hasWriterTimeZone() {
    +      return writeTimeZone;
    +    }
    +  }
    +
    +  /**
    +   * The parent class of all of the writers for each column. Each column
    +   * is written by an instance of this class. The compound types (struct,
    +   * list, map, and union) have children tree writers that write the children
    +   * types.
    +   */
    +  private abstract static class TreeWriter {
    +    protected final int id;
    +    protected final ObjectInspector inspector;
    +    private final BitFieldWriter isPresent;
    +    private final boolean isCompressed;
    +    protected final ColumnStatisticsImpl indexStatistics;
    +    protected final ColumnStatisticsImpl stripeColStatistics;
    +    private final ColumnStatisticsImpl fileStatistics;
    +    protected TreeWriter[] childrenWriters;
    +    protected final RowIndexPositionRecorder rowIndexPosition;
    +    private final OrcProto.RowIndex.Builder rowIndex;
    +    private final OrcProto.RowIndexEntry.Builder rowIndexEntry;
    +    private final PositionedOutputStream rowIndexStream;
    +    private final PositionedOutputStream bloomFilterStream;
    +    protected final BloomFilterIO bloomFilter;
    +    protected final boolean createBloomFilter;
    +    private final OrcProto.BloomFilterIndex.Builder bloomFilterIndex;
    +    private final OrcProto.BloomFilter.Builder bloomFilterEntry;
    +    private boolean foundNulls;
    +    private OutStream isPresentOutStream;
    +    private final List<StripeStatistics.Builder> stripeStatsBuilders;
    +    private final StreamFactory streamFactory;
    +
    +    /**
    +     * Create a tree writer.
    +     * @param columnId the column id of the column to write
    +     * @param inspector the object inspector to use
    +     * @param streamFactory limited access to the Writer's data.
    +     * @param nullable can the value be null?
    +     * @throws IOException
    +     */
    +    TreeWriter(int columnId, ObjectInspector inspector,
    +               StreamFactory streamFactory,
    +               boolean nullable) throws IOException {
    +      this.streamFactory = streamFactory;
    +      this.isCompressed = streamFactory.isCompressed();
    +      this.id = columnId;
    +      this.inspector = inspector;
    +      if (nullable) {
    +        isPresentOutStream = streamFactory.createStream(id,
    +            OrcProto.Stream.Kind.PRESENT);
    +        isPresent = new BitFieldWriter(isPresentOutStream, 1);
    +      } else {
    +        isPresent = null;
    +      }
    +      this.foundNulls = false;
    +      createBloomFilter = streamFactory.getBloomFilterColumns()[columnId];
    +      indexStatistics = ColumnStatisticsImpl.create(inspector);
    +      stripeColStatistics = ColumnStatisticsImpl.create(inspector);
    +      fileStatistics = ColumnStatisticsImpl.create(inspector);
    +      childrenWriters = new TreeWriter[0];
    +      rowIndex = OrcProto.RowIndex.newBuilder();
    +      rowIndexEntry = OrcProto.RowIndexEntry.newBuilder();
    +      rowIndexPosition = new RowIndexPositionRecorder(rowIndexEntry);
    +      stripeStatsBuilders = Lists.newArrayList();
    +      if (streamFactory.buildIndex()) {
    +        rowIndexStream = streamFactory.createStream(id, OrcProto.Stream.Kind.ROW_INDEX);
    +      } else {
    +        rowIndexStream = null;
    +      }
    +      if (createBloomFilter) {
    +        bloomFilterEntry = OrcProto.BloomFilter.newBuilder();
    +        bloomFilterIndex = OrcProto.BloomFilterIndex.newBuilder();
    +        bloomFilterStream = streamFactory.createStream(id, OrcProto.Stream.Kind.BLOOM_FILTER);
    +        bloomFilter = new BloomFilterIO(streamFactory.getRowIndexStride(),
    +            streamFactory.getBloomFilterFPP());
    +      } else {
    +        bloomFilterEntry = null;
    +        bloomFilterIndex = null;
    +        bloomFilterStream = null;
    +        bloomFilter = null;
    +      }
    +    }
    +
    +    protected OrcProto.RowIndex.Builder getRowIndex() {
    +      return rowIndex;
    +    }
    +
    +    protected ColumnStatisticsImpl getStripeStatistics() {
    +      return stripeColStatistics;
    +    }
    +
    +    protected ColumnStatisticsImpl getFileStatistics() {
    +      return fileStatistics;
    +    }
    +
    +    protected OrcProto.RowIndexEntry.Builder getRowIndexEntry() {
    +      return rowIndexEntry;
    +    }
    +
    +    IntegerWriter createIntegerWriter(PositionedOutputStream output,
    +                                      boolean signed, boolean isDirectV2,
    +                                      StreamFactory writer) {
    +      if (isDirectV2) {
    +        boolean alignedBitpacking = false;
    +        if (writer.getEncodingStrategy().equals(OrcFile.EncodingStrategy.SPEED)) {
    +          alignedBitpacking = true;
    +        }
    +        return new RunLengthIntegerWriterV2(output, signed, alignedBitpacking);
    +      } else {
    +        return new RunLengthIntegerWriter(output, signed);
    +      }
    +    }
    +
    +    boolean isNewWriteFormat(StreamFactory writer) {
    +      return writer.getVersion() != OrcFile.Version.V_0_11;
    +    }
    +
    +    /**
    +     * Add a new value to the column.
    +     * @param datum the value to add; null or a null datum marks a missing value
    +     * @throws IOException
    +     */
    +    void write(Datum datum) throws IOException {
    +      if (datum != null && datum.isNotNull()) {
    +        indexStatistics.increment();
    +      } else {
    +        indexStatistics.setNull();
    +      }
    +      if (isPresent != null) {
    +        if (datum == null || datum.isNull()) {
    +          foundNulls = true;
    +          isPresent.write(0);
    +        } else {
    +          isPresent.write(1);
    +        }
    +      }
    +    }
    +
    +    void write(Tuple tuple) throws IOException {
    +      if (tuple != null) {
    +        indexStatistics.increment();
    +      } else {
    +        indexStatistics.setNull();
    +      }
    +      if (isPresent != null) {
    +        if (tuple == null) {
    +          foundNulls = true;
    +          isPresent.write(0);
    +        } else {
    +          isPresent.write(1);
    +        }
    +      }
    +    }
    +
    +    private void removeIsPresentPositions() {
    +      for(int i=0; i < rowIndex.getEntryCount(); ++i) {
    +        RowIndexEntry.Builder entry = rowIndex.getEntryBuilder(i);
    +        List<Long> positions = entry.getPositionsList();
    +        // bit streams use 3 positions if uncompressed, 4 if compressed
    +        positions = positions.subList(isCompressed ? 4 : 3, positions.size());
    +        entry.clearPositions();
    +        entry.addAllPositions(positions);
    +      }
    +    }
    +
    +    /**
    +     * Write the stripe out to the file.
    +     * @param builder the stripe footer that contains the information about the
    +     *                layout of the stripe. The TreeWriter is required to update
    +     *                the footer with its information.
    +     * @param requiredIndexEntries the number of index entries that are
    +     *                             required. This is used to verify that the
    +     *                             row index is well formed.
    +     * @throws IOException
    +     */
    +    void writeStripe(OrcProto.StripeFooter.Builder builder,
    +                     int requiredIndexEntries) throws IOException {
    +      if (isPresent != null) {
    +        isPresent.flush();
    +
    +        // if no nulls are found in a stream, then suppress the stream
    +        if(!foundNulls) {
    +          isPresentOutStream.suppress();
    +          // since isPresent bitstream is suppressed, update the index to
    +          // remove the positions of the isPresent stream
    +          if (rowIndexStream != null) {
    +            removeIsPresentPositions();
    +          }
    +        }
    +      }
    +
    +      // merge stripe-level column statistics to file statistics and write it to
    +      // stripe statistics
    +      OrcProto.StripeStatistics.Builder stripeStatsBuilder = OrcProto.StripeStatistics.newBuilder();
    +      writeStripeStatistics(stripeStatsBuilder, this);
    +      stripeStatsBuilders.add(stripeStatsBuilder);
    +
    +      // reset the flag for next stripe
    +      foundNulls = false;
    +
    +      builder.addColumns(getEncoding());
    +      if (streamFactory.hasWriterTimeZone()) {
    --- End diff --
    
    This timezone policy may differ from Tajo's. Could you check the timezone handling? The following code would be helpful for this work:
    
    https://github.com/apache/tajo/blob/master/tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/TextFieldSerializerDeserializer.java#L56
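    
    For reference, here is a minimal sketch of the kind of lookup the linked code performs — the option key, the ```meta.getOption``` accessor, and the fallback below are assumptions for illustration, not the exact Tajo constants:
    
        // Hedged sketch: resolve the table's timezone (java.util.TimeZone) from its
        // meta options, falling back to the JVM default when none is set.
        TimeZone tableTimeZone = TimeZone.getTimeZone(
            meta.getOption("timezone", TimeZone.getDefault().getID()));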



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39591790
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/orc/TestOrc.java ---
    @@ -39,18 +47,29 @@
     
     import java.io.IOException;
     import java.net.URL;
    +import java.util.List;
     
    -public class TestORCScanner {
    +public class TestOrc {
    --- End diff --
    
    It can be removed.
    When I wrote this test, I didn't know about TestStorages.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-127533651
  
    It seems timezone processing has a bug.
    See https://issues.apache.org/jira/browse/TAJO-1741.
    Writing the test is blocked for now.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255974
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -365,7 +362,7 @@ public static TimestampDatum createTimestamp(Datum datum, @Nullable TimeZone tz)
           case TIMESTAMP:
             return (TimestampDatum) datum;
           default:
    -        throw new InvalidCastException(datum.type(), Type.TIMESTAMP);
    +        throw new TajoRuntimeException(Errors.ResultCode.INVALID_CAST, datum.type().name(), Type.TIMESTAMP.name());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new InvalidCastException(...))```.
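    
    A sketch of the corrected line, assuming ```InvalidCastException``` keeps the (source type, target type) constructor used in the original code:
    
        // Wrap the cast error in a TajoRuntimeException, as suggested above.
        throw new TajoRuntimeException(new InvalidCastException(datum.type(), Type.TIMESTAMP));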



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-125900543
  
    Hi @hyunsik.
    
    I've addressed the points you mentioned.
    Please review again, thanks.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255333
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -107,53 +107,50 @@ public static Datum createFromString(DataType dataType, String value) {
         case INET4:
           return createInet4(value);
         default:
    -      throw new UnsupportedOperationException(dataType.toString());
    +      throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, dataType.toString());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new UnsupportedDataType(...))```.
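    
    A sketch of the corrected line — the exact exception class name and its single-argument constructor are assumptions here:
    
        // Wrap the unsupported-type error in a TajoRuntimeException, as suggested above.
        throw new TajoRuntimeException(new UnsupportedDataTypeException(dataType.toString()));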



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255401
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -107,53 +107,50 @@ public static Datum createFromString(DataType dataType, String value) {
         case INET4:
           return createInet4(value);
         default:
    -      throw new UnsupportedOperationException(dataType.toString());
    +      throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, dataType.toString());
         }
       }
     
       public static Datum createFromBytes(DataType dataType, byte[] bytes) {
         switch (dataType.getType()) {
     
    -    case BOOLEAN:
    -      return createBool(bytes[0]);
    -    case INT2:
    -      return createInt2(NumberUtil.toShort(bytes));
    -    case INT4:
    -      return createInt4(NumberUtil.toInt(bytes));
    -    case INT8:
    -      return createInt8(NumberUtil.toLong(bytes));
    -    case FLOAT4:
    -      return createFloat4(NumberUtil.toFloat(bytes));
    -    case FLOAT8:
    -      return createFloat8(NumberUtil.toDouble(bytes));
    -    case CHAR:
    -      return createChar(bytes);
    -    case TEXT:
    -      return createText(bytes);
    -    case DATE:
    -      return new DateDatum(NumberUtil.toInt(bytes));
    -    case TIME:
    -      return new TimeDatum(NumberUtil.toLong(bytes));
    -    case TIMESTAMP:
    -      return new TimestampDatum(NumberUtil.toLong(bytes));
    -    case BIT:
    -      return createBit(bytes[0]);
    -    case BLOB:
    -      return createBlob(bytes);
    -    case INET4:
    -      return createInet4(bytes);
    -    case PROTOBUF:
    -      ProtobufDatumFactory factory = ProtobufDatumFactory.get(dataType);
    -      Message.Builder builder = factory.newBuilder();
    -      try {
    -        builder.mergeFrom(bytes);
    -        return factory.createDatum(builder.build());
    -      } catch (IOException e) {
    -        e.printStackTrace();
    -        throw new RuntimeException(e);
    -      }
    -    default:
    -        throw new UnsupportedOperationException(dataType.toString());
    +      case BOOLEAN:
    +        return createBool(bytes[0]);
    +      case INT2:
    +        return createInt2(NumberUtil.toShort(bytes));
    +      case INT4:
    +        return createInt4(NumberUtil.toInt(bytes));
    +      case INT8:
    +        return createInt8(NumberUtil.toLong(bytes));
    +      case FLOAT4:
    +        return createFloat4(NumberUtil.toFloat(bytes));
    +      case FLOAT8:
    +        return createFloat8(NumberUtil.toDouble(bytes));
    +      case CHAR:
    +        return createChar(bytes);
    +      case TEXT:
    +        return createText(bytes);
    +      case DATE:
    +        return new DateDatum(NumberUtil.toInt(bytes));
    +      case TIME:
    +        return new TimeDatum(NumberUtil.toLong(bytes));
    +      case TIMESTAMP:
    +        return new TimestampDatum(NumberUtil.toLong(bytes));
    +      case BIT:
    +        return createBit(bytes[0]);
    +      case BLOB:
    +        return createBlob(bytes);
    +      case INET4:
    +        return createInet4(bytes);
    +      case PROTOBUF:
    +        try {
    +          return ProtobufDatumFactory.createDatum(dataType, bytes);
    +        } catch (IOException e) {
    +          e.printStackTrace();
    +          throw new TajoRuntimeException(Errors.ResultCode.IO_ERROR, e.getMessage());
    +        }
    +      default:
    +        throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, dataType.toString());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new UnsupportedDataType(...))```.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255984
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -462,7 +459,7 @@ public static Datum cast(Datum operandDatum, DataType target, @Nullable TimeZone
         case ANY:
           return DatumFactory.createAny(operandDatum);
         default:
    -      throw new InvalidCastException(operandDatum.type(), target.getType());
    +      throw new TajoRuntimeException(Errors.ResultCode.INVALID_CAST, operandDatum.type().name(), target.getType().name());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new InvalidCastException(...))```.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r39255326
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -71,7 +71,7 @@
           case NULL_TYPE:
             return NullDatum.class;
           default:
    -        throw new UnsupportedOperationException(type.name());
    +        throw new TajoRuntimeException(Errors.ResultCode.UNSUPPORTED_DATATYPE, type.name());
    --- End diff --
    
    It should be ```new TajoRuntimeException(new UnsupportedDataType(...))```.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-140759719
  
    +1
    
    Before committing the patch, I made the following small changes (sketched below):
     * Rename ``INVALID_CAST`` to ``INVALID_VALUE_FOR_CAST``
     * Add a SQLState for ``INVALID_VALUE_FOR_CAST`` to ``SQLExceptionUtil``
     * Add a map entry for ``INVALID_VALUE_FOR_CAST`` to ``ExceptionUtil``
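    
    A rough sketch of what those two registrations might look like — the map names, exception class, and SQLState value below are assumptions for illustration, not the actual Tajo code:
    
        // Hypothetical SQLExceptionUtil entry; "22018" is the SQL standard
        // SQLState for "invalid character value for cast".
        SQLSTATES.put(ResultCode.INVALID_VALUE_FOR_CAST, "22018");
        // Hypothetical ExceptionUtil entry mapping the code to its exception class.
        EXCEPTIONS.put(ResultCode.INVALID_VALUE_FOR_CAST, InvalidValueForCastException.class);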



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/652#discussion_r38281202
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java ---
    @@ -114,45 +114,42 @@ public static Datum createFromString(DataType dataType, String value) {
       public static Datum createFromBytes(DataType dataType, byte[] bytes) {
         switch (dataType.getType()) {
     
    -    case BOOLEAN:
    -      return createBool(bytes[0]);
    -    case INT2:
    -      return createInt2(NumberUtil.toShort(bytes));
    -    case INT4:
    -      return createInt4(NumberUtil.toInt(bytes));
    -    case INT8:
    -      return createInt8(NumberUtil.toLong(bytes));
    -    case FLOAT4:
    -      return createFloat4(NumberUtil.toFloat(bytes));
    -    case FLOAT8:
    -      return createFloat8(NumberUtil.toDouble(bytes));
    -    case CHAR:
    -      return createChar(bytes);
    -    case TEXT:
    -      return createText(bytes);
    -    case DATE:
    -      return new DateDatum(NumberUtil.toInt(bytes));
    -    case TIME:
    -      return new TimeDatum(NumberUtil.toLong(bytes));
    -    case TIMESTAMP:
    -      return new TimestampDatum(NumberUtil.toLong(bytes));
    -    case BIT:
    -      return createBit(bytes[0]);
    -    case BLOB:
    -      return createBlob(bytes);
    -    case INET4:
    -      return createInet4(bytes);
    -    case PROTOBUF:
    -      ProtobufDatumFactory factory = ProtobufDatumFactory.get(dataType);
    -      Message.Builder builder = factory.newBuilder();
    -      try {
    -        builder.mergeFrom(bytes);
    -        return factory.createDatum(builder.build());
    -      } catch (IOException e) {
    -        e.printStackTrace();
    -        throw new RuntimeException(e);
    -      }
    -    default:
    +      case BOOLEAN:
    +        return createBool(bytes[0]);
    +      case INT2:
    +        return createInt2(NumberUtil.toShort(bytes));
    +      case INT4:
    +        return createInt4(NumberUtil.toInt(bytes));
    +      case INT8:
    +        return createInt8(NumberUtil.toLong(bytes));
    +      case FLOAT4:
    +        return createFloat4(NumberUtil.toFloat(bytes));
    +      case FLOAT8:
    +        return createFloat8(NumberUtil.toDouble(bytes));
    +      case CHAR:
    +        return createChar(bytes);
    +      case TEXT:
    +        return createText(bytes);
    +      case DATE:
    +        return new DateDatum(NumberUtil.toInt(bytes));
    +      case TIME:
    +        return new TimeDatum(NumberUtil.toLong(bytes));
    +      case TIMESTAMP:
    +        return new TimestampDatum(NumberUtil.toLong(bytes));
    +      case BIT:
    +        return createBit(bytes[0]);
    +      case BLOB:
    +        return createBlob(bytes);
    +      case INET4:
    +        return createInet4(bytes);
    +      case PROTOBUF:
    +        try {
    +          return ProtobufDatumFactory.createDatum(dataType, bytes);
    +        } catch (IOException e) {
    +          e.printStackTrace();
    +          throw new RuntimeException(e);
    +        }
    +      default:
             throw new UnsupportedOperationException(dataType.toString());
    --- End diff --
    
    It should be UndefinedOperatorException wrapped by TajoRuntimeException. Take a look at ErrorMessage.java to see how it builds the error message for UndefinedOperatorException.
    
    I also think UnsupportedException would be appropriate here.
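    
    A sketch of the wrapped form, assuming ```UnsupportedException``` takes the type description as a string:
    
        throw new TajoRuntimeException(new UnsupportedException(dataType.toString()));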



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-125080517
  
    I'm reviewing this patch. I'll leave comments soon.



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-140762095
  
    Thank you for your work. I committed it as c3c78fc2147768595051b8cfb1006f13ca2db1bf. But I forgot to add the GitHub close command to the commit log. Could you close this pull request?



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by eminency <gi...@git.apache.org>.
Github user eminency closed the pull request at:

    https://github.com/apache/tajo/pull/652



[GitHub] tajo pull request: TAJO-1465: Add ORCFileAppender to write into OR...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/652#issuecomment-139545475
  
    I left comments.

