You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by blrunner <gi...@git.apache.org> on 2015/07/06 03:51:49 UTC

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/624

    TAJO-1493: Add a method to get partition directories with filter conditions.

    Not yet implemented unit test cases. 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo TAJO-1493

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #624
    
----
commit 743122bade809e5a6fd6ccbb5025a87233e0ac40
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-07-06T01:49:20Z

    TAJO-1493: Add a method to get partition directories with filter conditions.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39335162
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2172,6 +2178,252 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
         return partitions;
       }
     
    +  /**
    +   * Check if list of partitios exist on catalog.
    +   *
    +   *
    +   * @param databaseId
    +   * @param tableId
    +   * @return
    +   */
    +  public boolean existPartitionsOnCatalog(int databaseId, int tableId) {
    +    Connection conn = null;
    +    ResultSet res = null;
    +    PreparedStatement pstmt = null;
    +    boolean result = false;
    +
    +    try {
    +      String sql = "SELECT COUNT(*) CNT FROM "
    +        + TB_PARTTIONS +" WHERE " + COL_TABLES_PK + " = ?  ";
    +
    +      if (LOG.isDebugEnabled()) {
    +        LOG.debug(sql);
    +      }
    +
    +      conn = getConnection();
    +      pstmt = conn.prepareStatement(sql);
    +      pstmt.setInt(1, tableId);
    +      res = pstmt.executeQuery();
    +
    +      if (res.next()) {
    +        if (res.getInt("CNT") > 0) {
    +          result = true;
    +        }
    +      }
    +    } catch (SQLException se) {
    +      throw new TajoInternalError(se);
    +    } finally {
    +      CatalogUtil.closeQuietly(pstmt, res);
    +    }
    +    return result;
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByFilter(PartitionsByFilterProto request) {
    +    throw new TajoRuntimeException(new UnsupportedException());
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
    +      UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException {
    +    Connection conn = null;
    +    PreparedStatement pstmt = null;
    +    ResultSet res = null;
    +    int currentIndex = 1;
    +    String selectStatement = null;
    +
    +    List<PartitionDescProto> partitions = TUtil.newList();
    +    List<PartitionFilterSet> filterSets = TUtil.newList();
    +
    +    try {
    +      int databaseId = getDatabaseId(request.getDatabaseName());
    +      int tableId = getTableId(databaseId, request.getDatabaseName(), request.getTableName());
    +      if (!existPartitionMethod(request.getDatabaseName(), request.getTableName())) {
    +        throw new UndefinedPartitionMethodException(request.getTableName());
    +      }
    +
    +      if (!existPartitionsOnCatalog(databaseId, tableId)) {
    +        throw new PartitionNotFoundException(request.getTableName());
    --- End diff --
    
    The purpose is for partition pruning when there is no partitions on catalog and partition directories exist on file system. Above case, users need to execute recover partition (https://github.com/apache/tajo/pull/626). But we need to handle their queries if  they don't recover their partitions located on file system to catalog. So, I implemented the exception and if catalog throws the exception, PartitionedTableRewriter would set a list of partitions to null. And then, the rewriter would check directories from file system.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39007546
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -446,6 +450,52 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) {
    --- End diff --
    
    How about renaming it into ``isPartitionedTable``? It seems to be better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39010692
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2173,6 +2179,248 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException {
    +    Connection conn = null;
    +    ResultSet res = null;
    +    PreparedStatement pstmt = null;
    +    boolean result = false;
    +
    +    try {
    +      int databaseId = getDatabaseId(databaseName);
    +      int tableId = getTableId(databaseId, databaseName, tableName);
    +
    +      if (!existPartitionMethod(databaseName, tableName)) {
    +        throw new UndefinedPartitionMethodException(tableName);
    +      }
    +
    +      String sql = "SELECT COUNT(*) CNT FROM "
    +        + TB_PARTTIONS +" WHERE " + COL_TABLES_PK + " = ?  ";
    +
    +      if (LOG.isDebugEnabled()) {
    +        LOG.debug(sql);
    +      }
    +
    +      conn = getConnection();
    +      pstmt = conn.prepareStatement(sql);
    +      pstmt.setInt(1, tableId);
    +      res = pstmt.executeQuery();
    +
    +      if (res.next()) {
    +        if (res.getInt("CNT") > 0) {
    +          result = true;
    +        }
    +      }
    +    } catch (SQLException se) {
    +      throw new TajoInternalError(se);
    +    } finally {
    +      CatalogUtil.closeQuietly(pstmt, res);
    +    }
    +    return result;
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByDirectSql(PartitionsByDirectSqlProto request) {
    +    throw new TajoRuntimeException(new UnsupportedException("getPartitionsByDirectSql"));
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
    +      UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException {
    +    Connection conn = null;
    +    PreparedStatement pstmt = null;
    +    ResultSet res = null;
    +    int currentIndex = 1;
    +    String directSQL = null;
    +
    +    List<PartitionDescProto> partitions = TUtil.newList();
    +    List<PartitionFilterSet> filterSets = TUtil.newList();
    +
    +    try {
    +      int databaseId = getDatabaseId(request.getDatabaseName());
    +      int tableId = getTableId(databaseId, request.getDatabaseName(), request.getTableName());
    +      if (!existPartitionMethod(request.getDatabaseName(), request.getTableName())) {
    +        throw new UndefinedPartitionMethodException(request.getTableName());
    +      }
    +
    +      TableDescProto tableDesc = getTable(request.getDatabaseName(), request.getTableName());
    +      conn = getConnection();
    --- End diff --
    
    It may be very trivial. It would be better if this line is moved above ``pstmt = conn.prepareStatement(directSQL);``.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-140765711
  
    Thank you for your work. The latest patch looks good to me. Here is my +1. I leaved one comment. You can commit it after reflecting my comment if you agree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-137321819
  
    I added DateTime types and more unit test cases for various partition column types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730338
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -446,6 +446,46 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) {
    +    try {
    +      final BlockingInterface stub = getStub();
    +      final TableIdentifierProto request = buildTableIdentifier(databaseName, tableName);
    +      return isSuccess(stub.existPartitionsByTableName(null, request));
    +    } catch (ServiceException e) {
    +      throw new RuntimeException(e);
    +    }
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
    +    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +    UndefinedOperatorException {
    +    try {
    +      final BlockingInterface stub = getStub();
    +      GetPartitionsResponse response = stub.getPartitionsByAlgebra(null, request);
    +      ensureOk(response.getState());
    +    return response.getPartitionList();
    +    } catch (ServiceException e) {
    +      LOG.error(e.getMessage(), e);
    +      return null;
    +    }
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByDirectSql(PartitionsByDirectSqlProto request) throws
    +    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    --- End diff --
    
    To handle such exceptions, you should put throwsIfThisError method for the exceptions. Please refer to https://github.com/apache/tajo/pull/624/files#diff-9ecb2222e062d0cc008fb1c3da61d4b0L483.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-127042279
  
    I implemented CatalogStore::getPartitionsByDirectSql which execute a select statement to CatalogStore for searching right partition directories. But current tajo testing cluster use MemStore for default CatalogStore and I couldn't execute the select statement because MemStore just retain all catalog informations to HashMap. If you want to verify this patch, you can test as following:
    
    ```
    mvn clean install -Pparallel-test -DLOG_LEVEL=WARN -Dmaven.fork.count=2 -Dtajo.catalog.store.class=org.apache.tajo.catalog.store.DerbyStore
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39271506
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -25,6 +25,7 @@
     import org.apache.tajo.catalog.CatalogProtocol.CatalogProtocolService.BlockingInterface;
     import org.apache.tajo.catalog.CatalogProtocol.*;
     import org.apache.tajo.catalog.partition.PartitionMethodDesc;
    +import org.apache.tajo.catalog.proto.CatalogProtos;
    --- End diff --
    
    It seems to be not used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39001856
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -446,6 +450,52 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) {
    +    try {
    +      final BlockingInterface stub = getStub();
    +      final TableIdentifierProto request = buildTableIdentifier(databaseName, tableName);
    +      return isSuccess(stub.existPartitionsByTableName(null, request));
    --- End diff --
    
    Even through this API is checking if the table exists, it potentially returns unexpected exception. If you check only its success, it may return a wrong result.
    
    So, you should return true or false depending on only both its success and UNDEFINED_PARTITION_METHODS. You can find the similar code pattern from ``existPartitionMethod`` method in this class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r36275539
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/MemStore.java ---
    @@ -28,7 +28,15 @@
     import org.apache.tajo.catalog.*;
     import org.apache.tajo.catalog.exception.*;
     import org.apache.tajo.catalog.proto.CatalogProtos;
    -import org.apache.tajo.catalog.proto.CatalogProtos.*;
    +//import org.apache.tajo.catalog.proto.CatalogProtos.ColumnProto;
    --- End diff --
    
    You may forget the removal of commented out lines.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38731533
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java ---
    @@ -957,7 +957,61 @@ public GetPartitionDescResponse getPartitionByPartitionName(RpcController contro
         }
     
         @Override
    -    public GetPartitionsResponse getPartitionsByTableName(RpcController controller, PartitionIdentifierProto request)
    +    public ReturnState existPartitionsByTableName(RpcController controller, TableIdentifierProto request)
    +      throws ServiceException {
    +      String dbName = request.getDatabaseName();
    +      String tbName = request.getTableName();
    +
    +      try {
    +        // linked meta data do not support partition.
    +        // So, the request that wants to get partitions in this db will be failed.
    +        if (linkedMetadataManager.existsDatabase(dbName)) {
    +          return errUndefinedPartitionMethod(tbName);
    +        }
    +      } catch (Throwable t) {
    +        printStackTraceIfError(LOG, t);
    +        return returnError(t);
    +      }
    +
    +      if (metaDictionary.isSystemDatabase(dbName)) {
    +        return errUndefinedPartitionMethod(tbName);
    +      }
    +
    +      rlock.lock();
    +      try {
    +        boolean contain;
    +
    +        contain = store.existDatabase(dbName);
    +        if (contain) {
    --- End diff --
    
    These tests can be improved like other methods. Please refer to existsDatabase() method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/624


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-140937939
  
    @hyunsik 
    
    Thank you very much for your kind review. I've just reflected your comment. After finishing the travis CI build, I'll ship it. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730640
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -846,10 +855,65 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
     
       @Override
       public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
    -                                                         String tableName) {
    +                                                         String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException, UndefinedPartitionException{
         throw new UnsupportedOperationException();
       }
     
    +  @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException {
    +    throw new UnsupportedOperationException();
    --- End diff --
    
    UnsupportedException is proper here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39639478
  
    --- Diff: tajo-core-tests/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java ---
    @@ -1325,4 +1332,421 @@ public final void testDuplicatedPartitions() throws Exception {
           executeString("DROP TABLE " + tableName + " PURGE");
         }
       }
    +
    +  @Test
    +  public final void testPatternMatchingPredicatesAndStringFunctions() throws Exception {
    +    ResultSet res = null;
    +    String tableName = CatalogUtil.normalizeIdentifier("testPatternMatchingPredicatesAndStringFunctions");
    +    String expectedResult;
    +
    +    if (nodeType == NodeType.INSERT) {
    +      executeString("create table " + tableName
    +        + " (col1 int4, col2 int4) partition by column(l_shipdate text, l_returnflag text) ").close();
    +
    +      assertTrue(catalog.existsTable(DEFAULT_DATABASE_NAME, tableName));
    +      assertEquals(2, catalog.getTableDesc(DEFAULT_DATABASE_NAME, tableName).getSchema().size());
    +      assertEquals(4, catalog.getTableDesc(DEFAULT_DATABASE_NAME, tableName).getLogicalSchema().size());
    +
    +      executeString(
    +        "insert overwrite into " + tableName + " select l_orderkey, l_partkey, l_shipdate, l_returnflag from lineitem");
    +    } else {
    +      executeString(
    +        "create table " + tableName + "(col1 int4, col2 int4) partition by column(l_shipdate text, l_returnflag text) "
    +          + " as select l_orderkey, l_partkey, l_shipdate, l_returnflag from lineitem");
    +    }
    +
    +    assertTrue(client.existTable(tableName));
    +
    +    // Like
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE l_shipdate LIKE '1996%' and l_returnflag = 'N' order by l_shipdate");
    +
    +    expectedResult = "col1,col2,l_shipdate,l_returnflag\n" +
    +      "-------------------------------\n" +
    +      "1,1,1996-03-13,N\n" +
    +      "1,1,1996-04-12,N\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // Not like
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE l_shipdate NOT LIKE '1996%' and l_returnflag IN ('R') order by l_shipdate");
    +
    +    expectedResult = "col1,col2,l_shipdate,l_returnflag\n" +
    +      "-------------------------------\n" +
    +      "3,3,1993-11-09,R\n" +
    +      "3,2,1994-02-02,R\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // In
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE l_shipdate IN ('1993-11-09', '1994-02-02', '1997-01-28') AND l_returnflag = 'R' order by l_shipdate");
    +
    +    expectedResult = "col1,col2,l_shipdate,l_returnflag\n" +
    +      "-------------------------------\n" +
    +      "3,3,1993-11-09,R\n" +
    +      "3,2,1994-02-02,R\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // Similar to
    +    res = executeString("SELECT * FROM " + tableName + " WHERE l_shipdate similar to '1993%' order by l_shipdate");
    +
    +    expectedResult = "col1,col2,l_shipdate,l_returnflag\n" +
    +      "-------------------------------\n" +
    +      "3,3,1993-11-09,R\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // Regular expression
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE l_shipdate regexp '[1-2][0-9][0-9][3-9]-[0-1][0-9]-[0-3][0-9]' "
    +      + " AND l_returnflag <> 'N' ORDER BY l_shipdate");
    +
    +    expectedResult = "col1,col2,l_shipdate,l_returnflag\n" +
    +      "-------------------------------\n" +
    +      "3,3,1993-11-09,R\n" +
    +      "3,2,1994-02-02,R\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // Concatenate
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE l_shipdate = ( '1996' || '-' || '03' || '-' || '13' ) order by l_shipdate");
    +
    +    expectedResult = "col1,col2,l_shipdate,l_returnflag\n" +
    +      "-------------------------------\n" +
    +      "1,1,1996-03-13,N\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    executeString("DROP TABLE " + tableName + " PURGE").close();
    +    res.close();
    +  }
    +
    +  @Test
    +  public final void testDatePartitionColumn() throws Exception {
    +    ResultSet res = null;
    +    String tableName = CatalogUtil.normalizeIdentifier("testDatePartitionColumn");
    +    String expectedResult;
    +
    +    if (nodeType == NodeType.INSERT) {
    +      executeString("create table " + tableName + " (col1 int4, col2 int4) partition by column(key date) ").close();
    +
    +      assertTrue(catalog.existsTable(DEFAULT_DATABASE_NAME, tableName));
    +      assertEquals(2, catalog.getTableDesc(DEFAULT_DATABASE_NAME, tableName).getSchema().size());
    +      assertEquals(3, catalog.getTableDesc(DEFAULT_DATABASE_NAME, tableName).getLogicalSchema().size());
    +
    +      executeString(
    +        "insert overwrite into " + tableName + " select l_orderkey, l_partkey, l_shipdate from lineitem");
    +    } else {
    +      executeString(
    +        "create table " + tableName + "(col1 int4, col2 int4) partition by column(key date) "
    +          + " as select l_orderkey, l_partkey, l_shipdate::date from lineitem");
    +    }
    +
    +    assertTrue(client.existTable(tableName));
    +
    +    // LessThanOrEquals
    +    res = executeString("SELECT * FROM " + tableName + " WHERE key <= date '1995-09-01' order by col1, col2, key");
    +
    +    expectedResult = "col1,col2,key\n" +
    +      "-------------------------------\n" +
    +      "3,2,1994-02-02\n" +
    +      "3,3,1993-11-09\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // LessThan and GreaterThan
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE key > to_date('1993-01-01', 'YYYY-MM-DD') " +
    +      " and key < to_date('1996-01-01', 'YYYY-MM-DD') order by col1, col2, key desc");
    +
    +    expectedResult = "col1,col2,key\n" +
    +      "-------------------------------\n" +
    +      "3,2,1994-02-02\n" +
    +      "3,3,1993-11-09\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // Between
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE key between date '1993-01-01' and date '1997-01-01' order by col1, col2, key desc");
    +
    +    expectedResult = "col1,col2,key\n" +
    +      "-------------------------------\n" +
    +      "1,1,1996-04-12\n" +
    +      "1,1,1996-03-13\n" +
    +      "3,2,1994-02-02\n" +
    +      "3,3,1993-11-09\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // Cast
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE key > '1993-01-01'::date " +
    +      " and key < '1997-01-01'::timestamp order by col1, col2, key ");
    +
    +    expectedResult = "col1,col2,key\n" +
    +      "-------------------------------\n" +
    +      "1,1,1996-03-13\n" +
    +      "1,1,1996-04-12\n" +
    +      "3,2,1994-02-02\n" +
    +      "3,3,1993-11-09\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // Interval
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE key > '1993-01-01'::date " +
    +      " and key < date '1994-01-01' + interval '1 year' order by col1, col2, key ");
    +
    +    expectedResult = "col1,col2,key\n" +
    +      "-------------------------------\n" +
    +      "3,2,1994-02-02\n" +
    +      "3,3,1993-11-09\n";
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // DateTime Function #1
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE key > '1993-01-01'::date " +
    +      " and key < add_months(date '1994-01-01', 12) order by col1, col2, key ");
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    // DateTime Function #2
    +    res = executeString("SELECT * FROM " + tableName
    +      + " WHERE key > '1993-01-01'::date " +
    +      " and key < add_months('1994-01-01'::timestamp, 12) order by col1, col2, key ");
    +
    +    assertEquals(expectedResult, resultSetToString(res));
    +    res.close();
    +
    +    executeString("DROP TABLE " + tableName + " PURGE").close();
    +    res.close();
    +  }
    +
    +  @Test
    +  public final void testTimeStampPartitionColumn() throws Exception {
    --- End diff --
    
    Could you rename TimeStamp to Timestamp?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730288
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -428,13 +429,12 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    -  public final List<PartitionDescProto> getPartitions(final String databaseName, final String tableName) {
    +  public final List<PartitionDescProto> getPartitions(final String databaseName, final String tableName) throws
    +    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    --- End diff --
    
    To handle such exceptions, you should put ``throwsIfThisError`` method for the exceptions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39007741
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/proto/CatalogProtocol.proto ---
    @@ -121,9 +121,12 @@ service CatalogProtocolService {
       rpc existPartitionMethod(TableIdentifierProto) returns (ReturnState);
     
       rpc getPartitionByPartitionName(PartitionIdentifierProto) returns (GetPartitionDescResponse);
    -  rpc getPartitionsByTableName(PartitionIdentifierProto) returns (GetPartitionsResponse);
    +  rpc getPartitionsByTableName(TableIdentifierProto) returns (GetPartitionsResponse);
    +  rpc existPartitionsByTableName(TableIdentifierProto) returns (ReturnState);
    --- End diff --
    
    As I mentioned, ``isPartitionedTable`` would be better. It may not need the suffix ``byPartitionName``.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-134106861
  
    Hi @hyunsik 
    
    I've implemented partition API using algebra expressions. For this feature, I implemented  some visitors as following:
    
    * ScanQualConverter: This converts Quals of ScanNode to algebra expressions.
    * PartitionFilterAlgebraVisitor: This build SQL statement for getting partition filters by visiting algebra expressions.
    * PartitionFilterAlgebraVisitor: This build SQL statement for getting partition filters by visiting EvalNodes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r36580037
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java ---
    @@ -1195,6 +1249,71 @@ public GetTablePartitionsResponse getAllPartitions(RpcController controller, Nul
         }
     
         @Override
    +    public GetTablePartitionsResponse getPartitionsByDirectSql(RpcController controller,
    --- End diff --
    
    Catalog API is public APIs. In terms of this point, this API design does not make sense due to the following problems:
     * Users should know its schema and underlying persistent storage.
      * So, users should make SQL statements for all persistent storages.
     * This approach is very prone to SQL injection.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39271326
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2172,6 +2178,252 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
         return partitions;
       }
     
    +  /**
    +   * Check if list of partitios exist on catalog.
    +   *
    +   *
    +   * @param databaseId
    +   * @param tableId
    +   * @return
    +   */
    +  public boolean existPartitionsOnCatalog(int databaseId, int tableId) {
    +    Connection conn = null;
    +    ResultSet res = null;
    +    PreparedStatement pstmt = null;
    +    boolean result = false;
    +
    +    try {
    +      String sql = "SELECT COUNT(*) CNT FROM "
    +        + TB_PARTTIONS +" WHERE " + COL_TABLES_PK + " = ?  ";
    +
    +      if (LOG.isDebugEnabled()) {
    +        LOG.debug(sql);
    +      }
    +
    +      conn = getConnection();
    +      pstmt = conn.prepareStatement(sql);
    +      pstmt.setInt(1, tableId);
    +      res = pstmt.executeQuery();
    +
    +      if (res.next()) {
    +        if (res.getInt("CNT") > 0) {
    +          result = true;
    +        }
    +      }
    +    } catch (SQLException se) {
    +      throw new TajoInternalError(se);
    +    } finally {
    +      CatalogUtil.closeQuietly(pstmt, res);
    +    }
    +    return result;
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByFilter(PartitionsByFilterProto request) {
    +    throw new TajoRuntimeException(new UnsupportedException());
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
    +      UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException {
    +    Connection conn = null;
    +    PreparedStatement pstmt = null;
    +    ResultSet res = null;
    +    int currentIndex = 1;
    +    String selectStatement = null;
    +
    +    List<PartitionDescProto> partitions = TUtil.newList();
    +    List<PartitionFilterSet> filterSets = TUtil.newList();
    +
    +    try {
    +      int databaseId = getDatabaseId(request.getDatabaseName());
    +      int tableId = getTableId(databaseId, request.getDatabaseName(), request.getTableName());
    +      if (!existPartitionMethod(request.getDatabaseName(), request.getTableName())) {
    +        throw new UndefinedPartitionMethodException(request.getTableName());
    +      }
    +
    +      if (!existPartitionsOnCatalog(databaseId, tableId)) {
    +        throw new PartitionNotFoundException(request.getTableName());
    --- End diff --
    
    Could you elaborate the purpose of this exception? Does it means that there is no partition? If so, it does not make sense because it returns a list of partitions. So, it should return an empty list if there is no partitions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730855
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -846,10 +855,65 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
     
       @Override
       public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
    -                                                         String tableName) {
    +                                                         String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException, UndefinedPartitionException{
         throw new UnsupportedOperationException();
       }
     
    +  @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request)
    +    throws UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +    UndefinedOperatorException {
    +    throw new UndefinedOperatorException("getPartitionsByAlgebra");
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByDirectSql(PartitionsByDirectSqlProto request)
    +      throws UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +      UndefinedOperatorException {
    +
    +    HiveCatalogStoreClientPool.HiveCatalogStoreClient client = null;
    +    List<PartitionDescProto> partitions = null;
    +
    +    try {
    +      String databaseName = request.getDatabaseName();
    +      String tableName = request.getTableName();
    +
    +      partitions = TUtil.newList();
    +
    +      client = clientPool.getClient();
    +
    +      List<Partition> hivePartitions = client.getHiveClient().listPartitionsByFilter(databaseName, tableName
    +        , request.getDirectSql(), (short) -1);
    +
    +      for (Partition hivePartition : hivePartitions) {
    +        CatalogProtos.PartitionDescProto.Builder builder = CatalogProtos.PartitionDescProto.newBuilder();
    +        builder.setPath(hivePartition.getSd().getLocation());
    +
    +        int startIndex = hivePartition.getSd().getLocation().indexOf(tableName) + tableName.length();
    +        String partitionName = hivePartition.getSd().getLocation().substring(startIndex+1);
    +        builder.setPartitionName(partitionName);
    +
    +        partitions.add(builder.build());
    +      }
    +    } catch (NoSuchObjectException e) {
    +      return null;
    --- End diff --
    
    This should throw TajoInternalError.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-127275276
  
    I found that this patch run as expected with HiveCatalogStore and MySQLStore on my testing cluster. And simple query response had been reported as following:
    
    * Table schema:
    ```
    create table partitioned_lineitem (L_SUPPKEY bigint, L_LINENUMBER bigint,
    L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double, L_RETURNFLAG text, L_LINESTATUS text,
    L_SHIPDATE text, L_COMMITDATE text, L_RECEIPTDATE text, L_SHIPINSTRUCT text, L_SHIPMODE text, L_COMMENT text)
    partition by column (L_ORDERKEY bigint, L_PARTKEY bigint)
    ```
    * Partition numbers: 100,000
    * Select statement: select * from partitioned_lineitem limit 10;
    * Response time:
    - previous rewriter: 15 ~ 16 sec
    - improved rewriter: 12 ~ 13 sec
    
    Honestly, I didn't implement unit test cases for executing queries because current almost tajo unit cases operate on MemStore. If we apply DerbyStore to some unit test cases for physical operator, we would make a lot of effort. It seems not to be the scope of this patch. So, I just added unit test cases for verifying direct sql. But if you want to test this patch with build commands, you can test with `-Dtajo.catalog.store.class` parameter as following:
    ```
    mvn clean install -Pparallel-test -DLOG_LEVEL=WARN -Dmaven.fork.count=2 -Dtajo.catalog.store.class=org.apache.tajo.catalog.store.DerbyStore
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39008520
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -1052,7 +1122,7 @@ public void addPartitions(String databaseName, String tableName, List<CatalogPro
           }
     
           if (addPartitions.size() > 0) {
    -        client.getHiveClient().add_partitions(addPartitions, true, true);
    +        List<Partition> results = client.getHiveClient().add_partitions(addPartitions, true, true);
    --- End diff --
    
    ``List<Partition> results`` is not used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730473
  
    --- Diff: tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java ---
    @@ -162,7 +162,19 @@ PartitionDescProto getPartition(String databaseName, String tableName, String pa
           throws UndefinedPartitionException, UndefinedPartitionMethodException, UndefinedDatabaseException,
           UndefinedTableException;
     
    -  List<PartitionDescProto> getPartitions(String databaseName, String tableName);
    +  List<PartitionDescProto> getPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    --- End diff --
    
    It would be better if ``getPartitions()`` is renamed to ``getAllPartitions()``.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730495
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -428,13 +429,12 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    -  public final List<PartitionDescProto> getPartitions(final String databaseName, final String tableName) {
    +  public final List<PartitionDescProto> getPartitions(final String databaseName, final String tableName) throws
    --- End diff --
    
    It would be better if ``getPartitions()`` is renamed to ``getAllPartitions()``.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39008058
  
    --- Diff: tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java ---
    @@ -162,7 +162,17 @@ PartitionDescProto getPartition(String databaseName, String tableName, String pa
           throws UndefinedPartitionException, UndefinedPartitionMethodException, UndefinedDatabaseException,
           UndefinedTableException;
     
    -  List<PartitionDescProto> getPartitions(String databaseName, String tableName);
    +  List<PartitionDescProto> getAllPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException, UndefinedPartitionException, UndefinedOperatorException;
    +
    +  boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +  UndefinedTableException, UndefinedPartitionMethodException;
    +
    +  List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
    +    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException;
    +
    +  List<PartitionDescProto> getPartitionsByDirectSql(PartitionsByDirectSqlProto request) throws
    --- End diff --
    
    Its name should be changed into {{PartitionByFilter}} or {{PartitionByPredicate}}. As far as I know, the suffix ``DirectSQL`` is inspired by Hive API. But, the Hive API does predefined filter expression instead of direct sql predicate. Direct SQL is always danger, and this is not our way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38731587
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java ---
    @@ -1027,6 +1081,136 @@ public GetTablePartitionsResponse getAllPartitions(RpcController controller, Nul
         }
     
         @Override
    +    public GetPartitionsResponse getPartitionsByAlgebra(RpcController controller,
    +      PartitionsByAlgebraProto request) throws ServiceException {
    +      String dbName = request.getDatabaseName();
    +      String tbName = request.getTableName();
    +
    +      try {
    +        // linked meta data do not support partition.
    +        // So, the request that wants to get partitions in this db will be failed.
    +        if (linkedMetadataManager.existsDatabase(dbName)) {
    +          return GetPartitionsResponse.newBuilder().setState(errUndefinedPartitionMethod(tbName)).build();
    +        }
    +      } catch (Throwable t) {
    +        printStackTraceIfError(LOG, t);
    +        return GetPartitionsResponse.newBuilder()
    +          .setState(returnError(t))
    +          .build();
    +      }
    +
    +      if (metaDictionary.isSystemDatabase(dbName)) {
    +        return GetPartitionsResponse.newBuilder().setState(errUndefinedPartitionMethod(tbName)).build();
    +      }
    +
    +      rlock.lock();
    +      try {
    +        boolean contain;
    +
    +        contain = store.existDatabase(dbName);
    +        if (contain) {
    +          contain = store.existTable(dbName, tbName);
    +          if (contain) {
    +
    +            if (store.existPartitionMethod(dbName, tbName)) {
    +              GetPartitionsResponse.Builder builder = GetPartitionsResponse.newBuilder();
    +              List<PartitionDescProto> partitions = store.getPartitionsByAlgebra(request);
    +              builder.addAllPartition(partitions);
    +              builder.setState(OK);
    +              return builder.build();
    +            } else {
    +              return GetPartitionsResponse.newBuilder()
    +                .setState(errUndefinedPartitionMethod(tbName))
    +                .build();
    +            }
    +          } else {
    +            return GetPartitionsResponse.newBuilder()
    +              .setState(errUndefinedTable(tbName))
    +              .build();
    +          }
    +        } else {
    +          return GetPartitionsResponse.newBuilder()
    +            .setState(errUndefinedDatabase(dbName))
    +            .build();
    +        }
    +      } catch (Throwable t) {
    +        printStackTraceIfError(LOG, t);
    +
    +        return GetPartitionsResponse.newBuilder()
    +            .setState(returnError(t))
    +            .build();
    +
    +      } finally {
    +        rlock.unlock();
    +      }
    +    }
    +
    +    @Override
    +    public GetPartitionsResponse getPartitionsByDirectSql(RpcController controller,
    +                                                 PartitionsByDirectSqlProto request) throws ServiceException {
    +      String dbName = request.getDatabaseName();
    +      String tbName = request.getTableName();
    +
    +      try {
    +        // linked meta data do not support partition.
    +        // So, the request that wants to get partitions in this db will be failed.
    +        if (linkedMetadataManager.existsDatabase(dbName)) {
    +          return GetPartitionsResponse.newBuilder().setState(errUndefinedPartitionMethod(tbName)).build();
    +        }
    +      } catch (Throwable t) {
    +        printStackTraceIfError(LOG, t);
    +        return GetPartitionsResponse.newBuilder()
    +          .setState(returnError(t))
    +          .build();
    +      }
    +
    +      if (metaDictionary.isSystemDatabase(dbName)) {
    +        return GetPartitionsResponse.newBuilder().setState(errUndefinedPartitionMethod(tbName)).build();
    +      }
    +
    +      rlock.lock();
    +      try {
    +        boolean contain;
    +
    +        contain = store.existDatabase(dbName);
    +        if (contain) {
    --- End diff --
    
    These tests can be improved like other methods. Please refer to existsDatabase() method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730647
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -846,10 +855,65 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
     
       @Override
       public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
    -                                                         String tableName) {
    +                                                         String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException, UndefinedPartitionException{
         throw new UnsupportedOperationException();
    --- End diff --
    
    UnsupportedException is proper here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39008262
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -845,11 +854,72 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
       }
     
       @Override
    -  public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
    -                                                         String tableName) {
    -    throw new UnsupportedOperationException();
    +  public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName, String tableName) {
    +    throw new TajoRuntimeException(new UnsupportedException("getPartitions"));
    --- End diff --
    
    UnsupportedException automatically generates the error message including the method and class if it is used with a default constructor. SO, it should be ``new UnsupportedException()``.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-139737937
  
    Thanks @hyunsik 
    
    I reflected your comments and left some comments. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39002026
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -446,6 +450,52 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) {
    +    try {
    +      final BlockingInterface stub = getStub();
    +      final TableIdentifierProto request = buildTableIdentifier(databaseName, tableName);
    +      return isSuccess(stub.existPartitionsByTableName(null, request));
    +    } catch (ServiceException e) {
    +      throw new RuntimeException(e);
    +    }
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
    +    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException {
    +    try {
    +      final BlockingInterface stub = getStub();
    +      GetPartitionsResponse response = stub.getPartitionsByAlgebra(null, request);
    +
    +      throwsIfThisError(response.getState(), UndefinedDatabaseException.class);
    +      throwsIfThisError(response.getState(), UndefinedTableException.class);
    +      throwsIfThisError(response.getState(), UndefinedPartitionMethodException.class);
    +      ensureOk(response.getState());
    +    return response.getPartitionList();
    --- End diff --
    
    mis indent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39009365
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2132,7 +2137,8 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
     
       @Override
       public List<PartitionDescProto> getPartitions(String databaseName, String tableName)
    --- End diff --
    
    It should be getAllPartitions for naming consistency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39270987
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2172,6 +2178,252 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
         return partitions;
       }
     
    +  /**
    +   * Check if list of partitios exist on catalog.
    --- End diff --
    
    ``partitios`` seems to be a typo error.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39008270
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -845,11 +854,72 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
       }
     
       @Override
    -  public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
    -                                                         String tableName) {
    -    throw new UnsupportedOperationException();
    +  public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName, String tableName) {
    +    throw new TajoRuntimeException(new UnsupportedException("getPartitions"));
    +  }
    +
    +  @Override
    +  public boolean existPartitions(String databaseName, String tableName) {
    +    throw new TajoRuntimeException(new UnsupportedException("existPartitions"));
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) {
    +    throw new TajoRuntimeException(new UnsupportedException("getPartitionsByAlgebra"));
    --- End diff --
    
    UnsupportedException automatically generates the error message including the method and class if it is used with a default constructor. SO, it should be new UnsupportedException().


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38731577
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java ---
    @@ -1027,6 +1081,136 @@ public GetTablePartitionsResponse getAllPartitions(RpcController controller, Nul
         }
     
         @Override
    +    public GetPartitionsResponse getPartitionsByAlgebra(RpcController controller,
    +      PartitionsByAlgebraProto request) throws ServiceException {
    +      String dbName = request.getDatabaseName();
    +      String tbName = request.getTableName();
    +
    +      try {
    +        // linked meta data do not support partition.
    +        // So, the request that wants to get partitions in this db will be failed.
    +        if (linkedMetadataManager.existsDatabase(dbName)) {
    +          return GetPartitionsResponse.newBuilder().setState(errUndefinedPartitionMethod(tbName)).build();
    +        }
    +      } catch (Throwable t) {
    +        printStackTraceIfError(LOG, t);
    +        return GetPartitionsResponse.newBuilder()
    +          .setState(returnError(t))
    +          .build();
    +      }
    +
    +      if (metaDictionary.isSystemDatabase(dbName)) {
    +        return GetPartitionsResponse.newBuilder().setState(errUndefinedPartitionMethod(tbName)).build();
    +      }
    +
    +      rlock.lock();
    +      try {
    +        boolean contain;
    +
    +        contain = store.existDatabase(dbName);
    +        if (contain) {
    --- End diff --
    
    These tests can be improved like other methods. Please refer to existsDatabase() method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39271781
  
    --- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/util/ScanQualConverter.java ---
    @@ -0,0 +1,297 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.tajo.plan.util;
    +
    +import org.apache.tajo.algebra.*;
    +import org.apache.tajo.datum.DateDatum;
    +import org.apache.tajo.datum.Datum;
    +import org.apache.tajo.datum.TimeDatum;
    +import org.apache.tajo.datum.TimestampDatum;
    +import org.apache.tajo.plan.expr.*;
    +
    +import java.util.Stack;
    +
    +/**
    + * This converts Quals of ScanNode to Algebra expressions.
    + *
    + */
    +public class ScanQualConverter extends SimpleEvalNodeVisitor<Object> {
    --- End diff --
    
    I'd like to recommend renaming it to ``EvalNodeToExprConverter``. It appears to be more general in terms of converting any EvalNode tree into Expr tree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Make partition pruning based on cata...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39270682
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/exception/ReturnStateUtil.java ---
    @@ -161,6 +161,10 @@ public static ReturnState errUndefinedPartition(String partitionName) {
         return returnError(ResultCode.UNDEFINED_PARTITION, partitionName);
       }
     
    +  public static ReturnState errPartitionNotFund(String tbName) {
    --- End diff --
    
    If it is not used, you may do not need to add this method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39012228
  
    --- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/PartitionedTableRewriter.java ---
    @@ -118,32 +121,264 @@ public String toString() {
        * @return
        * @throws IOException
        */
    -  private Path [] findFilteredPaths(OverridableConf queryContext, Schema partitionColumns, EvalNode [] conjunctiveForms,
    -                                    Path tablePath)
    -      throws IOException {
    +  private Path [] findFilteredPaths(OverridableConf queryContext, String tableName,
    +                                    Schema partitionColumns, EvalNode [] conjunctiveForms, Path tablePath)
    +      throws IOException, UndefinedDatabaseException, UndefinedTableException,
    +      UndefinedPartitionMethodException, UndefinedOperatorException {
     
    +    Path [] filteredPaths = null;
         FileSystem fs = tablePath.getFileSystem(queryContext.getConf());
    +    String [] splits = CatalogUtil.splitFQTableName(tableName);
    +    List<PartitionDescProto> partitions = null;
     
    -    PathFilter [] filters;
    -    if (conjunctiveForms == null) {
    -      filters = buildAllAcceptingPathFilters(partitionColumns);
    -    } else {
    -      filters = buildPathFiltersForAllLevels(partitionColumns, conjunctiveForms);
    +    String store = queryContext.getConf().get(CatalogConstants.STORE_CLASS);
    +
    +    try {
    +      // HiveCatalogStore provides list of table partitions with where clause because hive just provides api using
    +      // the filter string. So, this rewriter need to differentiate HiveCatalogStore and other catalogs.
    +      if (store.equals("org.apache.tajo.catalog.store.HiveCatalogStore")) {
    --- End diff --
    
    A RewriteRule should not consider its catalog storage type. It does not make sense. CatalogService should just take an algebra, and HiveCatalogStore should generates a filter expression string representation from the algebra.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r36580081
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2114,6 +2114,127 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws CatalogException {
    +    Connection conn = null;
    +    ResultSet res = null;
    +    PreparedStatement pstmt = null;
    +    boolean result = false;
    +
    +    try {
    +      String sql = "SELECT COUNT(*) CNT FROM "
    +        + TB_PARTTIONS +" WHERE " + COL_TABLES_PK + " = ?  ";
    +
    +      if (LOG.isDebugEnabled()) {
    +        LOG.debug(sql);
    +      }
    +
    +      int databaseId = getDatabaseId(databaseName);
    +      int tableId = getTableId(databaseId, databaseName, tableName);
    +
    +      conn = getConnection();
    +      pstmt = conn.prepareStatement(sql);
    +      pstmt.setInt(1, tableId);
    +      res = pstmt.executeQuery();
    +
    +      if (res.next()) {
    +        if (res.getInt("CNT") > 0) {
    +          result = true;
    +        }
    +      }
    +    } catch (SQLException se) {
    +      throw new TajoInternalError(se);
    +    } finally {
    +      CatalogUtil.closeQuietly(pstmt, res);
    +    }
    +    return result;
    +  }
    +
    +  @Override
    +  public List<TablePartitionProto> getPartitionsByDirectSql(GetPartitionsWithDirectSQLRequest request)
    --- End diff --
    
    Catalog API is public. In terms of this point, this API design does not make sense due to the following problems:
     * Users should know table schemas, their relationship, which underlying persistent storage.
      * Users should make SQL statements for all persistent storages.
     * This approach is very prone to SQL injection.
    
    The solution is that you can use some intermediate representation, using string or java objects. They should be transformed into SQL statements depending on persistent storages in each catalog driver.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730836
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -846,10 +855,65 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
     
       @Override
       public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
    -                                                         String tableName) {
    +                                                         String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException, UndefinedPartitionException{
         throw new UnsupportedOperationException();
       }
     
    +  @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request)
    +    throws UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +    UndefinedOperatorException {
    +    throw new UndefinedOperatorException("getPartitionsByAlgebra");
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByDirectSql(PartitionsByDirectSqlProto request)
    +      throws UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +      UndefinedOperatorException {
    --- End diff --
    
    UndefinedOperator is not proper here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730634
  
    --- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
    @@ -846,10 +855,65 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
     
       @Override
       public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
    -                                                         String tableName) {
    +                                                         String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException, UndefinedPartitionException{
         throw new UnsupportedOperationException();
       }
     
    +  @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException {
    +    throw new UnsupportedOperationException();
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request)
    +    throws UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +    UndefinedOperatorException {
    +    throw new UndefinedOperatorException("getPartitionsByAlgebra");
    --- End diff --
    
    UnsupportedException is proper here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39007654
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -428,15 +429,18 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    -  public final List<PartitionDescProto> getPartitions(final String databaseName, final String tableName) {
    +  public final List<PartitionDescProto> getAllPartitions(final String databaseName, final String tableName) throws
    +    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +    UndefinedPartitionException {
    --- End diff --
    
    It retrieves all partitions. So it may be inproper to throw ``UndefinedPartitionException``, which should be thrown in the case where there is no partition corresponding to a specific partition key.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730778
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2173,6 +2179,244 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    +    UndefinedTableException, UndefinedPartitionMethodException {
    +    Connection conn = null;
    +    ResultSet res = null;
    +    PreparedStatement pstmt = null;
    +    boolean result = false;
    +
    +    try {
    +      String sql = "SELECT COUNT(*) CNT FROM "
    +        + TB_PARTTIONS +" WHERE " + COL_TABLES_PK + " = ?  ";
    +
    +      if (LOG.isDebugEnabled()) {
    +        LOG.debug(sql);
    +      }
    +
    +      int databaseId = getDatabaseId(databaseName);
    +      int tableId = getTableId(databaseId, databaseName, tableName);
    +
    +      conn = getConnection();
    +      pstmt = conn.prepareStatement(sql);
    +      pstmt.setInt(1, tableId);
    +      res = pstmt.executeQuery();
    +
    +      if (res.next()) {
    +        if (res.getInt("CNT") > 0) {
    +          result = true;
    +        }
    +      }
    +    } catch (SQLException se) {
    +      throw new TajoInternalError(se);
    +    } finally {
    +      CatalogUtil.closeQuietly(pstmt, res);
    +    }
    +    return result;
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByDirectSql(PartitionsByDirectSqlProto request)
    +    throws UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    +    UndefinedOperatorException {
    +    throw new UndefinedOperatorException("getPartitionsByDirectSql");
    --- End diff --
    
    It should be UnsupportedException.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-139205154
  
    @hyunsik 
    
    Thank you for your detailed review and I've just reflected your comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r39009348
  
    --- Diff: tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java ---
    @@ -2173,6 +2179,248 @@ private void setPartitionKeys(int pid, PartitionDescProto.Builder partitionDesc)
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) throws UndefinedDatabaseException,
    --- End diff --
    
    It would be better if it is renamed into ``isPartitionedTable``.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-137740521
  
    @hyunsik 
    
    Thank your for your detailed review.
    I've just updated this PR using your comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-137522218
  
    I've finished tests successfully for this PR as following:
    
    * Data: TPC-H benchmark set (scale factor = 1)
    * CatalogStore: MySQLStore, HiveCatalogStore(with apache hive 1.1.0)
    * CTAS for creating partition tables (except partsupp)
    ```
    CREATE TABLE LINEITEM_P ( 
    L_ORDERKEY bigint, L_PARTKEY bigint, L_SUPPKEY bigint, L_LINENUMBER bigint, L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double, L_RETURNFLAG text, L_LINESTATUS text,L_COMMITDATE date, L_RECEIPTDATE date, L_SHIPINSTRUCT text, L_SHIPMODE text, L_COMMENT text) 
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN (L_SHIPDATE date) AS 
    SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM;
    
    create table customer_p (C_CUSTKEY bigint, C_NAME text, C_ADDRESS text, C_PHONE text, C_ACCTBAL double, C_MKTSEGMENT text, C_COMMENT text) 
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN (C_NATIONKEY bigint) AS 
    SELECT C_CUSTKEY, C_NAME, C_ADDRESS, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT, C_NATIONKEY FROM customer;
    
    create  table supplier_p (S_SUPPKEY bigint, S_NAME text, S_ADDRESS text, S_PHONE text, S_ACCTBAL double, S_COMMENT text)
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN ( S_NATIONKEY bigint) AS 
    SELECT S_SUPPKEY, S_NAME, S_ADDRESS, S_PHONE, S_ACCTBAL, S_COMMENT, S_NATIONKEY FROM supplier;
    
    create  table part_p (P_PARTKEY bigint, P_NAME text, P_MFGR text, P_BRAND text, P_TYPE text, P_CONTAINER text, P_RETAILPRICE double, P_COMMENT text) 
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN (P_SIZE integer) AS 
    SELECT P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE, P_CONTAINER, P_RETAILPRICE, P_COMMENT, P_SIZE FROM part;
    
    create table orders_p (O_ORDERKEY bigint, O_CUSTKEY bigint, O_ORDERSTATUS text, O_TOTALPRICE double, O_ORDERPRIORITY text, O_CLERK text, O_SHIPPRIORITY int, O_COMMENT text)
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN (O_ORDERDATE date) AS 
    SELECT O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERPRIORITY, O_CLERK, O_SHIPPRIORITY, O_COMMENT, O_ORDERDATE FROM orders;
    
    create table nation_p (N_NATIONKEY bigint, N_NAME text, N_COMMENT text) 
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN (N_REGIONKEY bigint) AS 
    SELECT N_NATIONKEY, N_NAME, N_COMMENT, N_REGIONKEY FROM nation;
    
    create table region_p ( R_NAME text,  R_COMMENT text) 
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN (R_REGIONKEY bigint) AS 
    SELECT R_NAME,  R_COMMENT, R_REGIONKEY FROM region;
    ```
    * Test Queries : TPC-H Q1 ~ Q22
    
    After executing CTAS statements, I fount that list of table partitions had been printed on hive shell. All test results mostly accorded with results of queries for non-partitoned tables in MySQLStore and HiveCatalogStore. But some records for double type record aggregation were slightly different in the nearest whole number, To resolve above problem, we need to provide decimal column type. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1493: Add a method to get partition direct...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/624#discussion_r38730328
  
    --- Diff: tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java ---
    @@ -446,6 +446,46 @@ public final PartitionDescProto getPartition(final String databaseName, final St
       }
     
       @Override
    +  public boolean existPartitions(String databaseName, String tableName) {
    +    try {
    +      final BlockingInterface stub = getStub();
    +      final TableIdentifierProto request = buildTableIdentifier(databaseName, tableName);
    +      return isSuccess(stub.existPartitionsByTableName(null, request));
    +    } catch (ServiceException e) {
    +      throw new RuntimeException(e);
    +    }
    +  }
    +
    +  @Override
    +  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
    +    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException,
    --- End diff --
    
    To handle such exceptions, you should put throwsIfThisError method for the exceptions. Please refer to https://github.com/apache/tajo/pull/624/files#diff-9ecb2222e062d0cc008fb1c3da61d4b0L483.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---