You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by blrunner <gi...@git.apache.org> on 2015/07/09 17:44:15 UTC

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/626

    TAJO-1673: Implement recover partitions

    This is still ongoing work. I'll do more tests and update DDLExecutor after committing TAJO-1345.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo TAJO-1673

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #626
    
----
commit c66ab61ae0cf66f42fc44fbdd4394f87ec5b5700
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-07-09T02:55:30Z

    Add msck repair table statment.

commit e3bd88f661722e38667f83b8d6cdafda03934a3b
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-07-09T15:39:55Z

    Implement msck repair table on DDLExecutor.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-140945055
  
    Rebased.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40112090
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -529,24 +536,137 @@ public void alterTable(TajoMaster.MasterContext context, final QueryContext quer
             catalog.alterTable(CatalogUtil.addOrDropPartition(qualifiedName, alterTable.getPartitionColumns(),
                 alterTable.getPartitionValues(), alterTable.getLocation(), AlterTableType.DROP_PARTITION));
     
    -        // When dropping partition on an managed table, the data will be delete from file system.
    -        if (!desc.isExternal()) {
    +        // When dropping partition on a table, the data in the table will NOT be deleted from the file system.
    +        // But if PURGE is specified, the partition data will be deleted.
    +        if (alterTable.isPurge()) {
               deletePartitionPath(partitionDescProto);
    -        } else {
    -          // When dropping partition on an external table, the data in the table will NOT be deleted from the file
    -          // system. But if PURGE is specified, the partition data will be deleted.
    -          if (alterTable.isPurge()) {
    -            deletePartitionPath(partitionDescProto);
    -          }
             }
           }
    -
    +      break;
    +    case REPAIR_PARTITION:
    +      repairPartition(context, queryContext, alterTable);
           break;
         default:
           throw new InternalError("alterTable cannot handle such query: \n" + alterTable.toJson());
         }
       }
     
    +  /**
    +   * Run ALTER TABLE table_name REPAIR TABLE  statement.
    +   * This will recovery all partitions which exists on table directory.
    +   *
    +   *
    +   * @param context
    +   * @param queryContext
    +   * @param alterTable
    +   * @throws IOException
    +   */
    +  public void repairPartition(TajoMaster.MasterContext context, final QueryContext queryContext,
    +                         final AlterTableNode alterTable) throws IOException, TajoException {
    +    final CatalogService catalog = context.getCatalog();
    +    final String tableName = alterTable.getTableName();
    +
    +    String databaseName;
    +    String simpleTableName;
    +    if (CatalogUtil.isFQTableName(tableName)) {
    +      String[] split = CatalogUtil.splitFQTableName(tableName);
    +      databaseName = split[0];
    +      simpleTableName = split[1];
    +    } else {
    +      databaseName = queryContext.getCurrentDatabase();
    +      simpleTableName = tableName;
    +    }
    +
    +    if (!catalog.existsTable(databaseName, simpleTableName)) {
    +      throw new UndefinedTableException(alterTable.getTableName());
    +    }
    +
    +    TableDesc tableDesc = catalog.getTableDesc(databaseName, simpleTableName);
    +
    +    if(tableDesc.getPartitionMethod() == null) {
    +      throw new UndefinedPartitionMethodException(simpleTableName);
    +    }
    +
    +    Path tablePath = new Path(tableDesc.getUri());
    +    FileSystem fs = tablePath.getFileSystem(context.getConf());
    +
    +    PartitionMethodDesc partitionDesc = tableDesc.getPartitionMethod();
    +    Schema partitionColumns = new Schema();
    +    for (Column column : partitionDesc.getExpressionSchema().getRootColumns()) {
    +      partitionColumns.addColumn(column);
    +    }
    +
    +    // Get the array of path filter, accepting all partition paths.
    +    PathFilter[] filters = PartitionedTableRewriter.buildAllAcceptingPathFilters(partitionColumns);
    +
    +    // loop from one to the number of partition columns
    +    Path [] filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(tablePath, filters[0]));
    +
    +    // Get all file status matched to a ith level path filter.
    +    for (int i = 1; i < partitionColumns.size(); i++) {
    +      filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(filteredPaths, filters[i]));
    +    }
    +
    +    // Find missing partitions from filesystem
    +    List<PartitionDescProto> existingPartitions = catalog.getPartitions(databaseName, simpleTableName);
    +    List<String> existingPartitionNames = TUtil.newList();
    +    Path existingPartitionPath = null;
    +
    +    for(PartitionDescProto existingPartition : existingPartitions) {
    +      existingPartitionPath = new Path(existingPartition.getPath());
    +      existingPartitionNames.add(existingPartition.getPartitionName());
    +      if (!fs.exists(existingPartitionPath)) {
    +        LOG.info("Partitions missing from Filesystem:" + existingPartition.getPartitionName());
    --- End diff --
    
    I think ```warn``` is the more appropriate logging level.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40204949
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -529,24 +536,137 @@ public void alterTable(TajoMaster.MasterContext context, final QueryContext quer
             catalog.alterTable(CatalogUtil.addOrDropPartition(qualifiedName, alterTable.getPartitionColumns(),
                 alterTable.getPartitionValues(), alterTable.getLocation(), AlterTableType.DROP_PARTITION));
     
    -        // When dropping partition on an managed table, the data will be delete from file system.
    -        if (!desc.isExternal()) {
    +        // When dropping partition on a table, the data in the table will NOT be deleted from the file system.
    +        // But if PURGE is specified, the partition data will be deleted.
    +        if (alterTable.isPurge()) {
               deletePartitionPath(partitionDescProto);
    -        } else {
    -          // When dropping partition on an external table, the data in the table will NOT be deleted from the file
    -          // system. But if PURGE is specified, the partition data will be deleted.
    -          if (alterTable.isPurge()) {
    -            deletePartitionPath(partitionDescProto);
    -          }
             }
           }
    -
    +      break;
    +    case REPAIR_PARTITION:
    +      repairPartition(context, queryContext, alterTable);
           break;
         default:
           throw new InternalError("alterTable cannot handle such query: \n" + alterTable.toJson());
         }
       }
     
    +  /**
    +   * Run ALTER TABLE table_name REPAIR TABLE  statement.
    +   * This will recovery all partitions which exists on table directory.
    +   *
    +   *
    +   * @param context
    +   * @param queryContext
    +   * @param alterTable
    +   * @throws IOException
    +   */
    +  public void repairPartition(TajoMaster.MasterContext context, final QueryContext queryContext,
    +                         final AlterTableNode alterTable) throws IOException, TajoException {
    +    final CatalogService catalog = context.getCatalog();
    +    final String tableName = alterTable.getTableName();
    +
    +    String databaseName;
    +    String simpleTableName;
    +    if (CatalogUtil.isFQTableName(tableName)) {
    +      String[] split = CatalogUtil.splitFQTableName(tableName);
    +      databaseName = split[0];
    +      simpleTableName = split[1];
    +    } else {
    +      databaseName = queryContext.getCurrentDatabase();
    +      simpleTableName = tableName;
    +    }
    +
    +    if (!catalog.existsTable(databaseName, simpleTableName)) {
    +      throw new UndefinedTableException(alterTable.getTableName());
    +    }
    +
    +    TableDesc tableDesc = catalog.getTableDesc(databaseName, simpleTableName);
    +
    +    if(tableDesc.getPartitionMethod() == null) {
    +      throw new UndefinedPartitionMethodException(simpleTableName);
    +    }
    +
    +    Path tablePath = new Path(tableDesc.getUri());
    +    FileSystem fs = tablePath.getFileSystem(context.getConf());
    +
    +    PartitionMethodDesc partitionDesc = tableDesc.getPartitionMethod();
    +    Schema partitionColumns = new Schema();
    +    for (Column column : partitionDesc.getExpressionSchema().getRootColumns()) {
    +      partitionColumns.addColumn(column);
    +    }
    +
    +    // Get the array of path filter, accepting all partition paths.
    +    PathFilter[] filters = PartitionedTableRewriter.buildAllAcceptingPathFilters(partitionColumns);
    +
    +    // loop from one to the number of partition columns
    +    Path [] filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(tablePath, filters[0]));
    +
    +    // Get all file status matched to a ith level path filter.
    +    for (int i = 1; i < partitionColumns.size(); i++) {
    +      filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(filteredPaths, filters[i]));
    +    }
    +
    +    // Find missing partitions from filesystem
    --- End diff --
    
    I misunderstood the below codes. Sorry.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-120840158
  
    Could you let me know the full name of msck?
    
    The below expression seems to be more consistent form for a full command, I think.
    ```
    ALTER TABLE [table name] REPAIR PARTITION
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-137649891
  
    I've finished partition repair test successfully with MySQLStore and HiveCatalogStore and fixed a few bugs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-142608568
  
    +1. Thank you for your work!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-137636805
  
    I added codes to check existing partitions from CatalogStore before adding missing partitions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-127478487
  
    I applied CatalogStore::addPartitions to DDLExecutor and implemented some unit test cases.
    When I implementing unit test cases, I found that drop partition operate on different way than expected. I think that drop partition need to delete data of table with `PURGE` option. But currently, when dropping partition on managed table without `PURGE` option, all data had been deleted. So, I've fixed this bug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/626


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40112331
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -529,24 +536,137 @@ public void alterTable(TajoMaster.MasterContext context, final QueryContext quer
             catalog.alterTable(CatalogUtil.addOrDropPartition(qualifiedName, alterTable.getPartitionColumns(),
                 alterTable.getPartitionValues(), alterTable.getLocation(), AlterTableType.DROP_PARTITION));
     
    -        // When dropping partition on an managed table, the data will be delete from file system.
    -        if (!desc.isExternal()) {
    +        // When dropping partition on a table, the data in the table will NOT be deleted from the file system.
    +        // But if PURGE is specified, the partition data will be deleted.
    +        if (alterTable.isPurge()) {
               deletePartitionPath(partitionDescProto);
    -        } else {
    -          // When dropping partition on an external table, the data in the table will NOT be deleted from the file
    -          // system. But if PURGE is specified, the partition data will be deleted.
    -          if (alterTable.isPurge()) {
    -            deletePartitionPath(partitionDescProto);
    -          }
             }
           }
    -
    +      break;
    +    case REPAIR_PARTITION:
    +      repairPartition(context, queryContext, alterTable);
           break;
         default:
           throw new InternalError("alterTable cannot handle such query: \n" + alterTable.toJson());
         }
       }
     
    +  /**
    +   * Run ALTER TABLE table_name REPAIR TABLE  statement.
    +   * This will recovery all partitions which exists on table directory.
    +   *
    +   *
    +   * @param context
    +   * @param queryContext
    +   * @param alterTable
    +   * @throws IOException
    +   */
    +  public void repairPartition(TajoMaster.MasterContext context, final QueryContext queryContext,
    +                         final AlterTableNode alterTable) throws IOException, TajoException {
    +    final CatalogService catalog = context.getCatalog();
    +    final String tableName = alterTable.getTableName();
    +
    +    String databaseName;
    +    String simpleTableName;
    +    if (CatalogUtil.isFQTableName(tableName)) {
    +      String[] split = CatalogUtil.splitFQTableName(tableName);
    +      databaseName = split[0];
    +      simpleTableName = split[1];
    +    } else {
    +      databaseName = queryContext.getCurrentDatabase();
    +      simpleTableName = tableName;
    +    }
    +
    +    if (!catalog.existsTable(databaseName, simpleTableName)) {
    +      throw new UndefinedTableException(alterTable.getTableName());
    +    }
    +
    +    TableDesc tableDesc = catalog.getTableDesc(databaseName, simpleTableName);
    +
    +    if(tableDesc.getPartitionMethod() == null) {
    +      throw new UndefinedPartitionMethodException(simpleTableName);
    +    }
    +
    +    Path tablePath = new Path(tableDesc.getUri());
    +    FileSystem fs = tablePath.getFileSystem(context.getConf());
    +
    +    PartitionMethodDesc partitionDesc = tableDesc.getPartitionMethod();
    +    Schema partitionColumns = new Schema();
    --- End diff --
    
    Why did you create a copy of the partition schema?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-120847030
  
    Hi @hyunsik 
    
    I'just referred hive's policy. But your suggestion looks more reasonable now. 
    I'll update the patch tonight. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40114921
  
    --- Diff: tajo-docs/src/main/sphinx/sql_language/alter_table.rst ---
    @@ -96,4 +96,31 @@ You can use ``ALTER TABLE ADD PARTITION`` to add partitions to a table. The loca
       ALTER TABLE table1 DROP PARTITION (col1 = '2015' , col2 = '01', col3 = '11' )
       ALTER TABLE table1 DROP PARTITION (col1 = 'TAJO' ) PURGE
     
    -You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This removes the data for a managed table and this doesn't remove the data for an external table. But if ``PURGE`` is specified for an external table, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
    +You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This doesn't remove the data for a table. But if ``PURGE`` is specified, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
    +
    +========================
    +REPAIR PARTITION
    +========================
    +
    +Tajo stores a list of partitions for each table in its catalog. If partitions are manually added to the distributed file system, the metastore is not aware of these partitions. Running the ``ALTER TABLE REPAIR PARTITION`` statement ensures that the tables are properly populated.
    +
    +*Synopsis*
    +
    +.. code-block:: sql
    +
    +  ALTER TABLE <table_name> REPAIR PARTITION
    +
    +*Examples*
    +
    +.. code-block:: sql
    +
    +  ALTER TABLE student REPAIR PARTITION;
    --- End diff --
    
    I think this example is so obvious, and thus unnecessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40110721
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -23,29 +23,36 @@
     import org.apache.hadoop.fs.FileStatus;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.PathFilter;
     import org.apache.tajo.algebra.AlterTableOpType;
     import org.apache.tajo.algebra.AlterTablespaceSetType;
     import org.apache.tajo.annotation.Nullable;
     import org.apache.tajo.catalog.*;
    +import org.apache.tajo.catalog.partition.PartitionMethodDesc;
     import org.apache.tajo.catalog.proto.CatalogProtos;
     import org.apache.tajo.catalog.proto.CatalogProtos.AlterTablespaceProto;
     import org.apache.tajo.catalog.proto.CatalogProtos.PartitionKeyProto;
    +import org.apache.tajo.catalog.proto.CatalogProtos.PartitionDescProto;
     import org.apache.tajo.conf.TajoConf;
     import org.apache.tajo.engine.query.QueryContext;
     import org.apache.tajo.exception.*;
     import org.apache.tajo.master.TajoMaster;
     import org.apache.tajo.plan.LogicalPlan;
     import org.apache.tajo.plan.logical.*;
    +import org.apache.tajo.plan.rewrite.rules.PartitionedTableRewriter;
     import org.apache.tajo.plan.util.PlannerUtil;
     import org.apache.tajo.storage.FileTablespace;
     import org.apache.tajo.storage.StorageUtil;
     import org.apache.tajo.storage.Tablespace;
     import org.apache.tajo.storage.TablespaceManager;
     import org.apache.tajo.util.Pair;
    +import org.apache.tajo.util.StringUtils;
    +import org.apache.tajo.util.TUtil;
     
     import java.io.IOException;
     import java.util.ArrayList;
     import java.util.List;
    +import java.util.Set;
    --- End diff --
    
    Unused import.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40113680
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -529,24 +536,137 @@ public void alterTable(TajoMaster.MasterContext context, final QueryContext quer
             catalog.alterTable(CatalogUtil.addOrDropPartition(qualifiedName, alterTable.getPartitionColumns(),
                 alterTable.getPartitionValues(), alterTable.getLocation(), AlterTableType.DROP_PARTITION));
     
    -        // When dropping partition on an managed table, the data will be delete from file system.
    -        if (!desc.isExternal()) {
    +        // When dropping partition on a table, the data in the table will NOT be deleted from the file system.
    +        // But if PURGE is specified, the partition data will be deleted.
    +        if (alterTable.isPurge()) {
               deletePartitionPath(partitionDescProto);
    -        } else {
    -          // When dropping partition on an external table, the data in the table will NOT be deleted from the file
    -          // system. But if PURGE is specified, the partition data will be deleted.
    -          if (alterTable.isPurge()) {
    -            deletePartitionPath(partitionDescProto);
    -          }
             }
           }
    -
    +      break;
    +    case REPAIR_PARTITION:
    +      repairPartition(context, queryContext, alterTable);
           break;
         default:
           throw new InternalError("alterTable cannot handle such query: \n" + alterTable.toJson());
         }
       }
     
    +  /**
    +   * Run ALTER TABLE table_name REPAIR TABLE  statement.
    +   * This will recovery all partitions which exists on table directory.
    +   *
    +   *
    +   * @param context
    +   * @param queryContext
    +   * @param alterTable
    +   * @throws IOException
    +   */
    +  public void repairPartition(TajoMaster.MasterContext context, final QueryContext queryContext,
    +                         final AlterTableNode alterTable) throws IOException, TajoException {
    +    final CatalogService catalog = context.getCatalog();
    +    final String tableName = alterTable.getTableName();
    +
    +    String databaseName;
    +    String simpleTableName;
    +    if (CatalogUtil.isFQTableName(tableName)) {
    +      String[] split = CatalogUtil.splitFQTableName(tableName);
    +      databaseName = split[0];
    +      simpleTableName = split[1];
    +    } else {
    +      databaseName = queryContext.getCurrentDatabase();
    +      simpleTableName = tableName;
    +    }
    +
    +    if (!catalog.existsTable(databaseName, simpleTableName)) {
    +      throw new UndefinedTableException(alterTable.getTableName());
    +    }
    +
    +    TableDesc tableDesc = catalog.getTableDesc(databaseName, simpleTableName);
    +
    +    if(tableDesc.getPartitionMethod() == null) {
    +      throw new UndefinedPartitionMethodException(simpleTableName);
    +    }
    +
    +    Path tablePath = new Path(tableDesc.getUri());
    +    FileSystem fs = tablePath.getFileSystem(context.getConf());
    +
    +    PartitionMethodDesc partitionDesc = tableDesc.getPartitionMethod();
    +    Schema partitionColumns = new Schema();
    +    for (Column column : partitionDesc.getExpressionSchema().getRootColumns()) {
    +      partitionColumns.addColumn(column);
    +    }
    +
    +    // Get the array of path filter, accepting all partition paths.
    +    PathFilter[] filters = PartitionedTableRewriter.buildAllAcceptingPathFilters(partitionColumns);
    +
    +    // loop from one to the number of partition columns
    +    Path [] filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(tablePath, filters[0]));
    +
    +    // Get all file status matched to a ith level path filter.
    +    for (int i = 1; i < partitionColumns.size(); i++) {
    +      filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(filteredPaths, filters[i]));
    +    }
    +
    +    // Find missing partitions from filesystem
    +    List<PartitionDescProto> existingPartitions = catalog.getPartitions(databaseName, simpleTableName);
    +    List<String> existingPartitionNames = TUtil.newList();
    +    Path existingPartitionPath = null;
    +
    +    for(PartitionDescProto existingPartition : existingPartitions) {
    +      existingPartitionPath = new Path(existingPartition.getPath());
    +      existingPartitionNames.add(existingPartition.getPartitionName());
    +      if (!fs.exists(existingPartitionPath)) {
    +        LOG.info("Partitions missing from Filesystem:" + existingPartition.getPartitionName());
    +      }
    +    }
    +
    +    // Find missing partitions from CatalogStore
    +    List<PartitionDescProto> targetPartitions = TUtil.newList();
    +    for(Path filteredPath : filteredPaths) {
    +      PartitionDescProto targetPartition = getPartitionDesc(simpleTableName, filteredPath);
    +      if (!existingPartitionNames.contains(targetPartition.getPartitionName())) {
    +        LOG.info("Partitions not in CatalogStore:" + targetPartition.getPartitionName());
    +        targetPartitions.add(targetPartition);
    +      }
    +    }
    +
    +    catalog.addPartitions(databaseName, simpleTableName, targetPartitions, true);
    +
    +    for(PartitionDescProto targetPartition: targetPartitions) {
    +      LOG.info("Repair: Added partition to CatalogStore " + tableName + ":" + targetPartition.getPartitionName());
    --- End diff --
    
    This will emit too many logs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40111028
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -529,24 +536,137 @@ public void alterTable(TajoMaster.MasterContext context, final QueryContext quer
             catalog.alterTable(CatalogUtil.addOrDropPartition(qualifiedName, alterTable.getPartitionColumns(),
                 alterTable.getPartitionValues(), alterTable.getLocation(), AlterTableType.DROP_PARTITION));
     
    -        // When dropping partition on an managed table, the data will be delete from file system.
    -        if (!desc.isExternal()) {
    +        // When dropping partition on a table, the data in the table will NOT be deleted from the file system.
    +        // But if PURGE is specified, the partition data will be deleted.
    --- End diff --
    
    How about changing the comment to as follows?
    ```When dropping a partition on a table, its data will NOT be deleted if the 'PURGE' option is not specified.```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40115745
  
    --- Diff: tajo-docs/src/main/sphinx/sql_language/alter_table.rst ---
    @@ -96,4 +96,31 @@ You can use ``ALTER TABLE ADD PARTITION`` to add partitions to a table. The loca
       ALTER TABLE table1 DROP PARTITION (col1 = '2015' , col2 = '01', col3 = '11' )
       ALTER TABLE table1 DROP PARTITION (col1 = 'TAJO' ) PURGE
     
    -You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This removes the data for a managed table and this doesn't remove the data for an external table. But if ``PURGE`` is specified for an external table, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
    +You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This doesn't remove the data for a table. But if ``PURGE`` is specified, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
    +
    +========================
    +REPAIR PARTITION
    +========================
    +
    +Tajo stores a list of partitions for each table in its catalog. If partitions are manually added to the distributed file system, the metastore is not aware of these partitions. Running the ``ALTER TABLE REPAIR PARTITION`` statement ensures that the tables are properly populated.
    +
    +*Synopsis*
    +
    +.. code-block:: sql
    +
    +  ALTER TABLE <table_name> REPAIR PARTITION
    +
    +*Examples*
    +
    +.. code-block:: sql
    +
    +  ALTER TABLE student REPAIR PARTITION;
    +
    +.. note::
    +
    +  If you recover partitions in the situation that partitions just exists on catalog and it doesn't exist on file system, Tajo would not make directories on the file system and would print messages to TajoMaster log as following:
    --- End diff --
    
    How about changing this paragraph to as follows?
    ```
    Even though an information of a partition is stored in the catalog, Tajo does not recover it when its partition directory doesn't exist in the file system.
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40113668
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -529,24 +536,137 @@ public void alterTable(TajoMaster.MasterContext context, final QueryContext quer
             catalog.alterTable(CatalogUtil.addOrDropPartition(qualifiedName, alterTable.getPartitionColumns(),
                 alterTable.getPartitionValues(), alterTable.getLocation(), AlterTableType.DROP_PARTITION));
     
    -        // When dropping partition on an managed table, the data will be delete from file system.
    -        if (!desc.isExternal()) {
    +        // When dropping partition on a table, the data in the table will NOT be deleted from the file system.
    +        // But if PURGE is specified, the partition data will be deleted.
    +        if (alterTable.isPurge()) {
               deletePartitionPath(partitionDescProto);
    -        } else {
    -          // When dropping partition on an external table, the data in the table will NOT be deleted from the file
    -          // system. But if PURGE is specified, the partition data will be deleted.
    -          if (alterTable.isPurge()) {
    -            deletePartitionPath(partitionDescProto);
    -          }
             }
           }
    -
    +      break;
    +    case REPAIR_PARTITION:
    +      repairPartition(context, queryContext, alterTable);
           break;
         default:
           throw new InternalError("alterTable cannot handle such query: \n" + alterTable.toJson());
         }
       }
     
    +  /**
    +   * Run ALTER TABLE table_name REPAIR TABLE  statement.
    +   * This will recovery all partitions which exists on table directory.
    +   *
    +   *
    +   * @param context
    +   * @param queryContext
    +   * @param alterTable
    +   * @throws IOException
    +   */
    +  public void repairPartition(TajoMaster.MasterContext context, final QueryContext queryContext,
    +                         final AlterTableNode alterTable) throws IOException, TajoException {
    +    final CatalogService catalog = context.getCatalog();
    +    final String tableName = alterTable.getTableName();
    +
    +    String databaseName;
    +    String simpleTableName;
    +    if (CatalogUtil.isFQTableName(tableName)) {
    +      String[] split = CatalogUtil.splitFQTableName(tableName);
    +      databaseName = split[0];
    +      simpleTableName = split[1];
    +    } else {
    +      databaseName = queryContext.getCurrentDatabase();
    +      simpleTableName = tableName;
    +    }
    +
    +    if (!catalog.existsTable(databaseName, simpleTableName)) {
    +      throw new UndefinedTableException(alterTable.getTableName());
    +    }
    +
    +    TableDesc tableDesc = catalog.getTableDesc(databaseName, simpleTableName);
    +
    +    if(tableDesc.getPartitionMethod() == null) {
    +      throw new UndefinedPartitionMethodException(simpleTableName);
    +    }
    +
    +    Path tablePath = new Path(tableDesc.getUri());
    +    FileSystem fs = tablePath.getFileSystem(context.getConf());
    +
    +    PartitionMethodDesc partitionDesc = tableDesc.getPartitionMethod();
    +    Schema partitionColumns = new Schema();
    +    for (Column column : partitionDesc.getExpressionSchema().getRootColumns()) {
    +      partitionColumns.addColumn(column);
    +    }
    +
    +    // Get the array of path filter, accepting all partition paths.
    +    PathFilter[] filters = PartitionedTableRewriter.buildAllAcceptingPathFilters(partitionColumns);
    +
    +    // loop from one to the number of partition columns
    +    Path [] filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(tablePath, filters[0]));
    +
    +    // Get all file status matched to a ith level path filter.
    +    for (int i = 1; i < partitionColumns.size(); i++) {
    +      filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(filteredPaths, filters[i]));
    +    }
    +
    +    // Find missing partitions from filesystem
    +    List<PartitionDescProto> existingPartitions = catalog.getPartitions(databaseName, simpleTableName);
    +    List<String> existingPartitionNames = TUtil.newList();
    +    Path existingPartitionPath = null;
    +
    +    for(PartitionDescProto existingPartition : existingPartitions) {
    +      existingPartitionPath = new Path(existingPartition.getPath());
    +      existingPartitionNames.add(existingPartition.getPartitionName());
    +      if (!fs.exists(existingPartitionPath)) {
    +        LOG.info("Partitions missing from Filesystem:" + existingPartition.getPartitionName());
    +      }
    +    }
    +
    +    // Find missing partitions from CatalogStore
    +    List<PartitionDescProto> targetPartitions = TUtil.newList();
    +    for(Path filteredPath : filteredPaths) {
    +      PartitionDescProto targetPartition = getPartitionDesc(simpleTableName, filteredPath);
    +      if (!existingPartitionNames.contains(targetPartition.getPartitionName())) {
    +        LOG.info("Partitions not in CatalogStore:" + targetPartition.getPartitionName());
    --- End diff --
    
    This will emit too many logs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40114567
  
    --- Diff: tajo-docs/src/main/sphinx/sql_language/alter_table.rst ---
    @@ -96,4 +96,31 @@ You can use ``ALTER TABLE ADD PARTITION`` to add partitions to a table. The loca
       ALTER TABLE table1 DROP PARTITION (col1 = '2015' , col2 = '01', col3 = '11' )
       ALTER TABLE table1 DROP PARTITION (col1 = 'TAJO' ) PURGE
     
    -You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This removes the data for a managed table and this doesn't remove the data for an external table. But if ``PURGE`` is specified for an external table, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
    +You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This doesn't remove the data for a table. But if ``PURGE`` is specified, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
    --- End diff --
    
    In ```if the partition for the table doesn't exists```, ```exists``` should be ```exist```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/626#discussion_r40114180
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/exec/DDLExecutor.java ---
    @@ -529,24 +536,137 @@ public void alterTable(TajoMaster.MasterContext context, final QueryContext quer
             catalog.alterTable(CatalogUtil.addOrDropPartition(qualifiedName, alterTable.getPartitionColumns(),
                 alterTable.getPartitionValues(), alterTable.getLocation(), AlterTableType.DROP_PARTITION));
     
    -        // When dropping partition on an managed table, the data will be delete from file system.
    -        if (!desc.isExternal()) {
    +        // When dropping partition on a table, the data in the table will NOT be deleted from the file system.
    +        // But if PURGE is specified, the partition data will be deleted.
    +        if (alterTable.isPurge()) {
               deletePartitionPath(partitionDescProto);
    -        } else {
    -          // When dropping partition on an external table, the data in the table will NOT be deleted from the file
    -          // system. But if PURGE is specified, the partition data will be deleted.
    -          if (alterTable.isPurge()) {
    -            deletePartitionPath(partitionDescProto);
    -          }
             }
           }
    -
    +      break;
    +    case REPAIR_PARTITION:
    +      repairPartition(context, queryContext, alterTable);
           break;
         default:
           throw new InternalError("alterTable cannot handle such query: \n" + alterTable.toJson());
         }
       }
     
    +  /**
    +   * Run ALTER TABLE table_name REPAIR TABLE  statement.
    +   * This will recovery all partitions which exists on table directory.
    +   *
    +   *
    +   * @param context
    +   * @param queryContext
    +   * @param alterTable
    +   * @throws IOException
    +   */
    +  public void repairPartition(TajoMaster.MasterContext context, final QueryContext queryContext,
    +                         final AlterTableNode alterTable) throws IOException, TajoException {
    +    final CatalogService catalog = context.getCatalog();
    +    final String tableName = alterTable.getTableName();
    +
    +    String databaseName;
    +    String simpleTableName;
    +    if (CatalogUtil.isFQTableName(tableName)) {
    +      String[] split = CatalogUtil.splitFQTableName(tableName);
    +      databaseName = split[0];
    +      simpleTableName = split[1];
    +    } else {
    +      databaseName = queryContext.getCurrentDatabase();
    +      simpleTableName = tableName;
    +    }
    +
    +    if (!catalog.existsTable(databaseName, simpleTableName)) {
    +      throw new UndefinedTableException(alterTable.getTableName());
    +    }
    +
    +    TableDesc tableDesc = catalog.getTableDesc(databaseName, simpleTableName);
    +
    +    if(tableDesc.getPartitionMethod() == null) {
    +      throw new UndefinedPartitionMethodException(simpleTableName);
    +    }
    +
    +    Path tablePath = new Path(tableDesc.getUri());
    +    FileSystem fs = tablePath.getFileSystem(context.getConf());
    +
    +    PartitionMethodDesc partitionDesc = tableDesc.getPartitionMethod();
    +    Schema partitionColumns = new Schema();
    +    for (Column column : partitionDesc.getExpressionSchema().getRootColumns()) {
    +      partitionColumns.addColumn(column);
    +    }
    +
    +    // Get the array of path filter, accepting all partition paths.
    +    PathFilter[] filters = PartitionedTableRewriter.buildAllAcceptingPathFilters(partitionColumns);
    +
    +    // loop from one to the number of partition columns
    +    Path [] filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(tablePath, filters[0]));
    +
    +    // Get all file status matched to a ith level path filter.
    +    for (int i = 1; i < partitionColumns.size(); i++) {
    +      filteredPaths = PartitionedTableRewriter.toPathArray(fs.listStatus(filteredPaths, filters[i]));
    +    }
    +
    +    // Find missing partitions from filesystem
    --- End diff --
    
    The below routine is too complex and includes unnecessary codes, I think.
    IMHO, the algorithm will be simple as follows.
    1. Getting PartitionDescProtos from the catalog
    2. While iterating PartitionDescProtos, call catalog.addPartition() if the partition directory exists. Maybe catalog.addPartitions() will be called instead.
    
    If I missed any points, please let me know.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-121148222
  
    I've updated MSCK TABLE statement to ALTER TABLE REPAIR TABLE statement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1673: Implement recover partitions

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/626#issuecomment-142637917
  
    Thanks @jihoonson 
    
    I'll ship it at once. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---