You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@drill.apache.org by parthchandra <gi...@git.apache.org> on 2016/09/20 17:02:26 UTC

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

GitHub user parthchandra opened a pull request:

    https://github.com/apache/drill/pull/592

    DRILL-4826: Query against INFORMATION_SCHEMA.TABLES degrades as the n\u2026

    \u2026umber of views increases
    
    Changed to get information for all views in a single call instead of of one by one

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/parthchandra/drill DRILL-4826

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/592.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #592
    
----
commit 07fbb0ac224e53299217263cf2d0510482a4c9b3
Author: Parth Chandra <pa...@apache.org>
Date:   2016-08-04T06:02:01Z

    DRILL-4826: Query against INFORMATION_SCHEMA.TABLES degrades as the number of views
    increases

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by parthchandra <gi...@git.apache.org>.

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79719219
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      synchronized(mClient) {
    --- End diff --
    
    This is refactored code from the fix for DRILL-4577. (https://github.com/apache/drill/pull/461)
    I didn't really change it.
    Going thru the code, it appears that m_client may be cached and reused and so probably should be synchronized.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by gparai <gi...@git.apache.org>.

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79663027
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -108,7 +126,7 @@ public String getTypeName() {
         return tableNameToTable;
       }
     
    -  private static class HiveTableWithoutStatisticAndRowType implements Table {
    +   private static class HiveTableWithoutStatisticAndRowType implements Table {
    --- End diff --
    
    Extra space


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by parthchandra <gi...@git.apache.org>.

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79719241
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -738,5 +740,49 @@ public void dropTable(String table) {
                 .build(logger);
           }
         }
    +
    +    @Override public List<Pair<String, TableType>> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
    --- End diff --
    
    IntelliJ keeps reformatting this to be on the same line ! Will fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by sudheeshkatkam <gi...@git.apache.org>.

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79666399
  
    --- Diff: exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestJdbcQuery.java ---
    @@ -122,6 +122,7 @@ public void testLikeNotLike() throws Exception{
           );
       }
     
    +  @Ignore("Returns results in different order depeding on forkCount")
    --- End diff --
    
    Is this a regression due to this patch? Other wise, open a ticket for this issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by ppadma <gi...@git.apache.org>.

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r83323951
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractSchema.java ---
    @@ -231,4 +231,21 @@ public void dropTable(String tableName) {
         }
         return tables;
       }
    -}
    \ No newline at end of file
    +
    +  public List<Pair<String, Schema.TableType>> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
    +    final List<String> tableNames = Lists.newArrayList(getTableNames());
    +    final List<Pair<String, Schema.TableType>> tableNamesAndTypes = Lists.newArrayList();
    +    final List<Pair<String, ? extends Table>> tables;
    +    if (bulkLoad) {
    +      tables = getTablesByNamesByBulkLoad(tableNames, bulkSize);
    --- End diff --
    
    why do we even have this option to do bulkLoad or not ? why not just do bulkLoad always ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES degrad...

Posted by parthchandra <gi...@git.apache.org>.

Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/592
  
    Updated to address review comments


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by sudheeshkatkam <gi...@git.apache.org>.

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79663056
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      synchronized(mClient) {
    --- End diff --
    
    why synchronized?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/592


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by sudheeshkatkam <gi...@git.apache.org>.

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79664962
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      synchronized(mClient) {
    +        try {
    +          eachBulkofTables = mClient.getTableObjectsByName(schemaName, eachBulkofTableNames);
    --- End diff --
    
    + Why not use the helper? Exception handling and reconnecting logic is different in the helper methods in [DrillHiveMetaStoreClient](https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L222). 
    + Move this logic to a method in that class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by parthchandra <gi...@git.apache.org>.

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79721387
  
    --- Diff: exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestJdbcQuery.java ---
    @@ -122,6 +122,7 @@ public void testLikeNotLike() throws Exception{
           );
       }
     
    +  @Ignore("Returns results in different order depeding on forkCount")
    --- End diff --
    
    Or maybe I should just order the results?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by parthchandra <gi...@git.apache.org>.

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79721140
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
    @@ -290,28 +290,30 @@ public Tables(OptionManager optionManager) {
           return new PojoRecordReader<>(Records.Table.class, records.iterator());
         }
     
    -    @Override
    -    public void visitTables(String schemaPath, SchemaPlus schema) {
    +    @Override public void visitTables(String schemaPath, SchemaPlus schema) {
           final AbstractSchema drillSchema = schema.unwrap(AbstractSchema.class);
    +      final List<Pair<String, TableType>> tableNamesAndTypes = drillSchema
    +          .getTableNamesAndTypes(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST),
    +              (int)optionManager.getOption(ExecConstants.BULK_LOAD_TABLE_LIST_BULK_SIZE));
     
    -      final List<String> tableNames = Lists.newArrayList(schema.getTableNames());
    -      final List<Pair<String, ? extends Table>> tableNameToTables;
    -      if(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST)) {
    -        tableNameToTables = drillSchema.getTablesByNamesByBulkLoad(tableNames);
    -      } else {
    -        tableNameToTables = drillSchema.getTablesByNames(tableNames);
    -      }
    -
    -      for(Pair<String, ? extends Table> tableNameToTable : tableNameToTables) {
    -        final String tableName = tableNameToTable.getKey();
    -        final Table table = tableNameToTable.getValue();
    +      for (Pair<String, TableType> tableNameAndType : tableNamesAndTypes) {
    +        final String tableName = tableNameAndType.getKey();
    +        final TableType tableType = tableNameAndType.getValue();
             // Visit the table, and if requested ...
    -        if(shouldVisitTable(schemaPath, tableName)) {
    -          visitTable(schemaPath, tableName, table);
    +        if (shouldVisitTable(schemaPath, tableName)) {
    +          visitTableWithType(schemaPath, tableName, tableType);
             }
           }
         }
     
    +    public boolean visitTableWithType(String schemaName, String tableName, TableType type) {
    +      Preconditions
    +          .checkNotNull(type, "Error. Type information for table %s.%s provided is null.", schemaName,
    +              tableName);
    +      records.add(new Records.Table(IS_CATALOG_NAME, schemaName, tableName, type.toString()));
    +      return false;
    --- End diff --
    
    <sigh/> to keep it similar to visitTable which does the same, unnecessarily. I suppose I could change it to return void.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by sudheeshkatkam <gi...@git.apache.org>.

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79665593
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
    @@ -55,6 +54,7 @@
      * schema, table or field.
      */
     public abstract class InfoSchemaRecordGenerator {
    +  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(InfoSchemaRecordGenerator.class);
    --- End diff --
    
    private


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by sudheeshkatkam <gi...@git.apache.org>.

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79667525
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
    @@ -290,28 +290,30 @@ public Tables(OptionManager optionManager) {
           return new PojoRecordReader<>(Records.Table.class, records.iterator());
         }
     
    -    @Override
    -    public void visitTables(String schemaPath, SchemaPlus schema) {
    +    @Override public void visitTables(String schemaPath, SchemaPlus schema) {
           final AbstractSchema drillSchema = schema.unwrap(AbstractSchema.class);
    +      final List<Pair<String, TableType>> tableNamesAndTypes = drillSchema
    +          .getTableNamesAndTypes(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST),
    +              (int)optionManager.getOption(ExecConstants.BULK_LOAD_TABLE_LIST_BULK_SIZE));
     
    -      final List<String> tableNames = Lists.newArrayList(schema.getTableNames());
    -      final List<Pair<String, ? extends Table>> tableNameToTables;
    -      if(optionManager.getOption(ExecConstants.ENABLE_BULK_LOAD_TABLE_LIST)) {
    -        tableNameToTables = drillSchema.getTablesByNamesByBulkLoad(tableNames);
    -      } else {
    -        tableNameToTables = drillSchema.getTablesByNames(tableNames);
    -      }
    -
    -      for(Pair<String, ? extends Table> tableNameToTable : tableNameToTables) {
    -        final String tableName = tableNameToTable.getKey();
    -        final Table table = tableNameToTable.getValue();
    +      for (Pair<String, TableType> tableNameAndType : tableNamesAndTypes) {
    +        final String tableName = tableNameAndType.getKey();
    +        final TableType tableType = tableNameAndType.getValue();
             // Visit the table, and if requested ...
    -        if(shouldVisitTable(schemaPath, tableName)) {
    -          visitTable(schemaPath, tableName, table);
    +        if (shouldVisitTable(schemaPath, tableName)) {
    +          visitTableWithType(schemaPath, tableName, tableType);
             }
           }
         }
     
    +    public boolean visitTableWithType(String schemaName, String tableName, TableType type) {
    +      Preconditions
    +          .checkNotNull(type, "Error. Type information for table %s.%s provided is null.", schemaName,
    +              tableName);
    +      records.add(new Records.Table(IS_CATALOG_NAME, schemaName, tableName, type.toString()));
    +      return false;
    --- End diff --
    
    why return `false`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by gparai <gi...@git.apache.org>.

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79664175
  
    --- Diff: exec/jdbc/src/test/java/org/apache/drill/jdbc/test/TestJdbcQuery.java ---
    @@ -122,6 +122,7 @@ public void testLikeNotLike() throws Exception{
           );
       }
     
    +  @Ignore("Returns results in different order depeding on forkCount")
    --- End diff --
    
    typo: depending


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by parthchandra <gi...@git.apache.org>.

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79721752
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    --- End diff --
    
    Where?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by gparai <gi...@git.apache.org>.

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79689068
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -291,6 +291,9 @@
       String ENABLE_BULK_LOAD_TABLE_LIST_KEY = "exec.enable_bulk_load_table_list";
       BooleanValidator ENABLE_BULK_LOAD_TABLE_LIST = new BooleanValidator(ENABLE_BULK_LOAD_TABLE_LIST_KEY, false);
     
    +  String BULK_LOAD_TABLE_LIST_BULK_SIZE_KEY = "exec.bulk_load_table_list.bulk_size";
    --- End diff --
    
    Maybe a comment to describe the option?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by ppadma <gi...@git.apache.org>.

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r83324461
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java ---
    @@ -290,28 +290,30 @@ public Tables(OptionManager optionManager) {
           return new PojoRecordReader<>(Records.Table.class, records.iterator());
         }
     
    -    @Override
    -    public void visitTables(String schemaPath, SchemaPlus schema) {
    +    @Override public void visitTables(String schemaPath, SchemaPlus schema) {
    --- End diff --
    
    why this change ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by gparai <gi...@git.apache.org>.

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79687833
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      synchronized(mClient) {
    --- End diff --
    
    why do we synchronize?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by ppadma <gi...@git.apache.org>.

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r83324591
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,32 +79,49 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames,
    +      final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for (int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      synchronized (mClient) {
    +        try {
    +          eachBulkofTables = mClient.getTableObjectsByName(schemaName, eachBulkofTableNames);
    +        } catch (TException tException) {
    +          try {
    +            mClient.reconnect();
    +            eachBulkofTables = mClient.getTableObjectsByName(schemaName, eachBulkofTableNames);
    +          } catch (Exception e) {
    +            logger.warn("Exception occurred while trying to read tables from {}: {}", schemaName,
    +                e.getCause());
    +            return ImmutableList.of();
    +          }
    +        }
    +        tables.addAll(eachBulkofTables);
    +      }
         }
     
    -    for(final org.apache.hadoop.hive.metastore.api.Table table : tables) {
    -      if(table == null) {
    +    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    +    for (final org.apache.hadoop.hive.metastore.api.Table table : tables) {
    +      if (table == null) {
    --- End diff --
    
    can this table be null ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by sudheeshkatkam <gi...@git.apache.org>.

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r84967719
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java ---
    @@ -233,6 +236,30 @@ private DrillHiveMetaStoreClient(final HiveConf hiveConf) throws MetaException {
         }
       }
     
    +  public static List<Table> getTablesByNamesByBulkLoadHelper(
    +      final HiveMetaStoreClient mClient, final List<String> tableNames, final String schemaName,
    +      final int bulkSize) {
    +    final int totalTables = tableNames.size();
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for (int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      try {
    +        eachBulkofTables =
    --- End diff --
    
    `eachBulkofTables = getTableObjectsByNameHelper(mClient, schemaName, eachBulkofTableNames);`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by sudheeshkatkam <gi...@git.apache.org>.

Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79664917
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -738,5 +740,49 @@ public void dropTable(String table) {
                 .build(logger);
           }
         }
    +
    +    @Override public List<Pair<String, TableType>> getTableNamesAndTypes(boolean bulkLoad, int bulkSize) {
    --- End diff --
    
    Add annotation in a line above?
    There are other places too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by gparai <gi...@git.apache.org>.

Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79688118
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    --- End diff --
    
    Space?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #592: DRILL-4826: Query against INFORMATION_SCHEMA.TABLES...

Posted by parthchandra <gi...@git.apache.org>.

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/592#discussion_r79721423
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java ---
    @@ -78,17 +79,34 @@ public String getTypeName() {
       }
     
       @Override
    -  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames) {
    +  public List<Pair<String, ? extends Table>> getTablesByNamesByBulkLoad(final List<String> tableNames, final int bulkSize) {
    +    final int totalTables = tableNames.size();
         final String schemaName = getName();
    -    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
    -    List<org.apache.hadoop.hive.metastore.api.Table> tables;
    -    try {
    -      tables = DrillHiveMetaStoreClient.getTableObjectsByNameHelper(mClient, schemaName, tableNames);
    -    } catch (TException e) {
    -      logger.warn("Exception occurred while trying to list tables by names from {}: {}", schemaName, e.getCause());
    -      return tableNameToTable;
    +    final List<org.apache.hadoop.hive.metastore.api.Table> tables = Lists.newArrayList();
    +
    +    // In each round, Drill asks for a sub-list of all the requested tables
    +    for(int fromIndex = 0; fromIndex < totalTables; fromIndex += bulkSize) {
    +      final int toIndex = Math.min(fromIndex + bulkSize, totalTables);
    +      final List<String> eachBulkofTableNames = tableNames.subList(fromIndex, toIndex);
    +      List<org.apache.hadoop.hive.metastore.api.Table> eachBulkofTables;
    +      // Retries once if the first call to fetch the metadata fails
    +      synchronized(mClient) {
    --- End diff --
    
    see previous comment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---