You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by vdiravka <gi...@git.apache.org> on 2018/02/05 19:41:07 UTC

[GitHub] drill pull request #1111: Upgrade drill-hive libraries to 2.1.1 version.

GitHub user vdiravka opened a pull request:

    https://github.com/apache/drill/pull/1111

    Upgrade drill-hive libraries to 2.1.1 version.

    Updating hive properties for tests and resolving dependencies and API conflicts:
    
    * Allowing of using Hive's own calcite-core and avatica versions by hive-exec.
    Calcite version is removed from root POM Dependency Management
    * Fix for "hive.metastore.schema.verification", MetaException(message:
    Version information not found in metastore)
    https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
    METASTORE_SCHEMA_VERIFICATION="false" property is added
    * Fix JSONException class is not found (excluded banned org.json dependency)
    * Added METASTORE_AUTO_CREATE_ALL="true", properties to tests, because some additional
    tables are necessary in Hive metastore
    * Disabling calcite CBO for (Hive's CalcitePlanner) for tests, because it is in conflict
    with Drill's Calcite version for Drill unit tests. HIVE_CBO_ENABLED="false" property
    * jackson and parquet libraries are relocated in hive-exec-shade module
    * org.apache.parquet:parquet-column Drill version is added to "hive-exec" to
    allow of using Parquet empty group on MessageType level (PARQUET-278)
    * Removing of commons-codec exclusion from hive core. This dependency is
    necessary for hive-exec and hive-metastore.
    * Setting Hive's internal properties for transactional scan:
    HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION,
    IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vdiravka/drill DRILL-5978

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1111.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1111
    
----
commit 476c44cce7a38ff818c5e328e3db61773bab18a3
Author: Vitalii Diravka <vi...@...>
Date:   2017-11-13T16:04:03Z

    Upgrade drill-hive libraries to 2.1.1 version.
    
    Updating hive properties for tests and resolving dependencies and API conflicts:
    * Allowing of using Hive's own calcite-core and avatica versions by hive-exec.
    Calcite version is removed from root POM Dependency Management
    * Fix for "hive.metastore.schema.verification", MetaException(message:
    Version information not found in metastore)
    https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
    METASTORE_SCHEMA_VERIFICATION="false" property is added
    * Fix JSONException class is not found (excluded banned org.json dependency)
    * Added METASTORE_AUTO_CREATE_ALL="true", properties to tests, because some additional
    tables are necessary in Hive metastore
    * Disabling calcite CBO for (Hive's CalcitePlanner) for tests, because it is in conflict
    with Drill's Calcite version for Drill unit tests. HIVE_CBO_ENABLED="false" property
    * jackson and parquet libraries are relocated in hive-exec-shade module
    * org.apache.parquet:parquet-column Drill version is added to "hive-exec" to
    allow of using Parquet empty group on MessageType level (PARQUET-278)
    * Removing of commons-codec exclusion from hive core. This dependency is
    necessary for hive-exec and hive-metastore.
    * Setting Hive's internal properties for transactional scan:
    HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION,
    IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES

----


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166508171
  
    --- Diff: contrib/storage-hive/core/pom.xml ---
    @@ -101,6 +105,7 @@
         <dependency>
           <groupId>org.apache.calcite</groupId>
           <artifactId>calcite-core</artifactId>
    +      <version>${calcite.version}</version>
    --- End diff --
    
    The same question as for common/pom.xml.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166832879
  
    --- Diff: contrib/storage-hive/core/pom.xml ---
    @@ -58,6 +58,10 @@
               <groupId>commons-codec</groupId>
               <artifactId>commons-codec</artifactId>
             </exclusion>
    +        <exclusion>
    --- End diff --
    
    For a transitive dependency will it be better to use `DependencyManagement`? Please add comments with details.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166202560
  
    --- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/impersonation/hive/TestSqlStdBasedAuthorization.java ---
    @@ -100,7 +103,9 @@ private static void setSqlStdBasedAuthorizationInHiveConf() {
         hiveConfig.put(METASTORE_EXECUTE_SET_UGI.varname, hiveConf.get(METASTORE_EXECUTE_SET_UGI.varname));
         hiveConfig.put(HIVE_AUTHORIZATION_ENABLED.varname, hiveConf.get(HIVE_AUTHORIZATION_ENABLED.varname));
         hiveConfig.put(HIVE_AUTHENTICATOR_MANAGER.varname, SessionStateUserAuthenticator.class.getName());
    -    hiveConfig.put(HIVE_AUTHORIZATION_MANAGER.varname, SQLStdHiveAuthorizerFactory.class.getName());
    --- End diff --
    
    why is this removed ? The test seem to be for Authorization.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169397714
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -34,28 +34,47 @@
           <artifactId>hive-exec</artifactId>
           <scope>compile</scope>
           <exclusions>
    +        <!--Hive Calcite libraries are not required. When user submits query in Drill via Hive plugin, the query
    +        is validated and planned via Drill Calcite. Hive Calcite can be used only to setup Hive store for Drill unit
    +        testing, where a lot of Hive specific queries are performed. But Drill Calcite and Avatica versions have
    +        conflicts with Hive old Calcite and Avatica versions. That's why Calcite cost based optimizator
    +        (ConfVars.HIVE_CBO_ENABLED) is disabled for Drill Hive JUnit test cases. It can be enabled again once Hive
    +        will leverage the newest Calcite version. To do that check whether Drill Calcite and Avatica versions are
    +        suitable for hive-exec. If no, use Hive Calcite and Avatica versions.
    +        Note: Versions of Calcite libraries are controlled by "DependencyManagement" block in Drill's
    +        root POM file now-->
             <exclusion>
    -          <artifactId>log4j</artifactId>
    -          <groupId>log4j</groupId>
    +          <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-core</artifactId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    +          <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-avatica</artifactId>
    +        </exclusion>
    +        <exclusion>
    +          <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-linq4j</artifactId>
             </exclusion>
             <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
               <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-druid</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <!--Once newer hive-exec version leverages parquet-column 1.9.0, this dependency can be deleted -->
    --- End diff --
    
    Done


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166508479
  
    --- Diff: contrib/storage-hive/core/pom.xml ---
    @@ -58,6 +58,10 @@
               <groupId>commons-codec</groupId>
               <artifactId>commons-codec</artifactId>
             </exclusion>
    +        <exclusion>
    --- End diff --
    
    Is the exclusion necessary due to a version conflict or the dependency is not required?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166251399
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +509,51 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the schema evolution properties are set in job conf for the input format. If they
    +   * aren't set, method sets the column names and types from table/partition properties or storage descriptor.
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param isAcidTable true if the table is transactional, false otherwise
    +   * @param sd storage descriptor
    +   */
    +  public static void setColumnTypes(JobConf job, Properties properties, boolean isAcidTable, StorageDescriptor sd) {
    +
    +    // No work is needed, if schema evolution is used
    +    if (Utilities.isSchemaEvolutionEnabled(job, isAcidTable) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +        job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +      return;
    +    }
    +
    +    String colNames;
    +    String colTypes;
    +
    +    // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +    // data from storage descriptor of the table
    +    if (properties.containsKey(serdeConstants.LIST_COLUMNS) && properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
    +      colNames = job.get(serdeConstants.LIST_COLUMNS);
    +      colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
    +    } else {
    +      StringBuilder colNamesBuilder = new StringBuilder();
    +      StringBuilder colTypesBuilder = new StringBuilder();
    +      boolean isFirst = true;
    +      for(FieldSchema col: sd.getCols()) {
    +        if (isFirst) {
    +          isFirst = false;
    +        } else {
    +          colNamesBuilder.append(',');
    +          colTypesBuilder.append(',');
    +        }
    +        colNamesBuilder.append(col.getName());
    +        colTypesBuilder.append(col.getType());
    +      }
    +      colNames = colNamesBuilder.toString();
    +      colTypes = colTypesBuilder.toString();
    --- End diff --
    
    I like it. Thank you.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166650425
  
    --- Diff: contrib/storage-hive/core/pom.xml ---
    @@ -58,6 +58,10 @@
               <groupId>commons-codec</groupId>
               <artifactId>commons-codec</artifactId>
             </exclusion>
    +        <exclusion>
    --- End diff --
    
    version conflict.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r167946336
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +512,49 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, StorageDescriptor sd) {
    +
    +    if (AcidUtils.isTablePropertyTransactional(job)) {
    +      AcidUtils.setTransactionalTableScan(job, true);
    +
    +      // No work is needed, if schema evolution is used
    +      if (Utilities.isSchemaEvolutionEnabled(job, true) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +          job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +        return;
    +      }
    +
    +      String colNames;
    +      String colTypes;
    +
    +      // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +      // data from storage descriptor of the table
    +      colNames = job.get(serdeConstants.LIST_COLUMNS);
    +      colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
    +
    +      if (colNames == null || colTypes == null) {
    +        List<String> colNamesList = Lists.newArrayList();
    +        List<String> colTypesList = Lists.newArrayList();
    +        for (FieldSchema col: sd.getCols()) {
    +          colNamesList.add(col.getName());
    +          colTypesList.add(col.getType());
    +        }
    +        colNames = Joiner.on(",").join(colNamesList);
    --- End diff --
    
    I have changed it. But we need call `input.getName()` and `input.getType()`, that's why I use two Functions and code became bigger. Once we will use Java8 it can be smaller. Or did I miss something here?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r167944958
  
    --- Diff: contrib/storage-hive/core/pom.xml ---
    @@ -58,6 +58,10 @@
               <groupId>commons-codec</groupId>
               <artifactId>commons-codec</artifactId>
             </exclusion>
    +        <exclusion>
    --- End diff --
    
    Both cases can resolve it. In details `io.dropwizard.metrics:metrics-core` is nowhere used in Drill. And this is a dependency for `tephra-core` and transitive for `hive-metastore`. But it conflicts with Drill's `com.codahale.metrics`. 
    `hive-metastore` uses 3.0.1 version of this dependency, but the last version in maven repository is 4.0.2.
    I added this dependency to `dependencyManagement` block with 4.0.2 version and conflict is resolved as well. I think it is a better decision, because it can help to avoid similar conflicts in future.
    
    Also `metrics-core` in `hive-hbase-handler` has not influence to Drill, so I've removed my exclusion of it.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166842824
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    --- End diff --
    
    Why is it safe to change one library? Will it be safer to upgrade all hive dependencies on parquet to the same version?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166651052
  
    --- Diff: common/pom.xml ---
    @@ -45,6 +45,7 @@
         <dependency>
           <groupId>org.apache.calcite</groupId>
           <artifactId>calcite-core</artifactId>
    +      <version>${calcite.version}</version>
    --- End diff --
    
    hive-exec needs own calcite 1.6 version.
    Calcite version in DependencyManagement of root pom leads to using Drill Calcite version over the whole project.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166683550
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    +    </dependency>
    +    <dependency>
    +      <groupId>com.tdunning</groupId>
    +      <artifactId>json</artifactId>
    +    </dependency>
       </dependencies>
     
       <build>
         <plugins>
           <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-shade-plugin</artifactId>
    -        <version>2.1</version>
    +        <version>3.1.0</version>
    --- End diff --
    
    It is a [last stable](https://maven.apache.org/plugins/maven-shade-plugin/index.html) version of this plugin. Update is not important. 
    We use 2.4 version around the project. But here was an older version.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166513525
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    +    </dependency>
    +    <dependency>
    +      <groupId>com.tdunning</groupId>
    +      <artifactId>json</artifactId>
    +    </dependency>
       </dependencies>
     
       <build>
         <plugins>
           <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-shade-plugin</artifactId>
    -        <version>2.1</version>
    +        <version>3.1.0</version>
    --- End diff --
    
    What is the reason for the version change and should it be applied to other modules where shading is used?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169397509
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/readers/HiveAbstractReader.java ---
    @@ -86,7 +86,7 @@
       protected List<Object> selectedPartitionValues = Lists.newArrayList();
     
       // SerDe of the reading partition (or table if the table is non-partitioned)
    -  protected SerDe partitionSerDe;
    +  protected Deserializer partitionSerDe;
    --- End diff --
    
    Done


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166213941
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +509,51 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the schema evolution properties are set in job conf for the input format. If they
    +   * aren't set, method sets the column names and types from table/partition properties or storage descriptor.
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param isAcidTable true if the table is transactional, false otherwise
    +   * @param sd storage descriptor
    +   */
    +  public static void setColumnTypes(JobConf job, Properties properties, boolean isAcidTable, StorageDescriptor sd) {
    +
    +    // No work is needed, if schema evolution is used
    +    if (Utilities.isSchemaEvolutionEnabled(job, isAcidTable) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +        job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +      return;
    +    }
    +
    +    String colNames;
    +    String colTypes;
    +
    +    // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +    // data from storage descriptor of the table
    +    if (properties.containsKey(serdeConstants.LIST_COLUMNS) && properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
    +      colNames = job.get(serdeConstants.LIST_COLUMNS);
    +      colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
    +    } else {
    +      StringBuilder colNamesBuilder = new StringBuilder();
    +      StringBuilder colTypesBuilder = new StringBuilder();
    +      boolean isFirst = true;
    +      for(FieldSchema col: sd.getCols()) {
    +        if (isFirst) {
    +          isFirst = false;
    +        } else {
    +          colNamesBuilder.append(',');
    +          colTypesBuilder.append(',');
    +        }
    +        colNamesBuilder.append(col.getName());
    +        colTypesBuilder.append(col.getType());
    +      }
    +      colNames = colNamesBuilder.toString();
    +      colTypes = colTypesBuilder.toString();
    --- End diff --
    
    how about changing the loop as below:
    
    ```
    final StringBuilder colNamesBuilder = new StringBuilder();
    final StringBuilder colTypesBuilder = new StringBuilder();
    
         for(FieldSchema col: sd.getCols()) {
               colNamesBuilder.append(col.getName());
               colTypesBuilder.append(col.getType());
               colNamesBuilder.append(',');
               colTypesBuilder.append(',');
          }
          colNames = colNamesBuilder.substring(0, colNamesBuilder.length() - 1);
          colTypes = colTypesBuilder.substring(0, colTypesBuilder.length() - 1);
    ```



---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166660444
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +510,52 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, Properties properties, StorageDescriptor sd) {
    +
    +    if (AcidUtils.isTablePropertyTransactional(properties)) {
    +      AcidUtils.setTransactionalTableScan(job, true);
    +
    +      // No work is needed, if schema evolution is used
    +      if (Utilities.isSchemaEvolutionEnabled(job, true) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +          job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +        return;
    +      }
    +
    +      String colNames;
    +      String colTypes;
    +
    +      // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +      // data from storage descriptor of the table
    +      if (properties.containsKey(serdeConstants.LIST_COLUMNS) && properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
    --- End diff --
    
    Agree. Changed.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166669844
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +510,52 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, Properties properties, StorageDescriptor sd) {
    +
    +    if (AcidUtils.isTablePropertyTransactional(properties)) {
    +      AcidUtils.setTransactionalTableScan(job, true);
    +
    +      // No work is needed, if schema evolution is used
    +      if (Utilities.isSchemaEvolutionEnabled(job, true) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +          job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +        return;
    +      }
    +
    +      String colNames;
    +      String colTypes;
    +
    +      // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +      // data from storage descriptor of the table
    +      if (properties.containsKey(serdeConstants.LIST_COLUMNS) && properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
    +        colNames = job.get(serdeConstants.LIST_COLUMNS);
    +        colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
    +      } else {
    +        final StringBuilder colNamesBuilder = new StringBuilder();
    --- End diff --
    
    Thank you, it's cleaner.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166513403
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    --- End diff --
    
    Any other parquet dependencies? If they are not needed, why the explicit dependency on org.apache.parquet:parquet-column is necessary?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166651134
  
    --- Diff: contrib/storage-hive/core/pom.xml ---
    @@ -101,6 +105,7 @@
         <dependency>
           <groupId>org.apache.calcite</groupId>
           <artifactId>calcite-core</artifactId>
    +      <version>${calcite.version}</version>
    --- End diff --
    
    Answered.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166507937
  
    --- Diff: common/pom.xml ---
    @@ -45,6 +45,7 @@
         <dependency>
           <groupId>org.apache.calcite</groupId>
           <artifactId>calcite-core</artifactId>
    +      <version>${calcite.version}</version>
    --- End diff --
    
    Why is this change necessary? The version should come from `<dependencyManagement>` of the parent pom.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169360386
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    +    </dependency>
    +    <dependency>
    +      <groupId>com.tdunning</groupId>
    +      <artifactId>json</artifactId>
    +    </dependency>
       </dependencies>
     
       <build>
         <plugins>
           <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-shade-plugin</artifactId>
    -        <version>2.1</version>
    +        <version>3.1.0</version>
    --- End diff --
    
    Can it be unified in the plugin management of the Drill root pom?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166512379
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +510,52 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, Properties properties, StorageDescriptor sd) {
    +
    +    if (AcidUtils.isTablePropertyTransactional(properties)) {
    +      AcidUtils.setTransactionalTableScan(job, true);
    +
    +      // No work is needed, if schema evolution is used
    +      if (Utilities.isSchemaEvolutionEnabled(job, true) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +          job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +        return;
    +      }
    +
    +      String colNames;
    +      String colTypes;
    +
    +      // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +      // data from storage descriptor of the table
    +      if (properties.containsKey(serdeConstants.LIST_COLUMNS) && properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
    --- End diff --
    
    avoid double `get()` (`containsKey()`) if possible.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169351388
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/readers/HiveAbstractReader.java ---
    @@ -143,7 +143,7 @@ private void init() throws ExecutionSetupException {
                   HiveUtilities.getPartitionMetadata(partition, table);
           HiveUtilities.addConfToJob(job, partitionProperties);
     
    -      final SerDe tableSerDe = createSerDe(job, table.getSd().getSerdeInfo().getSerializationLib(), tableProperties);
    +      final Deserializer tableSerDe = createSerDe(job, table.getSd().getSerdeInfo().getSerializationLib(), tableProperties);
    --- End diff --
    
    rename both function and the variable.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166209232
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveMetadataProvider.java ---
    @@ -264,6 +265,10 @@ private HiveStats getStatsEstimateFromInputSplits(final List<LogicalInputSplit>
               final List<LogicalInputSplit> splits = Lists.newArrayList();
               final JobConf job = new JobConf(hiveConf);
               HiveUtilities.addConfToJob(job, properties);
    +          if (AcidUtils.isTablePropertyTransactional(properties)) {
    +            AcidUtils.setTransactionalTableScan(job, true);
    +            HiveUtilities.setColumnTypes(job, properties, true, sd);
    +          }
    --- End diff --
    
    How about refactoring this block of code to a new method in HiveUtilities ? Like `verifyAndAddTransactionalProperty()`. Then just call that method from both here and HiveAbstractReader


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166247024
  
    --- Diff: pom.xml ---
    @@ -884,13 +884,33 @@
                 <groupId>io.netty</groupId>
                 <artifactId>netty-all</artifactId>
               </exclusion>
    +          <exclusion>
    +            <groupId>org.mortbay.jetty</groupId>
    +            <artifactId>servlet-api</artifactId>
    --- End diff --
    
    You are right. Exclusion is removed. Thank you.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169397851
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    +    </dependency>
    +    <dependency>
    +      <groupId>com.tdunning</groupId>
    +      <artifactId>json</artifactId>
    +    </dependency>
       </dependencies>
     
       <build>
         <plugins>
           <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-shade-plugin</artifactId>
    -        <version>2.1</version>
    +        <version>3.1.0</version>
    --- End diff --
    
    Good idea. Done


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166512235
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +510,52 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, Properties properties, StorageDescriptor sd) {
    +
    +    if (AcidUtils.isTablePropertyTransactional(properties)) {
    +      AcidUtils.setTransactionalTableScan(job, true);
    +
    +      // No work is needed, if schema evolution is used
    +      if (Utilities.isSchemaEvolutionEnabled(job, true) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +          job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +        return;
    +      }
    +
    +      String colNames;
    +      String colTypes;
    +
    +      // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +      // data from storage descriptor of the table
    +      if (properties.containsKey(serdeConstants.LIST_COLUMNS) && properties.containsKey(serdeConstants.LIST_COLUMN_TYPES)) {
    +        colNames = job.get(serdeConstants.LIST_COLUMNS);
    +        colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
    +      } else {
    +        final StringBuilder colNamesBuilder = new StringBuilder();
    --- End diff --
    
    consider using `Joiner`.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166832638
  
    --- Diff: common/pom.xml ---
    @@ -45,6 +45,7 @@
         <dependency>
           <groupId>org.apache.calcite</groupId>
           <artifactId>calcite-core</artifactId>
    +      <version>${calcite.version}</version>
    --- End diff --
    
    Will it be better to explicitly specify the version of calcite-core in hive-exec?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166672700
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    --- End diff --
    
    The master branch of Hive doesn't have org.json.
    But Hive2.1 includes this:
    https://github.com/apache/hive/blob/branch-2.1/pom.xml#L608
    https://github.com/apache/hive/blob/branch-2.1/pom.xml#L608
    and so on...


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166510396
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +510,52 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, Properties properties, StorageDescriptor sd) {
    --- End diff --
    
    Is it necessary to pass both `JobConf` and `Properties`? As far as I can see `job` is always populated using passed `properties`.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r167947402
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    --- End diff --
    
    `hive-exec` uses only two parquet dependencies: [parquet-column](https://github.com/apache/hive/blob/branch-2.1/ql/pom.xml#L444) and [parquet-hadoop-bundle](https://github.com/apache/hive/blob/branch-2.1/ql/pom.xml#L109).
    But Drill doesn't use own version of `parquet-hadoop-bundle` and moreover Drill version for it is absent in maven repository.
    
    It appears that Hive 2.3.2 uses `parquet-column` 1.8.1 version as well. But last Apache Hive master is updated to 1.9.0 version.
    I have added comment about it into `drill-hive-exec-shaded` POM to update it in future.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r167945192
  
    --- Diff: common/pom.xml ---
    @@ -45,6 +45,7 @@
         <dependency>
           <groupId>org.apache.calcite</groupId>
           <artifactId>calcite-core</artifactId>
    +      <version>${calcite.version}</version>
    --- End diff --
    
    I have returned calcite-core version into `DependencyManagement` block. Drill Calcite version libraries are included into "drill-hive-exec-shaded" module. 
    
    I will tell you why it works for now:
    When user submits query in Drill via Hive plugin, the query is validated and planned via Drill Calcite. So Hive Calcite isn't necessary for it. 
    Hive Calcite is used only in the process of Drill unit testing, where a lot of Hive specific queries are performed to setup Hive store for testing. But Drill Calcite and Avatica versions have conflicts with Hive old Calcite and Avatica versions. That's why I have disabled Calcite cost based optimizator `conf.set(ConfVars.HIVE_CBO_ENABLED.varname, "false");`. We can enable it again once Hive will leverage the newest Calcite version. The comment about it is added into `drill-hive-exec-shaded` POM.



---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166679633
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    +          <artifactId>json</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <dependency>
    +      <groupId>org.apache.parquet</groupId>
    +      <artifactId>parquet-column</artifactId>
    +      <version>${parquet.version}</version>
    --- End diff --
    
    Hive parquet dependencies cause an issues in Drill. So relocating is decision.
    But in this case one Drill test [DRILL-3938](https://github.com/apache/drill/pull/1111/files#diff-0f8deff45b06e709ba2ae35f96274076R395) is failed, because Parquet 1.8.0 version started to prohibit empty struct/groups on MessageType level [PARQUET-278](https://issues.apache.org/jira/browse/PARQUET-278). 
    But from 1.8.1 version it is [allowed](https://github.com/apache/parquet-mr/pull/263/files#diff-1ae59a3502695f53cb2d0371e1116c63L92) again - [PARQUET-363](https://issues.apache.org/jira/browse/PARQUET-363).
    
    So Drill parquet-column version without throwing an Exception solves the issue. 
    For Hive2.3 it can be changed.



---

[GitHub] drill issue #1111: DRILL-5978: Updating of Apache and MapR Hive libraries to...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on the issue:

    https://github.com/apache/drill/pull/1111
  
    +1.   merged the PR in commit 27aa236. 


---

[GitHub] drill issue #1111: DRILL-5978: Upgrade Hive libraries to 2.3.2 version

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/1111
  
    @vrozov Thank you for CR. 
    Commits are squashed, Jira and PR are renamed to 2.3.2 version upgrade.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169348315
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/readers/HiveAbstractReader.java ---
    @@ -86,7 +86,7 @@
       protected List<Object> selectedPartitionValues = Lists.newArrayList();
     
       // SerDe of the reading partition (or table if the table is non-partitioned)
    -  protected SerDe partitionSerDe;
    +  protected Deserializer partitionSerDe;
    --- End diff --
    
    rename to 'partitionDeserializer'


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166198314
  
    --- Diff: pom.xml ---
    @@ -884,13 +884,33 @@
                 <groupId>io.netty</groupId>
                 <artifactId>netty-all</artifactId>
               </exclusion>
    +          <exclusion>
    +            <groupId>org.mortbay.jetty</groupId>
    +            <artifactId>servlet-api</artifactId>
    --- End diff --
    
    duplicate exclusion. Already excluded at the top


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166838809
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +512,49 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, StorageDescriptor sd) {
    +
    +    if (AcidUtils.isTablePropertyTransactional(job)) {
    +      AcidUtils.setTransactionalTableScan(job, true);
    +
    +      // No work is needed, if schema evolution is used
    +      if (Utilities.isSchemaEvolutionEnabled(job, true) && job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS) != null &&
    +          job.get(IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES) != null) {
    +        return;
    +      }
    +
    +      String colNames;
    +      String colTypes;
    +
    +      // Try to get get column names and types from table or partition properties. If they are absent there, get columns
    +      // data from storage descriptor of the table
    +      colNames = job.get(serdeConstants.LIST_COLUMNS);
    +      colTypes = job.get(serdeConstants.LIST_COLUMN_TYPES);
    +
    +      if (colNames == null || colTypes == null) {
    +        List<String> colNamesList = Lists.newArrayList();
    +        List<String> colTypesList = Lists.newArrayList();
    +        for (FieldSchema col: sd.getCols()) {
    +          colNamesList.add(col.getName());
    +          colTypesList.add(col.getType());
    +        }
    +        colNames = Joiner.on(",").join(colNamesList);
    --- End diff --
    
    Consider `Joiner.on(",").join(Iterables.transform(sd.getCols(), toName));` where `toName` is 
    ```
      private static Function<FieldSchema, String> toName = new Function<FieldSchema, String>()
      {
        @Nullable
        @Override
        public String apply(@Nullable FieldSchema input)
        {
          return input.getName();
        }
      };
    ```



---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169397544
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/readers/HiveAbstractReader.java ---
    @@ -143,7 +143,7 @@ private void init() throws ExecutionSetupException {
                   HiveUtilities.getPartitionMetadata(partition, table);
           HiveUtilities.addConfToJob(job, partitionProperties);
     
    -      final SerDe tableSerDe = createSerDe(job, table.getSd().getSerdeInfo().getSerializationLib(), tableProperties);
    +      final Deserializer tableSerDe = createSerDe(job, table.getSd().getSerdeInfo().getSerializationLib(), tableProperties);
    --- End diff --
    
    Done. `tableSerDe` -> `tableDeserializer` and `createSerDe` -> `createDeserializer`
    If you meant `getSerdeInfo()` as well, that function is a part of hive code.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r169354613
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -34,28 +34,47 @@
           <artifactId>hive-exec</artifactId>
           <scope>compile</scope>
           <exclusions>
    +        <!--Hive Calcite libraries are not required. When user submits query in Drill via Hive plugin, the query
    +        is validated and planned via Drill Calcite. Hive Calcite can be used only to setup Hive store for Drill unit
    +        testing, where a lot of Hive specific queries are performed. But Drill Calcite and Avatica versions have
    +        conflicts with Hive old Calcite and Avatica versions. That's why Calcite cost based optimizator
    +        (ConfVars.HIVE_CBO_ENABLED) is disabled for Drill Hive JUnit test cases. It can be enabled again once Hive
    +        will leverage the newest Calcite version. To do that check whether Drill Calcite and Avatica versions are
    +        suitable for hive-exec. If no, use Hive Calcite and Avatica versions.
    +        Note: Versions of Calcite libraries are controlled by "DependencyManagement" block in Drill's
    +        root POM file now-->
             <exclusion>
    -          <artifactId>log4j</artifactId>
    -          <groupId>log4j</groupId>
    +          <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-core</artifactId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    +          <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-avatica</artifactId>
    +        </exclusion>
    +        <exclusion>
    +          <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-linq4j</artifactId>
             </exclusion>
             <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
               <groupId>org.apache.calcite</groupId>
    +          <artifactId>calcite-druid</artifactId>
             </exclusion>
           </exclusions>
         </dependency>
    +    <!--Once newer hive-exec version leverages parquet-column 1.9.0, this dependency can be deleted -->
    --- End diff --
    
    Can it be moved to the dependency management if this is still necessary?


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166513062
  
    --- Diff: contrib/storage-hive/hive-exec-shade/pom.xml ---
    @@ -39,23 +39,28 @@
               <groupId>log4j</groupId>
             </exclusion>
             <exclusion>
    -          <groupId>commons-codec</groupId>
    -          <artifactId>commons-codec</artifactId>
    -        </exclusion>
    -        <exclusion>
    -          <artifactId>calcite-avatica</artifactId>
    -          <groupId>org.apache.calcite</groupId>
    +          <groupId>org.json</groupId>
    --- End diff --
    
    Has new version of hive introduced the dependency on org.json:json?


---

[GitHub] drill pull request #1111: DRILL-5978: Updating of Apache and MapR Hive libra...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1111


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166246336
  
    --- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/impersonation/hive/TestSqlStdBasedAuthorization.java ---
    @@ -100,7 +103,9 @@ private static void setSqlStdBasedAuthorizationInHiveConf() {
         hiveConfig.put(METASTORE_EXECUTE_SET_UGI.varname, hiveConf.get(METASTORE_EXECUTE_SET_UGI.varname));
         hiveConfig.put(HIVE_AUTHORIZATION_ENABLED.varname, hiveConf.get(HIVE_AUTHORIZATION_ENABLED.varname));
         hiveConfig.put(HIVE_AUTHENTICATOR_MANAGER.varname, SessionStateUserAuthenticator.class.getName());
    -    hiveConfig.put(HIVE_AUTHORIZATION_MANAGER.varname, SQLStdHiveAuthorizerFactory.class.getName());
    --- End diff --
    
    Did it accidentally. I've returned this string.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166282137
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveMetadataProvider.java ---
    @@ -264,6 +265,10 @@ private HiveStats getStatsEstimateFromInputSplits(final List<LogicalInputSplit>
               final List<LogicalInputSplit> splits = Lists.newArrayList();
               final JobConf job = new JobConf(hiveConf);
               HiveUtilities.addConfToJob(job, properties);
    +          if (AcidUtils.isTablePropertyTransactional(properties)) {
    +            AcidUtils.setTransactionalTableScan(job, true);
    +            HiveUtilities.setColumnTypes(job, properties, true, sd);
    +          }
    --- End diff --
    
    It makes sense. Moreover since schema_evolution is required for acid tables [HIVE-12799](https://issues.apache.org/jira/browse/HIVE-12799) I've combined it with setColumnTypes() helper method.


---

[GitHub] drill pull request #1111: DRILL-5978: Upgrade Hive libraries to 2.1.1 versio...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1111#discussion_r166660333
  
    --- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveUtilities.java ---
    @@ -507,5 +510,52 @@ public static boolean hasHeaderOrFooter(HiveTableWithColumnCache table) {
         int skipFooter = retrieveIntProperty(tableProperties, serdeConstants.FOOTER_COUNT, -1);
         return skipHeader > 0 || skipFooter > 0;
       }
    +
    +  /**
    +   * This method checks whether the table is transactional and set necessary properties in {@link JobConf}.
    +   * If schema evolution properties aren't set in job conf for the input format, method sets the column names
    +   * and types from table/partition properties or storage descriptor.
    +   *
    +   * @param job the job to update
    +   * @param properties table or partition properties
    +   * @param sd storage descriptor
    +   */
    +  public static void verifyAndAddTransactionalProperties(JobConf job, Properties properties, StorageDescriptor sd) {
    --- End diff --
    
    `job` involves table `properties`, but `JobConf` hasn't any method to return that properties. It is possible to get only one property. But looks like this is not an issue.
    There is in AcidUtils the `isTablePropertyTransactional` method with `Configuration` input parameter.
    Changed. Thank you.


---

[GitHub] drill issue #1111: Upgrade drill-hive libraries to 2.1.1 version.

Posted by priteshm <gi...@git.apache.org>.
Github user priteshm commented on the issue:

    https://github.com/apache/drill/pull/1111
  
    @vrozov can you please review this change?


---