Posted to dev@drill.apache.org by jinfengni <gi...@git.apache.org> on 2017/07/05 23:48:39 UTC

[GitHub] drill pull request #805: Drill-4139: Exception while trying to prune partiti...

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/805#discussion_r125784066
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -1008,8 +1008,24 @@ public void setMax(Object max) {
           return nulls;
         }
     
    -    @Override public boolean hasSingleValue() {
    -      return (max != null && min != null && max.equals(min));
    +    /**
    +     * Checks that the column chunk has single value.
    +     * Returns true if min and max are the same, but not null.
    +     * Returns true if min and max are null and the number of null values
    +     * in the column chunk is greater than 0.
    +     *
    +     * @return true if column has single value
    --- End diff --
    
    My understanding is that hasSingleValue() should return true only if the column metadata shows exactly one distinct value. A null value also counts as a value distinct from any non-null value.
    
    Therefore, when a column has min != null && max != null && min.equals(max) && nulls != null && nulls > 0, it should return false. However, both the v1 and v3 implementations return true in that case.
    
    That would actually lead to wrong query results. A simple reproduction:
    
    ```
    create table dfs.tmp.`t5/a` as select 100 as mykey from cp.`tpch/nation.parquet` union all select col_notexist from cp.`tpch/region.parquet`;
    
    create table dfs.tmp.`t5/b` as select 200 as mykey from cp.`tpch/nation.parquet` union all select col_notexist from cp.`tpch/region.parquet`;
    ```
    
    This produces two files, each containing a single distinct non-null value plus null values. Now query the two files:
    
    ```
    select mykey from dfs.tmp.`t5` where mykey = 100;
    +--------+
    | mykey  |
    +--------+
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | 100    |
    | null   |
    | null   |
    | null   |
    | null   |
    | null   |
    +--------+
    30 rows selected (0.246 seconds)
    
    ```
    Clearly, those 5 nulls should not be returned.
    
    I applied the 3 commits in this PR on top of today's master branch.
    
    ```
    select * from sys.version;
    +------------------+-------------------------------------------+-------------------------------------------------------------------------------+----------------------------+-----------------+----------------------------+
    |     version      |                 commit_id                 |                                commit_message                                 |        commit_time         |   build_email   |         build_time         |
    +------------------+-------------------------------------------+-------------------------------------------------------------------------------+----------------------------+-----------------+----------------------------+
    | 1.11.0-SNAPSHOT  | cad6e4dc950aa4a95ad20515ce5abd9c546d3e5d  | DRILL-4139: Fix loss of scale value for DECIMAL in parquet partition pruning  | 05.07.2017 @ 12:05:25 PDT  | jni@apache.org  | 05.07.2017 @ 12:06:07 PDT  |
    +------------------+-------------------------------------------+-----
    ```
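    
    As a minimal sketch of the semantics described above (not the actual Metadata.java code), the check could treat nulls as one more distinct value. The class name SingleValueSketch and the standalone static method are hypothetical and only for illustration; the all-null branch follows the javadoc in the diff.
    
    ```java
    // Hypothetical standalone sketch; names are illustrative, not Drill's API.
    final class SingleValueSketch {
    
        // Returns true only when the column chunk holds exactly one distinct
        // value, counting null as a value distinct from any non-null value.
        static boolean hasSingleValue(Object min, Object max, Long nulls) {
            // A missing null count is treated as "no nulls", matching the
            // condition nulls != null && nulls > 0 used in the comment above.
            boolean hasNulls = nulls != null && nulls > 0;
            if (min != null && max != null) {
                // One non-null value qualifies only if no nulls sit beside it.
                return min.equals(max) && !hasNulls;
            }
            // No min/max at all: single-valued only when the chunk is all nulls.
            return min == null && max == null && hasNulls;
        }
    
        public static void main(String[] args) {
            // Shape of the reproduction: min = max = 100 plus 5 nulls.
            System.out.println(hasSingleValue(100, 100, 5L));
            // Same min/max with no nulls.
            System.out.println(hasSingleValue(100, 100, 0L));
        }
    }
    ```
    
    With a check like this, a file shaped like t5/a (min = max = 100 with 5 nulls) would not be reported as single-valued, so the filter would still have to be evaluated against its rows.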

