You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@drill.apache.org by prasadns14 <gi...@git.apache.org> on 2017/10/05 22:26:12 UTC

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

GitHub user prasadns14 opened a pull request:

    https://github.com/apache/drill/pull/975

    DRILL-5743: Handling column family and column scan for hbase

    This PR handles the scenario where the projected column list contains both a column family and a column within the same family.
    
    @paul-rogers please review

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/prasadns14/drill DRILL-5743

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #975
    
----
commit 0469197e18b466dcfbb26b4c95c47c22b683416e
Author: Prasad Nagaraj Subramanya <pr...@gmail.com>
Date:   2017-10-05T22:18:43Z

    DRILL-5743: Handling column family and column scan for hbase

----


---

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

Posted by paul-rogers <gi...@git.apache.org>.

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/975#discussion_r143266651
  
    --- Diff: contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestTableGenerator.java ---
    @@ -133,6 +133,43 @@ public static void generateHBaseDataset1(Connection conn, Admin admin, TableName
         table.close();
       }
     
    +  public static void generateHBaseDatasetSingleSchema(Connection conn, Admin admin, TableName tableName, int numberRegions) throws Exception {
    +    if (admin.tableExists(tableName)) {
    +      admin.disableTable(tableName);
    +      admin.deleteTable(tableName);
    +    }
    +
    +    HTableDescriptor desc = new HTableDescriptor(tableName);
    +    desc.addFamily(new HColumnDescriptor("f"));
    +    if (numberRegions > 1) {
    +      admin.createTable(desc, Arrays.copyOfRange(SPLIT_KEYS, 0, numberRegions - 1));
    +    } else {
    +      admin.createTable(desc);
    +    }
    +
    +    BufferedMutator table = conn.getBufferedMutator(tableName);
    +
    +    Put p = new Put("a1".getBytes());
    +    p.addColumn("f".getBytes(), "c1".getBytes(), "21".getBytes());
    +    p.addColumn("f".getBytes(), "c2".getBytes(), "22".getBytes());
    +    p.addColumn("f".getBytes(), "c3".getBytes(), "23".getBytes());
    +    table.mutate(p);
    +
    +    p = new Put("a2".getBytes());
    +    p.addColumn("f".getBytes(), "c1".getBytes(), "11".getBytes());
    --- End diff --
    
    Here, we are deciding to encode names as UTF-8. Is this a standard? Or, is it our own convention? Could we have used some other encoding? If we do, how do we tell the code above what encoding we chose?


---

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

Posted by prasadns14 <gi...@git.apache.org>.

Github user prasadns14 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/975#discussion_r143322352
  
    --- Diff: contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestTableGenerator.java ---
    @@ -133,6 +133,43 @@ public static void generateHBaseDataset1(Connection conn, Admin admin, TableName
         table.close();
       }
     
    +  public static void generateHBaseDatasetSingleSchema(Connection conn, Admin admin, TableName tableName, int numberRegions) throws Exception {
    +    if (admin.tableExists(tableName)) {
    +      admin.disableTable(tableName);
    +      admin.deleteTable(tableName);
    +    }
    +
    +    HTableDescriptor desc = new HTableDescriptor(tableName);
    +    desc.addFamily(new HColumnDescriptor("f"));
    +    if (numberRegions > 1) {
    +      admin.createTable(desc, Arrays.copyOfRange(SPLIT_KEYS, 0, numberRegions - 1));
    +    } else {
    +      admin.createTable(desc);
    +    }
    +
    +    BufferedMutator table = conn.getBufferedMutator(tableName);
    +
    +    Put p = new Put("a1".getBytes());
    +    p.addColumn("f".getBytes(), "c1".getBytes(), "21".getBytes());
    +    p.addColumn("f".getBytes(), "c2".getBytes(), "22".getBytes());
    +    p.addColumn("f".getBytes(), "c3".getBytes(), "23".getBytes());
    +    table.mutate(p);
    +
    +    p = new Put("a2".getBytes());
    +    p.addColumn("f".getBytes(), "c1".getBytes(), "11".getBytes());
    --- End diff --
    
    We currently assume encoding to be UTF-8. Support for different encoding can be addressed through DRILL-5825.


---

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

Posted by prasadns14 <gi...@git.apache.org>.

Github user prasadns14 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/975#discussion_r143322351
  
    --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
    @@ -97,6 +97,7 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
       @Override
       protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> columns) {
         Set<SchemaPath> transformed = Sets.newLinkedHashSet();
    +    Set<String> completeFamilies = Sets.newHashSet();
    --- End diff --
    
    I observed that the planner takes care of it. It returns a single column family if there are more than one column family with same name but different case.
    I still made the change to make it case insensitive.


---

[GitHub] drill issue #975: DRILL-5743: Handling column family and column scan for hba...

Posted by prasadns14 <gi...@git.apache.org>.

Github user prasadns14 commented on the issue:

    https://github.com/apache/drill/pull/975
  
    @paul-rogers please review


---

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

Posted by paul-rogers <gi...@git.apache.org>.

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/975#discussion_r143264890
  
    --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
    @@ -97,6 +97,7 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
       @Override
       protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> columns) {
         Set<SchemaPath> transformed = Sets.newLinkedHashSet();
    +    Set<String> completeFamilies = Sets.newHashSet();
    --- End diff --
    
    Do we need to worry about case sensitivity here?


---

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/975


---

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

Posted by paul-rogers <gi...@git.apache.org>.

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/975#discussion_r143265321
  
    --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
    @@ -109,11 +110,14 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
             byte[] family = root.getPath().getBytes();
             transformed.add(SchemaPath.getSimplePath(root.getPath()));
             PathSegment child = root.getChild();
    -        if (child != null && child.isNamed()) {
    -          byte[] qualifier = child.getNameSegment().getPath().getBytes();
    -          hbaseScan.addColumn(family, qualifier);
    -        } else {
    -          hbaseScan.addFamily(family);
    +        if (!completeFamilies.contains(new String(family))) {
    +          if (child != null && child.isNamed()) {
    +            byte[] qualifier = child.getNameSegment().getPath().getBytes();
    --- End diff --
    
    This assumes UTF-8 encoding for the name. Can we be sure that HBase always uses UTF-8 for its encoding? Or, does HBase only support ASCII names so that we need only the ASCII subset of UTF-8? What happens if the user puts a non-ASCII character into the name in this case?


---

[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...

Posted by paul-rogers <gi...@git.apache.org>.

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/975#discussion_r143265009
  
    --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
    @@ -109,11 +110,14 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
             byte[] family = root.getPath().getBytes();
             transformed.add(SchemaPath.getSimplePath(root.getPath()));
             PathSegment child = root.getChild();
    -        if (child != null && child.isNamed()) {
    -          byte[] qualifier = child.getNameSegment().getPath().getBytes();
    -          hbaseScan.addColumn(family, qualifier);
    -        } else {
    -          hbaseScan.addFamily(family);
    +        if (!completeFamilies.contains(new String(family))) {
    +          if (child != null && child.isNamed()) {
    +            byte[] qualifier = child.getNameSegment().getPath().getBytes();
    +            hbaseScan.addColumn(family, qualifier);
    +          } else {
    +            hbaseScan.addFamily(family);
    +            completeFamilies.add(new String(family));
    +          }
    --- End diff --
    
    This code would greatly benefit from a comment to explain what's happening. Would suggest a Javadoc comment for the function explaining the transform rules.


---