Posted to dev@drill.apache.org by prasadns14 <gi...@git.apache.org> on 2017/10/05 22:26:12 UTC
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
GitHub user prasadns14 opened a pull request:
https://github.com/apache/drill/pull/975
DRILL-5743: Handling column family and column scan for hbase
This PR handles the scenario where the projected column list contains both a column family and a column within the same family.
@paul-rogers please review
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/prasadns14/drill DRILL-5743
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/975.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #975
----
commit 0469197e18b466dcfbb26b4c95c47c22b683416e
Author: Prasad Nagaraj Subramanya <pr...@gmail.com>
Date: 2017-10-05T22:18:43Z
DRILL-5743: Handling column family and column scan for hbase
----
---
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/975#discussion_r143266651
--- Diff: contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestTableGenerator.java ---
@@ -133,6 +133,43 @@ public static void generateHBaseDataset1(Connection conn, Admin admin, TableName
table.close();
}
+ public static void generateHBaseDatasetSingleSchema(Connection conn, Admin admin, TableName tableName, int numberRegions) throws Exception {
+ if (admin.tableExists(tableName)) {
+ admin.disableTable(tableName);
+ admin.deleteTable(tableName);
+ }
+
+ HTableDescriptor desc = new HTableDescriptor(tableName);
+ desc.addFamily(new HColumnDescriptor("f"));
+ if (numberRegions > 1) {
+ admin.createTable(desc, Arrays.copyOfRange(SPLIT_KEYS, 0, numberRegions - 1));
+ } else {
+ admin.createTable(desc);
+ }
+
+ BufferedMutator table = conn.getBufferedMutator(tableName);
+
+ Put p = new Put("a1".getBytes());
+ p.addColumn("f".getBytes(), "c1".getBytes(), "21".getBytes());
+ p.addColumn("f".getBytes(), "c2".getBytes(), "22".getBytes());
+ p.addColumn("f".getBytes(), "c3".getBytes(), "23".getBytes());
+ table.mutate(p);
+
+ p = new Put("a2".getBytes());
+ p.addColumn("f".getBytes(), "c1".getBytes(), "11".getBytes());
--- End diff --
Here, we are deciding to encode names as UTF-8. Is this a standard? Or, is it our own convention? Could we have used some other encoding? If we do, how do we tell the code above what encoding we chose?
---
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
Posted by prasadns14 <gi...@git.apache.org>.
Github user prasadns14 commented on a diff in the pull request:
https://github.com/apache/drill/pull/975#discussion_r143322352
--- Diff: contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestTableGenerator.java ---
@@ -133,6 +133,43 @@ public static void generateHBaseDataset1(Connection conn, Admin admin, TableName
table.close();
}
+ public static void generateHBaseDatasetSingleSchema(Connection conn, Admin admin, TableName tableName, int numberRegions) throws Exception {
+ if (admin.tableExists(tableName)) {
+ admin.disableTable(tableName);
+ admin.deleteTable(tableName);
+ }
+
+ HTableDescriptor desc = new HTableDescriptor(tableName);
+ desc.addFamily(new HColumnDescriptor("f"));
+ if (numberRegions > 1) {
+ admin.createTable(desc, Arrays.copyOfRange(SPLIT_KEYS, 0, numberRegions - 1));
+ } else {
+ admin.createTable(desc);
+ }
+
+ BufferedMutator table = conn.getBufferedMutator(tableName);
+
+ Put p = new Put("a1".getBytes());
+ p.addColumn("f".getBytes(), "c1".getBytes(), "21".getBytes());
+ p.addColumn("f".getBytes(), "c2".getBytes(), "22".getBytes());
+ p.addColumn("f".getBytes(), "c3".getBytes(), "23".getBytes());
+ table.mutate(p);
+
+ p = new Put("a2".getBytes());
+ p.addColumn("f".getBytes(), "c1".getBytes(), "11".getBytes());
--- End diff --
We currently assume the encoding is UTF-8. Support for other encodings can be addressed through DRILL-5825.
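For illustration of the encoding point: `String.getBytes()` with no argument uses the JVM's default charset, which is not guaranteed to be UTF-8 on every JVM, so the UTF-8 assumption only holds reliably if the charset is named explicitly. The sketch below shows that with standard-library calls only. `NameEncoding` and its methods are hypothetical names for this example and are not part of Drill or HBase. HBase's own `Bytes.toBytes(String)` utility is, to my understanding, a UTF-8 conversion as well and may be the more natural choice in the real code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Drill or HBase code.
final class NameEncoding {
  private NameEncoding() {}

  // Encodes a row key, family, or qualifier name with an explicit charset,
  // so the result does not depend on the JVM's default charset.
  static byte[] encode(String name) {
    return name.getBytes(StandardCharsets.UTF_8);
  }

  // Decodes bytes produced by encode() back into the original name.
  static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

With this helper, a test put could be written as `new Put(NameEncoding.encode("a1"))`, and a non-ASCII name would round-trip through `decode` unchanged.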
---
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
Posted by prasadns14 <gi...@git.apache.org>.
Github user prasadns14 commented on a diff in the pull request:
https://github.com/apache/drill/pull/975#discussion_r143322351
--- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
@@ -97,6 +97,7 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
@Override
protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> columns) {
Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+ Set<String> completeFamilies = Sets.newHashSet();
--- End diff --
I observed that the planner already takes care of this: it returns a single column family when more than one column family has the same name but different case.
I still made the change to make the check case-insensitive.
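One way a case-insensitive family check could look is sketched below. This is an assumption about the approach, not the code actually committed in this PR; `FamilyNames` is a hypothetical name. It relies only on `TreeSet` with `String.CASE_INSENSITIVE_ORDER` from the standard library.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not the PR's actual change.
final class FamilyNames {
  private FamilyNames() {}

  // Returns a set that treats "F" and "f" as the same family name.
  static Set<String> newCaseInsensitiveSet() {
    return new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
  }
}
```

Replacing `Sets.newHashSet()` with such a set would make `completeFamilies.contains(...)` ignore case. An alternative with the same effect is to lower-case every name before adding or looking it up in an ordinary `HashSet`.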
---
[GitHub] drill issue #975: DRILL-5743: Handling column family and column scan for hba...
Posted by prasadns14 <gi...@git.apache.org>.
Github user prasadns14 commented on the issue:
https://github.com/apache/drill/pull/975
@paul-rogers please review
---
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/975#discussion_r143264890
--- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
@@ -97,6 +97,7 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
@Override
protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> columns) {
Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+ Set<String> completeFamilies = Sets.newHashSet();
--- End diff --
Do we need to worry about case sensitivity here?
---
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/drill/pull/975
---
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/975#discussion_r143265321
--- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
@@ -109,11 +110,14 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
byte[] family = root.getPath().getBytes();
transformed.add(SchemaPath.getSimplePath(root.getPath()));
PathSegment child = root.getChild();
- if (child != null && child.isNamed()) {
- byte[] qualifier = child.getNameSegment().getPath().getBytes();
- hbaseScan.addColumn(family, qualifier);
- } else {
- hbaseScan.addFamily(family);
+ if (!completeFamilies.contains(new String(family))) {
+ if (child != null && child.isNamed()) {
+ byte[] qualifier = child.getNameSegment().getPath().getBytes();
--- End diff --
This assumes UTF-8 encoding for the name. Can we be sure that HBase always uses UTF-8 for its encoding? Or, does HBase only support ASCII names so that we need only the ASCII subset of UTF-8? What happens if the user puts a non-ASCII character into the name in this case?
---
[GitHub] drill pull request #975: DRILL-5743: Handling column family and column scan ...
Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/975#discussion_r143265009
--- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
@@ -109,11 +110,14 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
byte[] family = root.getPath().getBytes();
transformed.add(SchemaPath.getSimplePath(root.getPath()));
PathSegment child = root.getChild();
- if (child != null && child.isNamed()) {
- byte[] qualifier = child.getNameSegment().getPath().getBytes();
- hbaseScan.addColumn(family, qualifier);
- } else {
- hbaseScan.addFamily(family);
+ if (!completeFamilies.contains(new String(family))) {
+ if (child != null && child.isNamed()) {
+ byte[] qualifier = child.getNameSegment().getPath().getBytes();
+ hbaseScan.addColumn(family, qualifier);
+ } else {
+ hbaseScan.addFamily(family);
+ completeFamilies.add(new String(family));
+ }
--- End diff --
This code would greatly benefit from a comment to explain what's happening. Would suggest a Javadoc comment for the function explaining the transform rules.
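As a rough illustration of the transform rules such a comment could describe, here is a simplified model of the logic in the diff above. It uses plain strings and collections rather than Drill's `SchemaPath` or HBase's `Scan`; `ProjectionSketch`, `plan`, and the `"*"` marker are hypothetical names for this example. It assumes, as I understand the HBase `Scan` API, that requesting a whole family replaces any individual columns previously requested for that family.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the projection rules; not Drill code.
final class ProjectionSketch {
  // Marker meaning "every column in this family" (an assumption of this sketch).
  static final String ALL = "*";

  private ProjectionSketch() {}

  // Each projected column is {family} or {family, qualifier}.
  // Returns, per family, either the requested qualifiers or the ALL marker.
  static Map<String, Set<String>> plan(List<String[]> projected) {
    Map<String, Set<String>> scan = new LinkedHashMap<>();
    Set<String> completeFamilies = new HashSet<>();
    for (String[] column : projected) {
      String family = column[0];
      if (completeFamilies.contains(family)) {
        // The whole family is already requested; a single column adds nothing.
        continue;
      }
      if (column.length > 1) {
        // Family plus qualifier: request just that column.
        scan.computeIfAbsent(family, k -> new LinkedHashSet<>()).add(column[1]);
      } else {
        // Bare family: request everything and discard earlier per-column requests.
        Set<String> all = new LinkedHashSet<>();
        all.add(ALL);
        scan.put(family, all);
        completeFamilies.add(family);
      }
    }
    return scan;
  }
}
```

In this model, projecting `f.c1`, `f`, `f.c2`, and `g.c3` yields the whole family `f` plus the single column `g.c3`, regardless of whether the bare family appears before or after its individual columns.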
---