Posted to commits@metamodel.apache.org by ka...@apache.org on 2019/04/01 17:36:23 UTC

[metamodel] branch master updated (cb581b9 -> 6cc2857)

This is an automated email from the ASF dual-hosted git repository.

kaspersor pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git.


    from cb581b9  Upgraded to Apache POI 4.0.1.
     new 0c078a9  Usually hbase has a lot of data to write, in order to spread the write pressure, the pre-partition will be created in advance, so I add a method of creating a table with splitKey.
     new d37aaa3  change doc of test case method
     new a146e93  Spell out "column families" instead of the "cf" abbreviation
     new c122d50  Empty commit to trigger rebuild
     new a5c826f  Merge remote-tracking branch 'upstream/master'
     new 7250ba1  A filter that will only return the first KV from each row.This filter can be used to more efficiently perform row count operations.
     new d0454b8  reset
     new 6648ff0  Merge branch 'master' into q977734161_master
     new 6c5182a  Merge pull request #1 from kaspersorensen/q977734161_master
     new e6f8d20  Merge remote-tracking branch 'origin/master'
     new b3663f6  A filter that will only return the first KV from each row.This filter can be used to more efficiently perform row count operations.
     new 164fdfd  Empty commit to trigger rebuild
     new 44e6ace  Empty commit to trigger rebuild
     new 00807d3  Empty commit to trigger rebuild
     new 9f9cb8c  Merge remote-tracking branch 'upstream/master'
     new 6cc2857  Updated CHANGES.md

The 16 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.md                                                           | 1 +
 hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)


[metamodel] 06/16: Merge remote-tracking branch 'upstream/master'

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit a5c826fb79e04f3d378285fd57d4c2da9895b266
Merge: c122d50 9c99985
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 11:03:44 2019 +0800

    Merge remote-tracking branch 'upstream/master'

 CHANGES.md                                         |    3 +
 {csv => arff}/pom.xml                              |   31 +-
 .../org/apache/metamodel/arff/ArffDataContext.java |  221 +
 .../org/apache/metamodel/arff/ArffDataSet.java     |  130 +
 .../apache/metamodel/arff/ArffDataContextTest.java |  134 +
 .../test/resources/weka-data/ReutersCorn-test.arff |  611 +++
 .../resources/weka-data/ReutersCorn-train.arff     | 1561 +++++++
 .../resources/weka-data/ReutersGrain-test.arff     |  611 +++
 .../resources/weka-data/ReutersGrain-train.arff    | 1561 +++++++
 arff/src/test/resources/weka-data/airline.arff     |  152 +
 .../test/resources/weka-data/breast-cancer.arff    |  394 ++
 .../test/resources/weka-data/contact-lenses.arff   |   85 +
 arff/src/test/resources/weka-data/cpu.arff         |  226 +
 .../test/resources/weka-data/cpu.with.vendor.arff  |  225 +
 arff/src/test/resources/weka-data/credit-g.arff    | 1301 ++++++
 arff/src/test/resources/weka-data/diabetes.arff    |  863 ++++
 arff/src/test/resources/weka-data/glass.arff       |  332 ++
 arff/src/test/resources/weka-data/hypothyroid.arff | 3887 ++++++++++++++++
 arff/src/test/resources/weka-data/ionosphere.arff  |  458 ++
 arff/src/test/resources/weka-data/iris.2D.arff     |  157 +
 arff/src/test/resources/weka-data/iris.arff        |  225 +
 arff/src/test/resources/weka-data/labor.arff       |  164 +
 .../resources/weka-data/segment-challenge.arff     | 1607 +++++++
 .../src/test/resources/weka-data/segment-test.arff |  916 ++++
 arff/src/test/resources/weka-data/soybean.arff     |  818 ++++
 arff/src/test/resources/weka-data/supermarket.arff | 4846 ++++++++++++++++++++
 arff/src/test/resources/weka-data/unbalanced.arff  |  893 ++++
 arff/src/test/resources/weka-data/vote.arff        |  651 +++
 .../test/resources/weka-data/weather.nominal.arff  |   23 +
 .../test/resources/weka-data/weather.numeric.arff  |   23 +
 .../apache/metamodel/query/MapValueFunction.java   |    8 +-
 .../java/org/apache/metamodel/schema/Column.java   |    5 +-
 .../org/apache/metamodel/util/CollectionUtils.java |   81 +-
 .../java/org/apache/metamodel/util/FileHelper.java |  165 +-
 .../org/apache/metamodel/util/UnicodeWriter.java   |  131 +-
 .../metamodel/query/MapValueFunctionTest.java      |   28 +
 .../apache/metamodel/util/CollectionUtilsTest.java |   80 +
 .../org/apache/metamodel/util/FileHelperTest.java  |    2 +-
 csv/pom.xml                                        |    1 -
 kafka/pom.xml                                      |   28 +-
 pom.xml                                            |    5 +
 41 files changed, 23479 insertions(+), 164 deletions(-)


[metamodel] 07/16: A filter that will only return the first KV from each row.This filter can be used to more efficiently perform row count operations.

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 7250ba1e5d968fa49d81b02ffaa4aaa685c39a7a
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 13:49:25 2019 +0800

    A filter that will only return the first KV from each row.This filter can be used to more efficiently perform row count operations.
---
 hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
index 1d0db49..fd2125e 100644
--- a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
+++ b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
@@ -31,6 +31,7 @@ import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.ResultScanner;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
 import org.apache.hadoop.hbase.filter.PageFilter;
 import org.apache.metamodel.DataContext;
 import org.apache.metamodel.MetaModelException;
@@ -171,7 +172,9 @@ public class HBaseDataContext extends QueryPostprocessDataContext implements Upd
         long result = 0;
         final org.apache.hadoop.hbase.client.Table hTable = getHTable(table.getName());
         try {
-            ResultScanner scanner = hTable.getScanner(new Scan());
+            Scan scan = new Scan();
+            scan.setFilter(new FirstKeyOnlyFilter());
+            ResultScanner scanner = hTable.getScanner(scan);
             try {
                 while (scanner.next() != null) {
                     result++;
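
As a rough illustration of why the change above can help, the following sketch counts rows in a toy in-memory table, once reading every cell and once reading only the first cell of each row, which is the idea behind FirstKeyOnlyFilter. It does not use the HBase client at all: RowCountSketch, sampleTable and the "cells transferred" counter are hypothetical names made up for this example, and real savings depend on the table layout.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a wide table: row key -> cell values. Not HBase code. */
final class RowCountSketch {

    /**
     * Counts rows and reports how many cells were read to do so.
     * Returns { rowCount, cellsTransferred }.
     */
    static long[] count(Map<String, List<String>> rows, boolean firstCellOnly) {
        long rowCount = 0;
        long cellsTransferred = 0;
        for (List<String> cells : rows.values()) {
            if (cells.isEmpty()) {
                continue; // a row without cells is treated as absent
            }
            rowCount++;
            // With the "first key only" idea, one cell per row is enough to prove the row exists.
            cellsTransferred += firstCellOnly ? 1 : cells.size();
        }
        return new long[] { rowCount, cellsTransferred };
    }

    /** Hypothetical sample data: three rows with 4, 2 and 3 cells. */
    static Map<String, List<String>> sampleTable() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row1", Arrays.asList("a", "b", "c", "d"));
        rows.put("row2", Arrays.asList("a", "b"));
        rows.put("row3", Arrays.asList("a", "b", "c"));
        return rows;
    }

    public static void main(String[] args) {
        long[] full = count(sampleTable(), false);
        long[] first = count(sampleTable(), true);
        System.out.println("full scan:      rows=" + full[0] + " cells=" + full[1]);
        System.out.println("first key only: rows=" + first[0] + " cells=" + first[1]);
    }
}
```

Both modes yield the same row count; only the number of cells read differs, which is why the filter suits count queries but not queries that need the actual values.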


[metamodel] 04/16: Merge branch 'master' into q977734161_master

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 6648ff0327219c53e0969c019dca8afd8877f9e9
Merge: a146e93 8997ba5
Author: Kasper Sørensen <i....@gmail.com>
AuthorDate: Sun Mar 24 19:35:28 2019 -0700

    Merge branch 'master' into q977734161_master

 CHANGES.md                                         |    2 +
 {csv => arff}/pom.xml                              |   31 +-
 .../org/apache/metamodel/arff/ArffDataContext.java |  221 +
 .../org/apache/metamodel/arff/ArffDataSet.java     |  130 +
 .../apache/metamodel/arff/ArffDataContextTest.java |  134 +
 .../test/resources/weka-data/ReutersCorn-test.arff |  611 +++
 .../resources/weka-data/ReutersCorn-train.arff     | 1561 +++++++
 .../resources/weka-data/ReutersGrain-test.arff     |  611 +++
 .../resources/weka-data/ReutersGrain-train.arff    | 1561 +++++++
 arff/src/test/resources/weka-data/airline.arff     |  152 +
 .../test/resources/weka-data/breast-cancer.arff    |  394 ++
 .../test/resources/weka-data/contact-lenses.arff   |   85 +
 arff/src/test/resources/weka-data/cpu.arff         |  226 +
 .../test/resources/weka-data/cpu.with.vendor.arff  |  225 +
 arff/src/test/resources/weka-data/credit-g.arff    | 1301 ++++++
 arff/src/test/resources/weka-data/diabetes.arff    |  863 ++++
 arff/src/test/resources/weka-data/glass.arff       |  332 ++
 arff/src/test/resources/weka-data/hypothyroid.arff | 3887 ++++++++++++++++
 arff/src/test/resources/weka-data/ionosphere.arff  |  458 ++
 arff/src/test/resources/weka-data/iris.2D.arff     |  157 +
 arff/src/test/resources/weka-data/iris.arff        |  225 +
 arff/src/test/resources/weka-data/labor.arff       |  164 +
 .../resources/weka-data/segment-challenge.arff     | 1607 +++++++
 .../src/test/resources/weka-data/segment-test.arff |  916 ++++
 arff/src/test/resources/weka-data/soybean.arff     |  818 ++++
 arff/src/test/resources/weka-data/supermarket.arff | 4846 ++++++++++++++++++++
 arff/src/test/resources/weka-data/unbalanced.arff  |  893 ++++
 arff/src/test/resources/weka-data/vote.arff        |  651 +++
 .../test/resources/weka-data/weather.nominal.arff  |   23 +
 .../test/resources/weka-data/weather.numeric.arff  |   23 +
 .../apache/metamodel/query/MapValueFunction.java   |    8 +-
 .../java/org/apache/metamodel/schema/Column.java   |    5 +-
 .../org/apache/metamodel/util/CollectionUtils.java |   81 +-
 .../java/org/apache/metamodel/util/FileHelper.java |  165 +-
 .../org/apache/metamodel/util/UnicodeWriter.java   |  131 +-
 .../metamodel/query/MapValueFunctionTest.java      |   28 +
 .../apache/metamodel/util/CollectionUtilsTest.java |   80 +
 .../org/apache/metamodel/util/FileHelperTest.java  |    2 +-
 csv/pom.xml                                        |    1 -
 kafka/pom.xml                                      |   28 +-
 pom.xml                                            |    5 +
 41 files changed, 23478 insertions(+), 164 deletions(-)


[metamodel] 11/16: A filter that will only return the first KV from each row.This filter can be used to more efficiently perform row count operations.

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit b3663f6b6cc24712444c4af5d4748e9d983966b5
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 14:24:37 2019 +0800

    A filter that will only return the first KV from each row.This filter can be used to more efficiently perform row count operations.
---
 hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
index 1d0db49..fd2125e 100644
--- a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
+++ b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
@@ -31,6 +31,7 @@ import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.ResultScanner;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
 import org.apache.hadoop.hbase.filter.PageFilter;
 import org.apache.metamodel.DataContext;
 import org.apache.metamodel.MetaModelException;
@@ -171,7 +172,9 @@ public class HBaseDataContext extends QueryPostprocessDataContext implements Upd
         long result = 0;
         final org.apache.hadoop.hbase.client.Table hTable = getHTable(table.getName());
         try {
-            ResultScanner scanner = hTable.getScanner(new Scan());
+            Scan scan = new Scan();
+            scan.setFilter(new FirstKeyOnlyFilter());
+            ResultScanner scanner = hTable.getScanner(scan);
             try {
                 while (scanner.next() != null) {
                     result++;


[metamodel] 15/16: Merge remote-tracking branch 'upstream/master'

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 9f9cb8c6ae655fb86c01c7a45fdda14bbfeb878c
Merge: 00807d3 cb581b9
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Sat Mar 30 11:48:48 2019 +0800

    Merge remote-tracking branch 'upstream/master'

 CHANGES.md                                         |  2 +-
 .../metamodel/arff/ArffDataContextFactory.java     | 49 +++++++++++++++++++++
 ...org.apache.metamodel.factory.DataContextFactory |  1 +
 .../nativeclient/ElasticSearchDataContextTest.java | 50 +++++++++++-----------
 excel/pom.xml                                      |  2 +-
 full/pom.xml                                       |  5 +++
 pom.xml                                            |  1 +
 7 files changed, 84 insertions(+), 26 deletions(-)


[metamodel] 08/16: Merge pull request #1 from kaspersorensen/q977734161_master

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 6c5182ae3653fa52e1886b5838f8c02972ebebe5
Merge: 7250ba1 6648ff0
Author: lixiaobao <li...@gmail.com>
AuthorDate: Fri Mar 29 13:51:32 2019 +0800

    Merge pull request #1 from kaspersorensen/q977734161_master
    
    Merge in master to your master



[metamodel] 03/16: Spell out "column families" instead of the "cf" abbreviation

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit a146e93253be13c60daf5c30ab58aa4623154acd
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Sat Mar 23 12:00:03 2019 +0800

    Spell out "column families" instead of the "cf" abbreviation
---
 hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java
index d1cd260..07005e4 100644
--- a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java
+++ b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java
@@ -151,7 +151,7 @@ final class HBaseClient {
      * @throws MetaModelException when a {@link IOException} is caught
      */
     public void createTable(final String tableName, final Set<String> columnFamilies) {
-        checkTableAndCf(tableName, columnFamilies);
+        checkTableAndColumnFamilies(tableName, columnFamilies);
         try (final Admin admin = _connection.getAdmin()) {
             final TableDescriptorBuilder tableBuilder = getTableDescriptorBuilder(tableName, columnFamilies);
             admin.createTable(tableBuilder.build());
@@ -169,7 +169,7 @@ final class HBaseClient {
      * @throws MetaModelException when a {@link IOException} is caught
      */
     public void createTable(final String tableName, final Set<String> columnFamilies, byte[][] splitKeys) {
-        checkTableAndCf(tableName, columnFamilies);
+        checkTableAndColumnFamilies(tableName, columnFamilies);
         try (final Admin admin = _connection.getAdmin()) {
             final TableDescriptorBuilder tableBuilder = getTableDescriptorBuilder(tableName, columnFamilies);
             admin.createTable(tableBuilder.build(),splitKeys);
@@ -192,7 +192,7 @@ final class HBaseClient {
         return tableBuilder;
     }
 
-    private void checkTableAndCf(String tableName, Set<String> columnFamilies) {
+    private void checkTableAndColumnFamilies(String tableName, Set<String> columnFamilies) {
         if (tableName == null || columnFamilies == null || columnFamilies.isEmpty()) {
             throw new IllegalArgumentException("Can't create a table without having the tableName or columnFamilies");
         }


[metamodel] 12/16: Empty commit to trigger rebuild

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 164fdfd15f955bf700b42dc0904d1eb6f2afbb47
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 21:51:03 2019 +0800

    Empty commit to trigger rebuild


[metamodel] 16/16: Updated CHANGES.md

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 6cc2857323192003e94b9692886196f255a3689c
Author: Kasper Sørensen <i....@gmail.com>
AuthorDate: Mon Apr 1 10:36:10 2019 -0700

    Updated CHANGES.md
---
 CHANGES.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGES.md b/CHANGES.md
index 00490c9..c51a5fd 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -4,6 +4,7 @@
  * [METAMODEL-1207] - Fix JDBC Database version parser edge cases.
  * [METAMODEL-1172] - Made MAP_VALUE function capable of also navigating lists using square bracket notations.
  * Added ability to specify "split key" attribute for new tables in HBase.
+ * Improved performance of "count" queries on HBase by applying FirstKeyOnlyFilter.
 
 ### Apache MetaModel 5.2.1
 


[metamodel] 02/16: change doc of test case method

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit d37aaa30997421d8eebd6f72ad63261d6bec14cf
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 22 11:41:52 2019 +0800

    change doc of test case method
---
 hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java b/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java
index c0779f9..4e6adb8 100644
--- a/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java
+++ b/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java
@@ -113,7 +113,7 @@ public class CreateTableTest extends HBaseUpdateCallbackTest {
     }
 
     /**
-     * Goodflow. Create a table without the ID-Column, should work
+     * Goodflow. Create a table with splitKey, should work
      *
      * @throws IOException
      */


[metamodel] 09/16: reset

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit d0454b854abdb86150d844d29bd6546dde0e8c3c
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 14:22:10 2019 +0800

    reset
---
 hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
index fd2125e..1d0db49 100644
--- a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
+++ b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseDataContext.java
@@ -31,7 +31,6 @@ import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.ResultScanner;
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.client.TableDescriptor;
-import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
 import org.apache.hadoop.hbase.filter.PageFilter;
 import org.apache.metamodel.DataContext;
 import org.apache.metamodel.MetaModelException;
@@ -172,9 +171,7 @@ public class HBaseDataContext extends QueryPostprocessDataContext implements Upd
         long result = 0;
         final org.apache.hadoop.hbase.client.Table hTable = getHTable(table.getName());
         try {
-            Scan scan = new Scan();
-            scan.setFilter(new FirstKeyOnlyFilter());
-            ResultScanner scanner = hTable.getScanner(scan);
+            ResultScanner scanner = hTable.getScanner(new Scan());
             try {
                 while (scanner.next() != null) {
                     result++;


[metamodel] 13/16: Empty commit to trigger rebuild

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 44e6ace5552ad776336123159bd1f42373f54323
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 23:15:58 2019 +0800

    Empty commit to trigger rebuild


[metamodel] 14/16: Empty commit to trigger rebuild

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 00807d3002fa2865d620a75e6c611a354d336d3e
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 23:31:28 2019 +0800

    Empty commit to trigger rebuild


[metamodel] 10/16: Merge remote-tracking branch 'origin/master'

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit e6f8d20d75df863b1621878f05ae569d91ea333f
Merge: d0454b8 6c5182a
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 29 14:22:26 2019 +0800

    Merge remote-tracking branch 'origin/master'



[metamodel] 05/16: Empty commit to trigger rebuild

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit c122d50964331a384f40f77aef05913291c01ed7
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Mon Mar 25 10:50:29 2019 +0800

    Empty commit to trigger rebuild


[metamodel] 01/16: Usually hbase has a lot of data to write, in order to spread the write pressure, the pre-partition will be created in advance, so I add a method of creating a table with splitKey.

Posted by ka...@apache.org.

kaspersor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metamodel.git

commit 0c078a9d054ec8cd5a55a4a519c6446a743a8049
Author: 李小保 <li...@mininglamp.com>
AuthorDate: Fri Mar 22 11:38:45 2019 +0800

    Usually hbase has a lot of data to write, in order to spread the write pressure, the pre-partition will be created in advance, so I add a method of creating a table with splitKey.
---
 .../org/apache/metamodel/hbase/HBaseClient.java    | 53 ++++++++++++++++------
 .../metamodel/hbase/HBaseCreateTableBuilder.java   | 21 +++++++--
 .../apache/metamodel/hbase/CreateTableTest.java    | 33 ++++++++++++++
 3 files changed, 90 insertions(+), 17 deletions(-)

diff --git a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java
index 2b25d84..d1cd260 100644
--- a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java
+++ b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseClient.java
@@ -151,20 +151,9 @@ final class HBaseClient {
      * @throws MetaModelException when a {@link IOException} is caught
      */
     public void createTable(final String tableName, final Set<String> columnFamilies) {
-        if (tableName == null || columnFamilies == null || columnFamilies.isEmpty()) {
-            throw new IllegalArgumentException("Can't create a table without having the tableName or columnFamilies");
-        }
+        checkTableAndCf(tableName, columnFamilies);
         try (final Admin admin = _connection.getAdmin()) {
-            final TableName hBasetableName = TableName.valueOf(tableName);
-            final TableDescriptorBuilder tableBuilder = TableDescriptorBuilder.newBuilder(hBasetableName);
-            // Add all columnFamilies to the tableDescriptor.
-            for (final String columnFamily : columnFamilies) {
-                // The ID-column isn't needed because, it will automatically be created.
-                if (!columnFamily.equals(HBaseDataContext.FIELD_ID)) {
-                    final ColumnFamilyDescriptor columnDescriptor = ColumnFamilyDescriptorBuilder.of(columnFamily);
-                    tableBuilder.setColumnFamily(columnDescriptor);
-                }
-            }
+            final TableDescriptorBuilder tableBuilder = getTableDescriptorBuilder(tableName, columnFamilies);
             admin.createTable(tableBuilder.build());
         } catch (IOException e) {
             throw new MetaModelException(e);
@@ -172,6 +161,44 @@ final class HBaseClient {
     }
 
     /**
+     * Creates an HBase table based on a tableName, its columnFamilies and splitKeys
+     * @param tableName
+     * @param columnFamilies
+     * @param splitKeys
+     * @throws IllegalArgumentException when tableName or columnFamilies is null, or columnFamilies is empty
+     * @throws MetaModelException when a {@link IOException} is caught
+     */
+    public void createTable(final String tableName, final Set<String> columnFamilies, byte[][] splitKeys) {
+        checkTableAndCf(tableName, columnFamilies);
+        try (final Admin admin = _connection.getAdmin()) {
+            final TableDescriptorBuilder tableBuilder = getTableDescriptorBuilder(tableName, columnFamilies);
+            admin.createTable(tableBuilder.build(),splitKeys);
+        } catch (IOException e) {
+            throw new MetaModelException(e);
+        }
+    }
+
+    private TableDescriptorBuilder getTableDescriptorBuilder(String tableName, Set<String> columnFamilies) {
+        final TableName hBasetableName = TableName.valueOf(tableName);
+        final TableDescriptorBuilder tableBuilder = TableDescriptorBuilder.newBuilder(hBasetableName);
+        // Add all columnFamilies to the tableDescriptor.
+        for (final String columnFamily : columnFamilies) {
+            // The ID-column isn't needed because, it will automatically be created.
+            if (!columnFamily.equals(HBaseDataContext.FIELD_ID)) {
+                final ColumnFamilyDescriptor columnDescriptor = ColumnFamilyDescriptorBuilder.of(columnFamily);
+                tableBuilder.setColumnFamily(columnDescriptor);
+            }
+        }
+        return tableBuilder;
+    }
+
+    private void checkTableAndCf(String tableName, Set<String> columnFamilies) {
+        if (tableName == null || columnFamilies == null || columnFamilies.isEmpty()) {
+            throw new IllegalArgumentException("Can't create a table without having the tableName or columnFamilies");
+        }
+    }
+
+    /**
      * Disable and drop a table from a HBase datastore
      * @param tableName
      * @throws IllegalArgumentException when tableName is null
diff --git a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseCreateTableBuilder.java b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseCreateTableBuilder.java
index aa722c3..37bafd4 100644
--- a/hbase/src/main/java/org/apache/metamodel/hbase/HBaseCreateTableBuilder.java
+++ b/hbase/src/main/java/org/apache/metamodel/hbase/HBaseCreateTableBuilder.java
@@ -33,13 +33,14 @@ import org.apache.metamodel.util.SimpleTableDef;
  */
 class HBaseCreateTableBuilder extends AbstractTableCreationBuilder<HBaseUpdateCallback> {
 
+    private byte[][] splitKeys;
+
     /**
      * Create a {@link HBaseCreateTableBuilder}.
      * Throws an {@link IllegalArgumentException} if the schema isn't a {@link MutableSchema}.
      * @param updateCallback
      * @param schema
      * @param name
-     * @param columnFamilies
      */
     public HBaseCreateTableBuilder(final HBaseUpdateCallback updateCallback, final Schema schema, final String name) {
         super(updateCallback, schema, name);
@@ -48,6 +49,14 @@ class HBaseCreateTableBuilder extends AbstractTableCreationBuilder<HBaseUpdateCa
         }
     }
 
+    public byte[][] getSplitKeys() {
+        return splitKeys;
+    }
+
+    public void setSplitKeys(byte[][] splitKeys) {
+        this.splitKeys = splitKeys;
+    }
+
     @Override
     public Table execute() {
         Set<String> columnFamilies = getColumnFamilies();
@@ -59,8 +68,13 @@ class HBaseCreateTableBuilder extends AbstractTableCreationBuilder<HBaseUpdateCa
         final Table table = getTable();
 
         // Add the table to the datastore
-        ((HBaseDataContext) getUpdateCallback().getDataContext()).getHBaseClient().createTable(table.getName(),
-                columnFamilies);
+        if (this.getSplitKeys() != null) {
+            ((HBaseDataContext) getUpdateCallback().getDataContext()).getHBaseClient().createTable(table.getName(),
+                    columnFamilies,this.getSplitKeys());
+        } else {
+            ((HBaseDataContext) getUpdateCallback().getDataContext()).getHBaseClient().createTable(table.getName(),
+                    columnFamilies);
+        }
 
         // Update the schema
         addNewTableToSchema(table);
@@ -87,7 +101,6 @@ class HBaseCreateTableBuilder extends AbstractTableCreationBuilder<HBaseUpdateCa
     /**
      * Add the new {@link Table} to the {@link MutableSchema}
      * @param table
-     * @param data.updateCallback
      * @return {@link MutableSchema}
      */
     private void addNewTableToSchema(final Table table) {
diff --git a/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java b/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java
index 4c71a47..c0779f9 100644
--- a/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java
+++ b/hbase/src/test/java/org/apache/metamodel/hbase/CreateTableTest.java
@@ -23,6 +23,7 @@ import static org.junit.Assert.assertTrue;
 
 import java.io.IOException;
 
+import org.apache.hadoop.hbase.TableName;
 import org.apache.metamodel.MetaModelException;
 import org.apache.metamodel.schema.ImmutableSchema;
 import org.apache.metamodel.schema.Table;
@@ -112,6 +113,38 @@ public class CreateTableTest extends HBaseUpdateCallbackTest {
     }
 
     /**
+     * Goodflow. Create a table without the ID-Column, should work
+     *
+     * @throws IOException
+     */
+    @Test
+    public void testCreateTableWithSplitRegion() throws IOException {
+
+        int numOfRegions = 3;
+        int keyLength = 5;
+        byte[][] splitRegion = new byte[numOfRegions-1][keyLength];
+        for (int i = 0 ; i< numOfRegions-1 ; i++) {
+            splitRegion[i] = ("0000"+i).getBytes();
+        }
+
+        final HBaseCreateTableBuilder hBaseCreateTableBuilder = (HBaseCreateTableBuilder) getUpdateCallback()
+                .createTable(getSchema(), TABLE_NAME);
+
+        hBaseCreateTableBuilder.withColumn(CF_FOO);
+        hBaseCreateTableBuilder.withColumn(CF_BAR);
+        hBaseCreateTableBuilder.setSplitKeys(splitRegion);
+        hBaseCreateTableBuilder.execute();
+        checkSuccesfullyInsertedTable();
+
+        final Table table = getDataContext().getDefaultSchema().getTableByName(TABLE_NAME);
+        assertTrue(table instanceof HBaseTable);
+
+        // Assert that the table was pre-split into the expected number of regions: two split keys
+        // yield three regions.
+        assertEquals(numOfRegions,getDataContext().getAdmin().getRegions(TableName.valueOf(TABLE_NAME)).size());
+    }
+
+    /**
      * Goodflow. Create a table including the ID-Column (columnFamilies not in constructor), should work
      *
      * @throws IOException
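
The test above builds numOfRegions - 1 split keys of the form "0000" + i and expects numOfRegions regions. The following is a minimal sketch of how such keys partition the row-key space, assuming the usual HBase rules (row keys are compared as unsigned bytes, a region's start key is inclusive and its end key exclusive). SplitKeySketch and its methods are hypothetical helpers written for this illustration; they are not part of MetaModel or the HBase client.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helpers only; no HBase dependency. */
final class SplitKeySketch {

    /**
     * Builds numRegions - 1 split keys "00000", "00001", ... in the style of the test.
     * Only meaningful while the index stays a single digit, so that the keys remain sorted.
     */
    static byte[][] splitKeys(int numRegions) {
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = ("0000" + i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    /** Lexicographic comparison of byte arrays, treating each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Index of the region that would hold rowKey: region 0 ends before splitKeys[0],
     * region i starts at splitKeys[i - 1], and the last region is open-ended.
     */
    static int regionIndex(byte[][] splitKeys, String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int region = 0;
        while (region < splitKeys.length && compare(key, splitKeys[region]) >= 0) {
            region++;
        }
        return region;
    }

    public static void main(String[] args) {
        byte[][] keys = splitKeys(3);
        for (String rowKey : new String[] { "0000", "00000", "000005", "00001", "zzz" }) {
            System.out.println(rowKey + " -> region " + regionIndex(keys, rowKey));
        }
    }
}
```

Spreading writes evenly still depends on the row keys actually falling across these ranges; pre-splitting on keys that all incoming rows sort after would leave every write in the last region.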