You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2019/07/24 18:21:45 UTC

[GitHub] [metron] mmiklavc commented on a change in pull request #1458: METRON-2177 Upgrade Profiler for HBase 2.0.2

mmiklavc commented on a change in pull request #1458: METRON-2177 Upgrade Profiler for HBase 2.0.2
URL: https://github.com/apache/metron/pull/1458#discussion_r306954597
 
 

 ##########
 File path: metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/HBaseProfilerClientFactory.java
 ##########
 @@ -0,0 +1,98 @@
+/*
+ *
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ */
+package org.apache.metron.profiler.client;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.metron.hbase.client.HBaseClient;
+import org.apache.metron.hbase.client.HBaseClientFactory;
+import org.apache.metron.hbase.client.HBaseConnectionFactory;
+import org.apache.metron.hbase.client.HBaseTableClientFactory;
+import org.apache.metron.profiler.hbase.ColumnBuilder;
+import org.apache.metron.profiler.hbase.RowKeyBuilder;
+import org.apache.metron.profiler.hbase.SaltyRowKeyBuilder;
+import org.apache.metron.profiler.hbase.ValueOnlyColumnBuilder;
+
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.metron.profiler.client.stellar.ProfilerClientConfig.PROFILER_COLUMN_FAMILY;
+import static org.apache.metron.profiler.client.stellar.ProfilerClientConfig.PROFILER_HBASE_TABLE;
+import static org.apache.metron.profiler.client.stellar.ProfilerClientConfig.PROFILER_HBASE_CONNECTION_FACTORY;
+import static org.apache.metron.profiler.client.stellar.ProfilerClientConfig.PROFILER_SALT_DIVISOR;
+import static org.apache.metron.profiler.client.stellar.Util.getPeriodDurationInMillis;
+
+/**
+ * Creates an {@link HBaseProfilerClient}.
+ */
+public class HBaseProfilerClientFactory implements ProfilerClientFactory {
+
+  private HBaseClientFactory hBaseClientFactory;
+
+  public HBaseProfilerClientFactory() {
+    this(new HBaseTableClientFactory());
+  }
+
+  public HBaseProfilerClientFactory(HBaseClientFactory hBaseClientFactory) {
+    this.hBaseClientFactory = hBaseClientFactory;
+  }
+
+  @Override
+  public HBaseProfilerClient create(Map<String, Object> globals) {
 
 Review comment:
   The abstractions you've come up with are making sense. I didn't catch this initially, but it looks like one subtlety here is that your new connection interface is relevant on a per-table basis, so if you need to modify 2 tables then you'd need to construct 2 connections to handle that - https://github.com/apache/metron/pull/1456/files#diff-24d29985fc6561348b435e97d176e21fR51. Not that we do that really, but it may have implications in the coprocessor implementation.
   
   I was thinking through this a bit and was considering whether it makes sense to make the table operations more ad-hoc. So the setup methods like `public void addMutation(byte[] rowKey, ColumnList cols)` could still look the same, but maybe we defer getting the table from the connection until we call the execution methods, e.g.
   
   ```
   public int mutate() {
   ...
   }
   ```
   
   becomes something like this:
   
   ```
   public int mutate(String tableName) {
       try(Table table = connection.getTable(TableName.valueOf(tableName))) {
           doMutate(table);
       }
   }
   
     private void doMutate(Table table) {
       Object[] result = new Object[mutations.size()];
       try {
         table.batch(mutations, result);
   
       } catch (Exception e) {
         String msg = String.format("'%d' HBase write(s) failed on table '%s'", size(mutations), tableName(table));
         LOG.error(msg, e);
         throw new RuntimeException(msg, e);
   
       } finally {
         mutations.clear();
       }
     }
   ```
   
   The rationale being that we have more flexibility around opening and closing the HBase connections.
   
   For posterity and reference from the `Connection` javadoc:
   
   > Connection creation is a heavy-weight operation. Connection implementations are thread-safe, so that the client can create a connection once, and share it with different threads. Table and Admin instances, on the other hand, are light-weight and are not thread-safe. Typically, a single connection per client application is instantiated and every thread will obtain its own Table instance. Caching or pooling of Table and Admin is not recommended
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services