You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/16 00:33:26 UTC

[GitHub] [hudi] alexeykudinkin commented on a change in pull request #5004: [HUDI-1180] Upgrade HBase to 2.4.9

alexeykudinkin commented on a change in pull request #5004:
URL: https://github.com/apache/hudi/pull/5004#discussion_r827515042



##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -80,45 +85,49 @@
   public HoodieHFileReader(Configuration configuration, Path path, CacheConfig cacheConfig) throws IOException {
     this.conf = configuration;
     this.path = path;
-    this.reader = HFile.createReader(FSUtils.getFs(path.toString(), configuration), path, cacheConfig, conf);
+    this.reader = HFile.createReader(FSUtils.getFs(path.toString(), configuration), path, cacheConfig, true, conf);
   }
 
   public HoodieHFileReader(Configuration configuration, Path path, CacheConfig cacheConfig, FileSystem fs) throws IOException {
     this.conf = configuration;
     this.path = path;
     this.fsDataInputStream = fs.open(path);
-    this.reader = HFile.createReader(fs, path, cacheConfig, configuration);
+    this.reader = HFile.createReader(fs, path, cacheConfig, true, configuration);
   }
 
   public HoodieHFileReader(byte[] content) throws IOException {
     Configuration conf = new Configuration();
     Path path = new Path("hoodie");
     SeekableByteArrayInputStream bis = new SeekableByteArrayInputStream(content);
     FSDataInputStream fsdis = new FSDataInputStream(bis);
-    this.reader = HFile.createReader(FSUtils.getFs("hoodie", conf), path, new FSDataInputStreamWrapper(fsdis),
-        content.length, new CacheConfig(conf), conf);
+    FSDataInputStreamWrapper stream = new FSDataInputStreamWrapper(fsdis);
+    FileSystem fs = FSUtils.getFs("hoodie", conf);

Review comment:
       nit: Can we extract this initialization seq into a method (returning just the Reader)

##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -250,7 +259,7 @@ public BloomFilter readBloomFilter() {
    */
   public List<Pair<String, R>> readRecords(List<String> keys, Schema schema) throws IOException {
     this.schema = schema;
-    reader.loadFileInfo();
+    reader.getHFileInfo();

Review comment:
       From what i see those are returning cached `fileInfo` object do we really need those here?

##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -263,7 +272,7 @@ public BloomFilter readBloomFilter() {
 
   public ClosableIterator<R> getRecordIterator(List<String> keys, Schema schema) throws IOException {
     this.schema = schema;
-    reader.loadFileInfo();
+    reader.getHFileInfo();

Review comment:
       Same as above

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java
##########
@@ -369,7 +369,8 @@ private Path getRandomInlinePath() {
   private void verifyFileStatus(FileStatus expected, Path inlinePath, long expectedLength, FileStatus actual) {
     assertEquals(inlinePath, actual.getPath());
     assertEquals(expectedLength, actual.getLen());
-    assertEquals(expected.getAccessTime(), actual.getAccessTime());
+    // removing below assertion as it is flaky on rare occasion (difference is in single-digit ms)

Review comment:
       Let's either create a ticket to follow-up or remove completely

##########
File path: hudi-common/src/main/resources/hbase-site.xml
##########
@@ -0,0 +1,2185 @@
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+<!--
+In Hudi bundles, we include this hbase-site.xml containing HBase
+default configs from the hbase-common 2.4.9 we use, to override the
+default configs loaded from hbase-default.xml from an older HBase
+version and to ensure correct default configs for Hudi HBase usage.
+In Hive, the Hive server loads all lib jars including HBase jars
+with its corresponding hbase-default.xml into class path (e.g.,
+HBase 1.1.1), and that can cause conflict with the hbase-default.xml
+in Hudi bundles (HBase 2.4.9).  The exception is thrown as follows:
+
+Caused by: java.lang.RuntimeException: hbase-default.xml file
+seems to be for an older version of HBase (1.1.1), this version is 2.4.9
+
+Relevant logic causing such exception can be found in
+HBaseConfiguration::addHbaseResources().  To get around this issue,
+since HBase loads "hbase-site.xml" after "hbase-default.xml", we
+provide hbase-site.xml from the bundle so that HBaseConfiguration
+can pick it up and ensure the right defaults.

Review comment:
       Thanks for adding explainer!

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystemHFileInLining.java
##########
@@ -121,21 +123,24 @@ public void testSimpleInlineFileSystem() throws IOException {
 
     Set<Integer> rowIdsToSearch = getRandomValidRowIds(10);
     for (int rowId : rowIdsToSearch) {
-      assertEquals(0, scanner.seekTo(KeyValue.createKeyValueFromKey(getSomeKey(rowId))),
+      KeyValue keyValue = new KeyValue.KeyOnlyKeyValue(getSomeKey(rowId));
+      assertEquals(0, scanner.seekTo(keyValue),
           "location lookup failed");
       // read the key and see if it matches
-      ByteBuffer readKey = scanner.getKey();
-      assertArrayEquals(getSomeKey(rowId), Bytes.toBytes(readKey), "seeked key does not match");
-      scanner.seekTo(KeyValue.createKeyValueFromKey(getSomeKey(rowId)));
+      Cell cell = scanner.getCell();
+      byte[] key = Arrays.copyOfRange(cell.getRowArray(), cell.getRowOffset(), cell.getRowOffset() + cell.getRowLength());
+      assertArrayEquals(Arrays.copyOfRange(keyValue.getRowArray(), keyValue.getRowOffset(), keyValue.getRowOffset() + keyValue.getRowLength()), key,

Review comment:
       nit: Please create a var for expected value to make assertion more readable

##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -80,45 +85,49 @@
   public HoodieHFileReader(Configuration configuration, Path path, CacheConfig cacheConfig) throws IOException {
     this.conf = configuration;
     this.path = path;
-    this.reader = HFile.createReader(FSUtils.getFs(path.toString(), configuration), path, cacheConfig, conf);
+    this.reader = HFile.createReader(FSUtils.getFs(path.toString(), configuration), path, cacheConfig, true, conf);
   }
 
   public HoodieHFileReader(Configuration configuration, Path path, CacheConfig cacheConfig, FileSystem fs) throws IOException {
     this.conf = configuration;
     this.path = path;
     this.fsDataInputStream = fs.open(path);
-    this.reader = HFile.createReader(fs, path, cacheConfig, configuration);
+    this.reader = HFile.createReader(fs, path, cacheConfig, true, configuration);
   }
 
   public HoodieHFileReader(byte[] content) throws IOException {
     Configuration conf = new Configuration();
     Path path = new Path("hoodie");
     SeekableByteArrayInputStream bis = new SeekableByteArrayInputStream(content);
     FSDataInputStream fsdis = new FSDataInputStream(bis);
-    this.reader = HFile.createReader(FSUtils.getFs("hoodie", conf), path, new FSDataInputStreamWrapper(fsdis),
-        content.length, new CacheConfig(conf), conf);
+    FSDataInputStreamWrapper stream = new FSDataInputStreamWrapper(fsdis);
+    FileSystem fs = FSUtils.getFs("hoodie", conf);
+    HFileSystem hfs = (fs instanceof HFileSystem) ? (HFileSystem) fs : new HFileSystem(fs);
+    ReaderContext context = new ReaderContextBuilder()
+        .withFilePath(path)
+        .withInputStreamWrapper(stream)
+        .withFileSize(content.length)
+        .withFileSystem(hfs)
+        .withPrimaryReplicaReader(true)
+        .withReaderType(ReaderContext.ReaderType.STREAM)
+        .build();
+    HFileInfo fileInfo = new HFileInfo(context, conf);
+    this.reader = HFile.createReader(context, fileInfo, new CacheConfig(conf), conf);
+    fileInfo.initMetaAndIndex(reader);
   }
 
   @Override
   public String[] readMinMaxRecordKeys() {
-    try {
-      Map<byte[], byte[]> fileInfo = reader.loadFileInfo();
-      return new String[] { new String(fileInfo.get(KEY_MIN_RECORD.getBytes())),
-          new String(fileInfo.get(KEY_MAX_RECORD.getBytes()))};
-    } catch (IOException e) {

Review comment:
       IOException is checked exception, so if new HFile API doesn't throw it we can't catch it here

##########
File path: packaging/hudi-flink-bundle/pom.xml
##########
@@ -147,10 +148,18 @@
 
                   <include>org.apache.hbase:hbase-common</include>
                   <include>org.apache.hbase:hbase-client</include>
+                  <include>org.apache.hbase:hbase-hadoop-compat</include>

Review comment:
       Need not to take as part of this PR, but i actually want to suggest one step further: 
   Since we're mostly reliant on HFile and the classes it's dependent on, can we try to filter out packages that won't break it? 
   
   My hunch is that we can greatly reduce 16Mb overhead number by just cleaning up all the stuff that is bolted onto HBase.

##########
File path: hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystemHFileInLining.java
##########
@@ -56,11 +58,12 @@
  */
 public class TestInLineFileSystemHFileInLining {
 
+  private static final String LOCAL_FORMATTER = "%010d";
+  private static final String VALUE_PREFIX = "value";
+  private static final int MIN_BLOCK_SIZE = 1024;

Review comment:
       Can we add suffix to clarify what measure this is (bytes, kbytes, etc)

##########
File path: packaging/hudi-hadoop-mr-bundle/pom.xml
##########
@@ -110,6 +135,74 @@
                   <pattern>com.google.common.</pattern>
                   <shadedPattern>org.apache.hudi.com.google.common.</shadedPattern>
                 </relocation>
+                <!-- The classes below in org.apache.hadoop.metrics2 package come from

Review comment:
       +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org