Posted to commits@hudi.apache.org by na...@apache.org on 2019/11/01 04:54:34 UTC

[incubator-hudi] branch master updated (eda472a -> ee0fd06)

This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


    from eda472a  [MINOR] Fix avro schema warnings in build
     new 3251d62  [HUDI-313] Fix select count star error when querying a realtime table
     new ee0fd06  synchronized lock on conf object instead of class

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../realtime/HoodieParquetRealtimeInputFormat.java   | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)


[incubator-hudi] 02/02: synchronized lock on conf object instead of class

Posted by na...@apache.org.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

commit ee0fd06de73e0191365549fa7c4f6c71c1bbc08d
Author: Wenning Ding <we...@amazon.com>
AuthorDate: Wed Oct 30 11:48:21 2019 -0700

    synchronized lock on conf object instead of class
---
 .../realtime/HoodieParquetRealtimeInputFormat.java      | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
index 3e42724..ba325e1 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
@@ -200,14 +200,17 @@ public class HoodieParquetRealtimeInputFormat extends HoodieParquetInputFormat i
   /**
    * Hive will append read columns' ids to old columns' ids during getRecordReader. In some cases, e.g. SELECT COUNT(*),
    * the read columns' id is an empty string and Hive will combine it with Hoodie required projection ids and becomes
-   * e.g. ",2,0,3" and will cause an error. This method is used to avoid this situation.
+   * e.g. ",2,0,3" and will cause an error. Actually this method is a temporary solution because the real bug is from
+   * Hive. Hive has fixed this bug after 3.0.0, but the version before that would still face this problem. (HIVE-22438)
    */
-  private static synchronized Configuration cleanProjectionColumnIds(Configuration conf) {
-    String columnIds = conf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR);
-    if (!columnIds.isEmpty() && columnIds.charAt(0) == ',') {
-      conf.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, columnIds.substring(1));
-      if (LOG.isDebugEnabled()) {
-        LOG.debug("The projection Ids: {" + columnIds + "} start with ','. First comma is removed");
+  private static Configuration cleanProjectionColumnIds(Configuration conf) {
+    synchronized (conf) {
+      String columnIds = conf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR);
+      if (!columnIds.isEmpty() && columnIds.charAt(0) == ',') {
+        conf.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, columnIds.substring(1));
+        if (LOG.isDebugEnabled()) {
+          LOG.debug("The projection Ids: {" + columnIds + "} start with ','. First comma is removed");
+        }
       }
     }
     return conf;
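
Note on the locking change above: a "static synchronized" method locks on the class object, so every caller is serialized regardless of which configuration it passes. "synchronized (conf)" locks only that one configuration instance. The sketch below illustrates the difference under stated assumptions: java.util.Properties stands in for Hadoop's Configuration, and the key name and class name are hypothetical, not taken from the Hudi code.

```java
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

// Sketch only: Properties is a stand-in for org.apache.hadoop.conf.Configuration.
class PerObjectLockDemo {
    // Hypothetical key; the real code uses ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR.
    static final String KEY = "example.read.column.ids";

    // Same shape as the patched method: the lock is the conf object, not the class.
    static Properties clean(Properties conf) {
        synchronized (conf) {
            String columnIds = conf.getProperty(KEY, "");
            if (!columnIds.isEmpty() && columnIds.charAt(0) == ',') {
                conf.setProperty(KEY, columnIds.substring(1));
            }
        }
        return conf;
    }

    // Returns true if confB can be cleaned while another thread holds confA's monitor.
    static boolean otherConfStaysUnblocked() {
        Properties confA = new Properties();
        Properties confB = new Properties();
        confB.setProperty(KEY, ",2,0,3");
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (confA) {
                holding.countDown();      // signal that confA's monitor is now held
                try {
                    release.await();      // keep holding it until the main thread is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        try {
            holding.await();
            clean(confB);                 // needs only confB's monitor, so it does not wait
            release.countDown();
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            release.countDown();
            return false;
        }
        return "2,0,3".equals(confB.getProperty(KEY));
    }

    public static void main(String[] args) {
        System.out.println(otherConfStaysUnblocked());
    }
}
```

With a class-level lock, a second caller would have to wait even when it works on an unrelated configuration. The per-object lock only serializes callers that share the same conf instance.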


[incubator-hudi] 01/02: [HUDI-313] Fix select count star error when querying a realtime table

Posted by na...@apache.org.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

commit 3251d62bd3c740b25139029a1913d1cf5a57173f
Author: Wenning Ding <we...@amazon.com>
AuthorDate: Wed Oct 23 13:53:57 2019 -0700

    [HUDI-313] Fix select count star error when querying a realtime table
---
 .../realtime/HoodieParquetRealtimeInputFormat.java      | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
index d37ae2a..3e42724 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
@@ -197,10 +197,27 @@ public class HoodieParquetRealtimeInputFormat extends HoodieParquetInputFormat i
     return configuration;
   }
 
+  /**
+   * Hive will append read columns' ids to old columns' ids during getRecordReader. In some cases, e.g. SELECT COUNT(*),
+   * the read columns' id is an empty string and Hive will combine it with Hoodie required projection ids and becomes
+   * e.g. ",2,0,3" and will cause an error. This method is used to avoid this situation.
+   */
+  private static synchronized Configuration cleanProjectionColumnIds(Configuration conf) {
+    String columnIds = conf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR);
+    if (!columnIds.isEmpty() && columnIds.charAt(0) == ',') {
+      conf.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, columnIds.substring(1));
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("The projection Ids: {" + columnIds + "} start with ','. First comma is removed");
+      }
+    }
+    return conf;
+  }
+
   @Override
   public RecordReader<NullWritable, ArrayWritable> getRecordReader(final InputSplit split, final JobConf job,
       final Reporter reporter) throws IOException {
 
+    this.conf = cleanProjectionColumnIds(job);
     LOG.info("Before adding Hoodie columns, Projections :" + job.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
         + ", Ids :" + job.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
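
Note on the cleanup logic in this commit: the string handling can be tried in isolation. The sketch below is a minimal stand-alone version, assuming a hypothetical helper named stripLeadingComma that works on the raw id string rather than on a Hadoop Configuration. The null guard is an addition for the sketch and is not part of the committed method.

```java
// Sketch only: class and method names are hypothetical, not part of Hudi.
class ProjectionIds {
    // Drops a single leading comma, e.g. the ",2,0,3" that results when an
    // empty read-column id list is joined with the required projection ids.
    static String stripLeadingComma(String columnIds) {
        if (columnIds == null) {
            return null; // guard added for this sketch; the committed code has no null check
        }
        if (!columnIds.isEmpty() && columnIds.charAt(0) == ',') {
            return columnIds.substring(1);
        }
        return columnIds;
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingComma(",2,0,3")); // prints "2,0,3"
        System.out.println(stripLeadingComma("2,0,3"));  // prints "2,0,3"
    }
}
```

Only the first character is examined, so an id list that already starts with a digit, or an empty string, is returned unchanged.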