Posted to commits@impala.apache.org by jo...@apache.org on 2022/06/02 01:53:37 UTC

[impala] branch master updated (62683e0eb -> ed0d9341d)

This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


    from 62683e0eb IMPALA-11324: Fix broken test_reportexecstatus_retries
     new b3dec99ea Resolve merge conflict from IMPALA-10645
     new ed0d9341d IMPALA-11325: Fix UnicodeDecodeError for shell file output

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../catalog/metastore/CatalogMetastoreServer.java       |  5 -----
 shell/shell_output.py                                   | 10 +++++++++-
 tests/shell/test_shell_commandline.py                   | 17 +++++++++++++++++
 3 files changed, 26 insertions(+), 6 deletions(-)


[impala] 01/02: Resolve merge conflict from IMPALA-10645

Posted by jo...@apache.org.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b3dec99ea1942069a67b971719d095567758dc9b
Author: Michael Smith <mi...@cloudera.com>
AuthorDate: Wed May 25 10:36:56 2022 -0700

    Resolve merge conflict from IMPALA-10645
    
    Change-Id: Ic8d693af43e28c5834ca56654afbc8e711def117
    Reviewed-on: http://gerrit.cloudera.org:8080/18564
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../org/apache/impala/catalog/metastore/CatalogMetastoreServer.java  | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java b/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
index 5f33e2bde..14a0f6076 100644
--- a/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
+++ b/fe/src/main/java/org/apache/impala/catalog/metastore/CatalogMetastoreServer.java
@@ -350,12 +350,7 @@ public class CatalogMetastoreServer extends ThriftHiveMetastore implements
   }
 
   /**
-<<<<<<< HEAD
-   * Returns the RPC and connection metrics for this metastore server. //TODO hook this
-   * method to the Catalog's debug UI
-=======
    * Returns the RPC and connection metrics for this metastore server.
->>>>>>> c4a8633759... IMPALA-10645: Log catalogd HMS API metrics
    */
   @Override
   public TCatalogdHmsCacheMetrics getCatalogdHmsCacheMetrics() {


[impala] 02/02: IMPALA-11325: Fix UnicodeDecodeError for shell file output

Posted by jo...@apache.org.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit ed0d9341d3229b5857c8583d1817172d61b0f68c
Author: Joe McDonnell <jo...@cloudera.com>
AuthorDate: Tue May 31 16:14:55 2022 -0700

    IMPALA-11325: Fix UnicodeDecodeError for shell file output
    
    When using the --output_file commandline option for
    impala-shell, the shell fails with UnicodeDecodeError
    if the output contains Unicode characters.
    
    For example, if running this command:
    impala-shell -B -q "select '引'" --output_file=output.txt
    This fails with:
    UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
    
    This happens due to an encode('utf-8') call happening
    in OutputStream::write() on a string that is already UTF-8 encoded.
    This changes the code to skip the encode('utf-8') call for Python 2.
    Python 3 is using a string and still needs the encode call.
    
    This is mostly a pragmatic fix to make the code a little bit
    more functional, and there is more work to be done to have
    clear contracts for the format() methods and clear points
    of conversion to/from bytes.
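
The failure described above can be reproduced in miniature. The sketch below is illustrative only and is not code from shell_output.py: `python2_style_encode` and `to_utf8_bytes` are hypothetical names. It assumes that Python 2 handles `encode('utf-8')` on a byte string by first decoding it implicitly as ASCII, which is the step that raises UnicodeDecodeError. The helper shows one version-agnostic way to avoid encoding twice, offered as an alternative to the `sys.version_info` check in this commit rather than a description of it.

```python
# Illustrative sketch; these helpers are hypothetical and not part of impala-shell.

def python2_style_encode(encoded):
    """Mimic Python 2's byte_str.encode('utf-8'): an implicit ASCII
    decode happens first, followed by the requested encode."""
    return encoded.decode('ascii').encode('utf-8')


def to_utf8_bytes(formatted_data):
    """Return UTF-8 bytes whether the input is already encoded or is text."""
    if isinstance(formatted_data, bytes):
        # Already encoded (the Python 2 `str` case): pass through unchanged.
        return formatted_data
    # Text (Python 3 `str`, Python 2 `unicode`): encode exactly once.
    return formatted_data.encode('utf-8')


# U+5F15 is the character used in the commit's example query.
already_encoded = u'\u5f15'.encode('utf-8')

try:
    python2_style_encode(already_encoded)
    double_encode_failed = False
except UnicodeDecodeError:
    # The first byte, 0xe5, is outside the ASCII range.
    double_encode_failed = True
```

Checking `isinstance(..., bytes)` works on both interpreters because `bytes` is an alias for `str` in Python 2, so no version test is needed in this variant.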
    
    Testing:
     - Ran shell tests with Python 2 and Python 3 on Ubuntu 18
     - Added a shell test that outputs a Unicode character
       to an output file. Without the fix, this test fails.
    
    Change-Id: Ic40be3d530c2694465f7bd2edb0e0586ff0e1fba
    Reviewed-on: http://gerrit.cloudera.org:8080/18576
    Reviewed-by: Michael Smith <mi...@cloudera.com>
    Reviewed-by: Quanlong Huang <hu...@gmail.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 shell/shell_output.py                 | 10 +++++++++-
 tests/shell/test_shell_commandline.py | 17 +++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/shell/shell_output.py b/shell/shell_output.py
index 978196539..becc4dd06 100644
--- a/shell/shell_output.py
+++ b/shell/shell_output.py
@@ -112,7 +112,15 @@ class OutputStream(object):
         with open(self.filename, 'ab') as out_file:
           # Note that instances of this class do not persist, so it's fine to
           # close the we close the file handle after each write.
-          out_file.write(formatted_data.encode('utf-8'))  # file opened in binary mode
+          # The file is opened in binary mode. Python 2 returns Unicode bytes
+          # that can be written directly. Python 3 returns a string, which
+          # we need to encode before writing.
+          # TODO: Reexamine the contract of the format() function and see if
+          # we can remove this.
+          if sys.version_info.major == 2 and isinstance(formatted_data, str):
+            out_file.write(formatted_data)
+          else:
+            out_file.write(formatted_data.encode('utf-8'))
           out_file.write(b'\n')
       except IOError as err:
         file_err_msg = "Error opening file %s: %s" % (self.filename, str(err))
diff --git a/tests/shell/test_shell_commandline.py b/tests/shell/test_shell_commandline.py
index 3c6454d5a..7b410d333 100644
--- a/tests/shell/test_shell_commandline.py
+++ b/tests/shell/test_shell_commandline.py
@@ -1202,6 +1202,23 @@ class TestImpalaShell(ImpalaTestSuite):
       rows_from_file = [line.rstrip() for line in f]
       assert rows_from_stdout == rows_from_file
 
+  def test_output_file_utf8(self, vector, tmp_file):
+    """Test that writing UTF-8 output to a file using '--output_file' produces the
+    same output as written to stdout."""
+    # This is purely about UTF-8 output, so it doesn't need multiple rows.
+    query = "select '引'"
+    # Run the query normally and keep the stdout
+    output = run_impala_shell_cmd(vector, ['-q', query, '-B', '--output_delimiter=;'])
+    assert "Fetched 1 row(s)" in output.stderr
+    rows_from_stdout = output.stdout.strip().split('\n')
+    # Run the query with output sent to a file using '--output_file'.
+    result = run_impala_shell_cmd(vector, ['-q', query, '-B', '--output_delimiter=;',
+                                           '--output_file=%s' % tmp_file])
+    assert "Fetched 1 row(s)" in result.stderr
+    with open(tmp_file, "r") as f:
+      rows_from_file = [line.rstrip() for line in f]
+      assert rows_from_stdout == rows_from_file
+
   def test_http_socket_timeout(self, vector):
     """Test setting different http_socket_timeout_s values."""
     if (vector.get_value('strict_hs2_protocol') or