Posted to commits@impala.apache.org by st...@apache.org on 2022/03/02 02:01:28 UTC

[impala] branch master updated (7942a8c -> b2e4b29)

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 7942a8c  IMPALA-11112: Impala can't resolve json tables created by Hive
     new 719d368  IMPALA-10049: Include RPC call_id in slow RPC logs
     new b2e4b29  IMPALA-11120: Fix codec not set in generating ORC tables

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/runtime/krpc-data-stream-sender.cc  |  3 ++-
 testdata/bin/generate-schema-statements.py | 10 +++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

[impala] 01/02: IMPALA-10049: Include RPC call_id in slow RPC logs


stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 719d3688950e9c22811ca2bb619b616f89679593
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Thu Feb 17 11:36:22 2022 -0800

    IMPALA-10049: Include RPC call_id in slow RPC logs
    
    KRPC logs a slow RPC trace on the receiver side. The trace log includes
    the call_id, which matches the one on the sender side. However, our slow
    RPC logging on the sender side does not log this call_id, which makes it
    hard to associate slow RPC logs between sender and receiver.
    
    With the recent KRPC rebase in IMPALA-10931, we can now log the call_id
    on the sender side.
    
    Testing:
    I tested this with a low threshold and delays added (the same as we did
    in IMPALA-9128):
    
      start-impala-cluster.py \
          --impalad_args=--impala_slow_rpc_threshold_ms=1 \
          --impalad_args=--debug_actions=END_DATA_STREAM_DELAY:JITTER@3000@1.0
    
    Here is how the logs look on the sender and receiver sides:
    
    impalad_node1.INFO (sender):
    I0217 10:29:36.278754  6606 krpc-data-stream-sender.cc:394] Slow TransmitData RPC (request call id 414) to 127.0.0.1:27002 (fragment_instance_id=d8453c2785c38df4:3473e28b00000041): took 343.279ms. Receiver time: 342.780ms Network time: 498.405us
    
    impalad_node2.INFO (receiver):
    I0217 10:29:36.278379  6775 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from 127.0.0.1:39702 (request call id 414) took 342ms. Trace:
    I0217 10:29:36.278479  6775 rpcz_store.cc:270] 0217 10:29:35.935586 (+     0us) impala-service-pool.cc:179] Inserting onto call queue
    0217 10:29:36.277730 (+342144us) impala-service-pool.cc:278] Handling call
    0217 10:29:36.277859 (+   129us) krpc-data-stream-recvr.cc:397] Deserializing batch
    0217 10:29:36.278330 (+   471us) krpc-data-stream-recvr.cc:424] Enqueuing deserialized batch
    0217 10:29:36.278369 (+    39us) inbound_call.cc:171] Queueing success response
    Metrics: {}
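To illustrate why the sender-side call id helps, here is a minimal Python sketch (not part of Impala) that pairs sender and receiver slow RPC log lines by call id. The regular expressions are assumptions derived only from the two sample lines above, and `match_slow_rpcs` is a hypothetical helper. It also assumes call ids are unique within the logs being compared; they may repeat across different connections, so in a busy cluster you would likely need to narrow matches further by host or timestamp.

```python
import re

# Assumed log formats, inferred from the sample sender/receiver lines only.
SENDER_RE = re.compile(
    r"Slow (?P<rpc>\w+) RPC \(request call id (?P<call_id>\d+)\) to (?P<addr>\S+) "
    r"\(fragment_instance_id=(?P<finst>[0-9a-f:]+)\): took (?P<took>\S+)\.")
RECEIVER_RE = re.compile(
    r"Call (?P<method>\S+) from (?P<peer>\S+) \(request call id (?P<call_id>\d+)\) "
    r"took (?P<took>\S+)\.")


def match_slow_rpcs(sender_lines, receiver_lines):
    """Pair sender-side and receiver-side slow RPC entries by call id.

    Assumes call ids do not collide within the given logs; real logs may
    need extra filtering by host or time window.
    """
    receivers = {}
    for line in receiver_lines:
        m = RECEIVER_RE.search(line)
        if m:
            receivers[m.group("call_id")] = m.groupdict()
    pairs = []
    for line in sender_lines:
        m = SENDER_RE.search(line)
        if m and m.group("call_id") in receivers:
            pairs.append((m.groupdict(), receivers[m.group("call_id")]))
    return pairs
```

Feeding it the impalad_node1.INFO and impalad_node2.INFO lines above yields a single pair keyed on call id 414, putting the sender's 343.279ms total next to the receiver's 342ms trace.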
    
    Change-Id: I7fb5746fa0be575745a8e168405d43115c425389
    Reviewed-on: http://gerrit.cloudera.org:8080/18243
    Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/runtime/krpc-data-stream-sender.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/be/src/runtime/krpc-data-stream-sender.cc b/be/src/runtime/krpc-data-stream-sender.cc
index 42d0435..37d6970 100644
--- a/be/src/runtime/krpc-data-stream-sender.cc
+++ b/be/src/runtime/krpc-data-stream-sender.cc
@@ -391,7 +391,8 @@ template <typename ResponsePBType>
 void KrpcDataStreamSender::Channel::LogSlowRpc(
     const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
   int64_t network_time_ns = total_time_ns - resp.receiver_latency_ns();
-  LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
+  LOG(INFO) << "Slow " << rpc_name << " RPC (request call id "
+            << rpc_controller_.call_id() << ") to " << address_
             << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
             << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
             << "Receiver time: "

[impala] 02/02: IMPALA-11120: Fix codec not set in generating ORC tables


stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b2e4b29f06141ad34eef2cbadfda259124792ac2
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Mon Feb 14 11:07:12 2022 +0800

    IMPALA-11120: Fix codec not set in generating ORC tables
    
    We use 'mapred.output.compression.codec' to set the compression codec
    when generating test files with Hive. However, it doesn't affect ORC
    files. Instead, we need to set 'orc.compress' in tblproperties for each
    ORC table. The default value of 'orc.compress' is ZLIB, which corresponds
    to our 'def' codec, so we only need to set it for non-def codecs.
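A minimal standalone sketch of the codec-to-'orc.compress' rule described above. `orc_compress_property` and `orc_tblproperties` are hypothetical helper names used for illustration; the actual patch inlines this logic in generate_statements().

```python
def orc_compress_property(codec):
    """Return the 'orc.compress' value for a test-vector codec, or None.

    Hypothetical helper mirroring the patch: 'def' relies on ORC's default
    (ZLIB), 'snap' must be spelled SNAPPY, other names pass through as-is.
    """
    if codec == 'def':
        return None  # ORC already defaults to ZLIB, which matches 'def'.
    if codec == 'snap':
        return 'SNAPPY'
    return codec


def orc_tblproperties(codec, tblproperties=None):
    """Return a copy of tblproperties with 'orc.compress' added when needed."""
    props = dict(tblproperties or {})
    value = orc_compress_property(codec)
    if value is not None:
        props['orc.compress'] = value
    return props


if __name__ == '__main__':
    # The four codecs exercised in the "Tests" section of this commit.
    for codec in ('none', 'def', 'snap', 'lz4'):
        print(codec, orc_tblproperties(codec))
```

Copying the input dict keeps any properties already set (such as 'transactional') intact without mutating the caller's dict.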
    
    This patch also fixes a bug in build_compression_codec_statement() that
    would raise KeyError when loading lz4 non-avro tables.
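The KeyError fix comes down to replacing dict subscripting with dict.get(). The hypothetical sketch below contrasts the two lookups. The map contents are placeholders, since the real AVRO_COMPRESSION_MAP and COMPRESSION_MAP are not shown in this diff, and the absence of 'lz4' from COMPRESSION_MAP is inferred from the KeyError described above.

```python
# Placeholder maps: the real ones live in generate-schema-statements.py.
# 'lz4' is deliberately missing from COMPRESSION_MAP (an inferred assumption).
COMPRESSION_MAP = {'def': 'DEF_CODEC_PLACEHOLDER', 'snap': 'SNAP_CODEC_PLACEHOLDER'}
AVRO_COMPRESSION_MAP = {'def': 'deflate', 'snap': 'snappy'}


def lookup_old(codec, file_format):
    # Pre-patch style: subscripting raises KeyError for an unmapped codec.
    return AVRO_COMPRESSION_MAP[codec] if file_format == 'avro' else COMPRESSION_MAP[codec]


def lookup_new(codec, file_format):
    # Post-patch style: .get() returns None, letting the caller fall back to
    # an empty statement (the `if not codec: return str()` branch).
    return (AVRO_COMPRESSION_MAP if file_format == 'avro' else COMPRESSION_MAP).get(codec)
```

With the new lookup, an lz4 ORC table simply gets no 'mapred.output.compression.codec' statement, and its codec is instead carried by 'orc.compress'.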
    
    Tests
     - Loaded tpch data in orc/none/none, orc/def/block, orc/snap/block,
       orc/lz4/block and verified their compression codecs.
    
    Change-Id: I02bd5d9400864145133ff019a3d076a6cab36fcc
    Reviewed-on: http://gerrit.cloudera.org:8080/18228
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 testdata/bin/generate-schema-statements.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/testdata/bin/generate-schema-statements.py b/testdata/bin/generate-schema-statements.py
index 061601f..e9b52b4 100755
--- a/testdata/bin/generate-schema-statements.py
+++ b/testdata/bin/generate-schema-statements.py
@@ -426,7 +426,7 @@ def avro_schema(columns):
   return json.dumps(record)
 
 def build_compression_codec_statement(codec, compression_type, file_format):
-  codec = AVRO_COMPRESSION_MAP[codec] if file_format == 'avro' else COMPRESSION_MAP[codec]
+  codec = (AVRO_COMPRESSION_MAP if file_format == 'avro' else COMPRESSION_MAP).get(codec)
   if not codec:
     return str()
   return (AVRO_COMPRESSION_CODEC % codec) if file_format == 'avro' else (
@@ -688,6 +688,14 @@ def generate_statements(output_name, test_vectors, sections,
           create_file_format == 'orc' and
           'transactional' not in tblproperties):
         tblproperties['transactional'] = 'true'
+      if create_file_format == 'orc' and create_codec != 'def':
+        # The default value of 'orc.compress' is ZLIB which corresponds to 'def'.
+        # We just need it for non-def codec.
+        # The original codec name can be used except snap.
+        if create_codec == 'snap':
+          tblproperties['orc.compress'] = 'SNAPPY'
+        else:
+          tblproperties['orc.compress'] = create_codec
 
       hdfs_location = '{0}.{1}{2}'.format(db_name, table_name, db_suffix)
       # hdfs file names for functional datasets are stored