Posted to commits@impala.apache.org by wz...@apache.org on 2021/07/27 01:00:50 UTC

[impala] branch master updated (b3c4ac9 -> 46f1343)

This is an automated email from the ASF dual-hosted git repository.

wzhou pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from b3c4ac9  IMPALA-10814: Fix crash on illegal Parquet file
     new f863611  IMPALA-10805: [DOCS] Document priority based scratch directory selection
     new 46f1343  IMPALA-10821 Fix TestTPCHJoinQueries.test_outer_joins failed

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/topics/impala_disk_space.xml                  | 39 ++++++++++++++++++++++
 .../workloads/tpch/queries/tpch-outer-joins.test   |  4 +--
 2 files changed, 41 insertions(+), 2 deletions(-)

[impala] 01/02: IMPALA-10805: [DOCS] Document priority based scratch directory selection

Posted by wz...@apache.org.

wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f863611497b29eefb551cee8645fa5d86749924f
Author: Shajini Thayasingh <st...@cloudera.com>
AuthorDate: Mon Jul 19 13:52:27 2021 -0700

    IMPALA-10805: [DOCS] Document priority based scratch directory selection
    
    Made minor changes.
    Incorporated feedback received by providing more examples.
    Explained how to configure priorities for the scratch directories.
    Provided an example displaying priority based configuration.
    
    Change-Id: Iec170fdefcde09d4ee99d06b0876a17eb0bde2f6
    Reviewed-on: http://gerrit.cloudera.org:8080/17700
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 docs/topics/impala_disk_space.xml | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/docs/topics/impala_disk_space.xml b/docs/topics/impala_disk_space.xml
index d1c4ca4..b32502f 100644
--- a/docs/topics/impala_disk_space.xml
+++ b/docs/topics/impala_disk_space.xml
@@ -277,6 +277,45 @@ under the License.
 
     </section>
     <section>
+      <title>Priority Based Scratch Directory Selection</title>
+      <p>The location of the intermediate files is configured by starting the impalad daemon with
+        the flag ‑‑scratch_dirs="path_to_directory". Without priorities, the configured scratch
+        directories are used in a round-robin fashion. Round-robin selection is not always ideal
+        because the directories can reside on different classes of storage volumes with
+        different performance characteristics (SSD vs. HDD, local storage vs. network-attached
+        storage, and so on). To optimize your workload, you can assign a priority to each
+        scratch directory based
+        on your storage system configuration.</p>
+      <p>Scratch directories are selected for spilling in the order of their configured
+        priorities. If you assign the same priority to multiple directories, those
+        directories are selected in a round-robin fashion.</p>
+      <p>The valid formats for specifying a scratch directory with a priority are:
+        <codeblock>
+          &lt;dir-path>:&lt;limit>:&lt;priority>
+          &lt;dir-path>::&lt;priority>
+</codeblock></p>
+        <p>Example:</p>
+      <p>
+        <codeblock>
+        /dir1:200GB:0
+        /dir1::0
+</codeblock>
+      </p>
+      <p>The following formats use the default priority:
+        <codeblock>
+        /dir1
+        /dir1:200GB
+        /dir1:200GB:
+</codeblock>
+      </p>
+      <p>In the example below, dir1 has the highest priority (0) and is used for spilling until it
+        is full; dir2, dir3, and dir4 are then used in a round-robin fashion.</p>
+      <p>
+        <codeblock>‑‑scratch_dirs="/dir1:200GB:0, /dir2:1024GB:1, /dir3:1024GB:1, /dir4:1024GB:1"
+</codeblock>
+      </p>
+    </section>
+    <section>
       <title>Increasing Scratch Capacity</title>
       <p> You can compress the data spilled to disk to increase the effective scratch capacity. You
         typically more than double capacity using compression and reduce spilling to disk. Use the
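
The selection order described in the documentation patch above can be illustrated with a small Python sketch. This is not Impala's implementation: the names `ScratchDir`, `parse_scratch_dirs`, and `priority_groups` are hypothetical, the sketch assumes local paths that contain no colon, and it assumes that directories using the default priority are selected after all explicitly prioritized ones (the patch does not state the default value).

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class ScratchDir:
    path: str
    limit: Optional[str]     # e.g. "200GB"; None when no limit was given
    priority: Optional[int]  # None means "use the default priority"


def parse_scratch_dirs(flag_value: str) -> List[ScratchDir]:
    """Parse a comma-separated list of <dir-path>[:<limit>[:<priority>]] specs."""
    dirs = []
    for spec in flag_value.split(","):
        spec = spec.strip()
        if not spec:
            continue
        parts = spec.split(":")
        if len(parts) > 3:
            raise ValueError(f"too many ':' separated fields in {spec!r}")
        path = parts[0]
        # An empty field ("/dir1::0" or "/dir1:200GB:") falls back to the default.
        limit = parts[1] if len(parts) > 1 and parts[1] else None
        priority = int(parts[2]) if len(parts) > 2 and parts[2] else None
        dirs.append(ScratchDir(path, limit, priority))
    return dirs


def priority_groups(dirs: List[ScratchDir]) -> List[List[str]]:
    """Group paths by priority, lowest number first.

    Assumption: default-priority directories sort after explicit ones.
    Directories inside one group would be used round robin.
    """
    def key(d: ScratchDir):
        return (d.priority is None, d.priority if d.priority is not None else 0)

    ordered = sorted(dirs, key=key)
    return [[d.path for d in group] for _, group in groupby(ordered, key=key)]


if __name__ == "__main__":
    flag = "/dir1:200GB:0, /dir2:1024GB:1, /dir3:1024GB:1, /dir4:1024GB:1"
    for group in priority_groups(parse_scratch_dirs(flag)):
        print(group)
```

Each printed group corresponds to one priority level: a group is exhausted before the next one is used, and directories within a group share the load.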

[impala] 02/02: IMPALA-10821 Fix TestTPCHJoinQueries.test_outer_joins failed

Posted by wz...@apache.org.

wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 46f13437db2d64d9ee431c2c5cf424be3b8d6377
Author: Yida Wu <wy...@gmail.com>
AuthorDate: Thu Jul 22 15:45:44 2021 -0700

    IMPALA-10821 Fix TestTPCHJoinQueries.test_outer_joins failed
    
    A newly added test case in TestTPCHJoinQueries.test_outer_joins
    fails in the S3 build because the plan generated there differs
    from the plan in the default HDFS build for the scan node: the S3
    build produces "SCAN S3", while the HDFS build produces "SCAN HDFS".
    
    The patch changes the test case to use $FILESYSTEM_NAME, which
    is replaced according to the file system the test case is
    using.
    
    Tests:
    Reran the failed test case in the S3 build; it passed.
    
    Change-Id: I7e068d9da03517f8316e7a2505ce1466523d5917
    Reviewed-on: http://gerrit.cloudera.org:8080/17716
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 testdata/workloads/tpch/queries/tpch-outer-joins.test | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/testdata/workloads/tpch/queries/tpch-outer-joins.test b/testdata/workloads/tpch/queries/tpch-outer-joins.test
index ea51d71..e0e96f2 100644
--- a/testdata/workloads/tpch/queries/tpch-outer-joins.test
+++ b/testdata/workloads/tpch/queries/tpch-outer-joins.test
@@ -84,7 +84,7 @@ AND a.`SELECT` = b.`INSERT`;
 '05:EXCHANGE [UNPARTITIONED]'
 '02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]'
 '|--04:EXCHANGE [HASH(b.`INSERT`,b.`insert`)]'
-'|  00:SCAN HDFS [default.t1 b]'
+'|  00:SCAN $FILESYSTEM_NAME [default.t1 b]'
 '03:EXCHANGE [HASH(a.`SELECT`,a.`select`)]'
-'01:SCAN HDFS [default.t2 a]'
+'01:SCAN $FILESYSTEM_NAME [default.t2 a]'
 ====
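
The effect of the `$FILESYSTEM_NAME` placeholder in the expected plan can be sketched in Python. This is only an illustration of placeholder substitution, not the code Impala's test framework uses: the function `expand_expected_plan` is hypothetical, and the sketch relies on the standard library's `string.Template`.

```python
from string import Template
from typing import List


def expand_expected_plan(lines: List[str], filesystem_name: str) -> List[str]:
    """Replace $FILESYSTEM_NAME in each expected plan line.

    safe_substitute leaves any other '$' sequences untouched instead of raising.
    """
    return [
        Template(line).safe_substitute(FILESYSTEM_NAME=filesystem_name)
        for line in lines
    ]


if __name__ == "__main__":
    expected = [
        "'|  00:SCAN $FILESYSTEM_NAME [default.t1 b]'",
        "'01:SCAN $FILESYSTEM_NAME [default.t2 a]'",
    ]
    # The same expected lines now match both build flavors.
    for fs in ("HDFS", "S3"):
        print(expand_expected_plan(expected, fs))
```

Because the file system name is filled in at test time, one expected-plan section serves both the HDFS and S3 builds.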