Posted to commits@impala.apache.org by wz...@apache.org on 2021/07/27 01:00:50 UTC
[impala] branch master updated (b3c4ac9 -> 46f1343)
This is an automated email from the ASF dual-hosted git repository.
wzhou pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.
from b3c4ac9 IMPALA-10814: Fix crash on illegal Parquet file
new f863611 IMPALA-10805: [DOCS] Document priority based scratch directory selection
new 46f1343 IMPALA-10821 Fix TestTPCHJoinQueries.test_outer_joins failed
The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
docs/topics/impala_disk_space.xml | 39 ++++++++++++++++++++++
.../workloads/tpch/queries/tpch-outer-joins.test | 4 +--
2 files changed, 41 insertions(+), 2 deletions(-)
[impala] 01/02: IMPALA-10805: [DOCS] Document priority based scratch directory selection
Posted by wz...@apache.org.
wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit f863611497b29eefb551cee8645fa5d86749924f
Author: Shajini Thayasingh <st...@cloudera.com>
AuthorDate: Mon Jul 19 13:52:27 2021 -0700
IMPALA-10805: [DOCS] Document priority based scratch directory selection
Made minor changes.
Incorporated feedback received by providing more examples.
Explained how to configure priorities for the scratch directories.
Provided an example displaying priority based configuration.
Change-Id: Iec170fdefcde09d4ee99d06b0876a17eb0bde2f6
Reviewed-on: http://gerrit.cloudera.org:8080/17700
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
docs/topics/impala_disk_space.xml | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/docs/topics/impala_disk_space.xml b/docs/topics/impala_disk_space.xml
index d1c4ca4..b32502f 100644
--- a/docs/topics/impala_disk_space.xml
+++ b/docs/topics/impala_disk_space.xml
@@ -277,6 +277,45 @@ under the License.
</section>
<section>
+ <title>Priority Based Scratch Directory Selection</title>
+ <p>The location of the intermediate files is configured by starting the impalad daemon with
+ the flag ‑‑scratch_dirs="path_to_directory". By default, the configured scratch directories
+ are used in a round robin fashion. Automatic round robin selection is not ideal in every
+ situation, because these directories
+ could come from different classes of storage system volumes having different performance
+ characteristics (SSD vs HDD, local storage vs network attached storage, etc.). To optimize
+ your workload, you have an option to configure the priority of the scratch directories based
+ on your storage system configuration.</p>
+ <p>The scratch directories are selected for spilling according to the priorities you
+ configure. If you give the same priority to multiple directories, those directories are
+ selected in a round robin fashion.</p>
+ <p>The valid formats for specifying the priority directories are as shown here:
+ <codeblock>
+ <dir-path>:<limit>:<priority>
+ <dir-path>::<priority>
+</codeblock></p>
+ <p>Example:</p>
+ <p>
+ <codeblock>
+ /dir1:200GB:0
+ /dir1::0
+</codeblock>
+ </p>
+ <p>The following formats use the default priority:
+ <codeblock>
+ /dir1
+ /dir1:200GB
+ /dir1:200GB:
+</codeblock>
+ </p>
+ <p>In the example below, dir1 will be used as a spill victim until it is full and then dir2, dir3,
+ and dir4 will be used in a round robin fashion.</p>
+ <p>
+ <codeblock>‑‑scratch_dirs="/dir1:200GB:0, /dir2:1024GB:1, /dir3:1024GB:1, /dir4:1024GB:1"
+</codeblock>
+ </p>
+ </section>
+ <section>
<title>Increasing Scratch Capacity</title>
<p> You can compress the data spilled to disk to increase the effective scratch capacity. You
typically more than double capacity using compression and reduce spilling to disk. Use the
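
Editorial illustration (not part of the commit): the spec formats documented above can be sketched in Python as follows. This is a minimal sketch, not Impala's actual parser. The names ScratchDir, parse_scratch_dirs, group_by_priority and DEFAULT_PRIORITY are hypothetical, and the sketch assumes that a directory without an explicit priority sorts after every directory that has one. Following the example in the diff, a lower priority number is used first.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class ScratchDir(NamedTuple):
    path: str
    limit: Optional[str]  # e.g. "200GB"; None when no limit was given
    priority: float       # lower value is used first


# Assumption for this sketch only: entries without a priority sort last.
DEFAULT_PRIORITY = math.inf


def parse_scratch_dirs(flag_value: str) -> List[ScratchDir]:
    """Parse a comma-separated list of <dir-path>[:<limit>[:<priority>]] specs."""
    dirs = []
    for spec in flag_value.split(","):
        spec = spec.strip()
        if not spec:
            continue
        parts = spec.split(":")
        if len(parts) > 3:
            raise ValueError("too many ':' separated fields in %r" % spec)
        path = parts[0]
        # Empty fields ("/dir1::0" or "/dir1:200GB:") fall back to defaults.
        limit = parts[1] if len(parts) > 1 and parts[1] else None
        priority = int(parts[2]) if len(parts) > 2 and parts[2] else DEFAULT_PRIORITY
        dirs.append(ScratchDir(path, limit, priority))
    return dirs


def group_by_priority(dirs: List[ScratchDir]) -> List[List[str]]:
    """Return directory paths grouped by priority, lowest number first.

    Directories inside one group would be used in a round robin fashion.
    """
    groups: Dict[float, List[str]] = {}
    for d in dirs:
        groups.setdefault(d.priority, []).append(d.path)
    return [groups[p] for p in sorted(groups)]
```

Applied to the flag value from the example, group_by_priority(parse_scratch_dirs(...)) puts /dir1 in the first group and /dir2, /dir3 and /dir4 together in the second, which matches the spill order described in the documentation text.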
[impala] 02/02: IMPALA-10821 Fix TestTPCHJoinQueries.test_outer_joins failed
Posted by wz...@apache.org.
wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 46f13437db2d64d9ee431c2c5cf424be3b8d6377
Author: Yida Wu <wy...@gmail.com>
AuthorDate: Thu Jul 22 15:45:44 2021 -0700
IMPALA-10821 Fix TestTPCHJoinQueries.test_outer_joins failed
A newly added testcase in TestTPCHJoinQueries.test_outer_joins
fails in the s3 build because the plan generated there differs
from the plan in the default hdfs build at the scan node: the
s3 build prints "SCAN S3", while the hdfs build prints "SCAN HDFS".
The patch changes the testcase to use $FILESYSTEM_NAME, which
is replaced according to the file system the testcase is
running on.
Tests:
Reran and passed the failed testcase in s3 build.
Change-Id: I7e068d9da03517f8316e7a2505ce1466523d5917
Reviewed-on: http://gerrit.cloudera.org:8080/17716
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
testdata/workloads/tpch/queries/tpch-outer-joins.test | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/testdata/workloads/tpch/queries/tpch-outer-joins.test b/testdata/workloads/tpch/queries/tpch-outer-joins.test
index ea51d71..e0e96f2 100644
--- a/testdata/workloads/tpch/queries/tpch-outer-joins.test
+++ b/testdata/workloads/tpch/queries/tpch-outer-joins.test
@@ -84,7 +84,7 @@ AND a.`SELECT` = b.`INSERT`;
'05:EXCHANGE [UNPARTITIONED]'
'02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]'
'|--04:EXCHANGE [HASH(b.`INSERT`,b.`insert`)]'
-'| 00:SCAN HDFS [default.t1 b]'
+'| 00:SCAN $FILESYSTEM_NAME [default.t1 b]'
'03:EXCHANGE [HASH(a.`SELECT`,a.`select`)]'
-'01:SCAN HDFS [default.t2 a]'
+'01:SCAN $FILESYSTEM_NAME [default.t2 a]'
====
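
Editorial illustration (not part of the commit): the idea behind the fix is that the expected plan lines carry a placeholder that is expanded per target file system before comparison. The sketch below shows that idea in Python. It is not the Impala test framework's implementation, and the function name expand_placeholders and the variables mapping are hypothetical.

```python
from typing import Dict, List


def expand_placeholders(expected_lines: List[str], variables: Dict[str, str]) -> List[str]:
    """Replace each $NAME placeholder in the expected lines with its value."""
    expanded = []
    for line in expected_lines:
        for name, value in variables.items():
            line = line.replace("$" + name, value)
        expanded.append(line)
    return expanded


expected = [
    "'|  00:SCAN $FILESYSTEM_NAME [default.t1 b]'",
    "'01:SCAN $FILESYSTEM_NAME [default.t2 a]'",
]

# The same expected section then matches both builds.
on_hdfs = expand_placeholders(expected, {"FILESYSTEM_NAME": "HDFS"})
on_s3 = expand_placeholders(expected, {"FILESYSTEM_NAME": "S3"})
```

With this approach a single expected section serves every file system, so the test file does not hard-code "SCAN HDFS".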