Posted to commits@orc.apache.org by do...@apache.org on 2022/08/23 21:35:44 UTC
[orc] branch asf-site updated: Remove redundant source md file
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 9fa16aaae Remove redundant source md file
9fa16aaae is described below
commit 9fa16aaae249c1c84f067828d97bd3d419bf2971
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Tue Aug 23 14:35:21 2022 -0700
Remove redundant source md file
---
develop/design/lazy_filter.md | 335 ------------------------------------------
1 file changed, 335 deletions(-)
diff --git a/develop/design/lazy_filter.md b/develop/design/lazy_filter.md
deleted file mode 100644
index 5c64d8948..000000000
--- a/develop/design/lazy_filter.md
+++ /dev/null
@@ -1,335 +0,0 @@
-* [Lazy Filter](#LazyFilter)
- * [Background](#Background)
- * [Design](#Design)
- * [SArg to Filter](#SArgtoFilter)
- * [Read](#Read)
- * [Configuration](#Configuration)
- * [Tests](#Tests)
- * [Appendix](#Appendix)
- * [Benchmarks](#Benchmarks)
- * [Row vs Vector](#RowvsVector)
- * [Normalization vs Compact](#NormalizationvsCompact)
- * [Summary](#Summary)
-
-# Lazy Filter <a id="LazyFilter"></a>
-
-## Background <a id="Background"></a>
-
-This feature request originated from a needle-in-a-haystack search that is performed with the following
-characteristics:
-
-* The search fields are not part of the partition, bucket, or sort specification.
-* The table is very large.
-* The result is very few rows compared to the scan size.
-* The search columns are a significant subset of selection columns in the query.
-
-Initial analysis showed that we could have a significant benefit by lazily reading the non-search columns only when we
-have a match. We explore the design and some benchmarks in subsequent sections.
-
-## Design <a id="Design"></a>
-
-This builds further on [ORC-577][ORC-577] which currently only restricts deserialization for some selected data types
-but does not improve on IO.
-
-On a high level the design includes the following components:
-
-```text
-┌──────────────┐ ┌────────────────────────┐
-│ │ │ Read │
-│ │ │ │
-│ │ │ ┌────────────┐ │
-│SArg to Filter│─────────▶│ │Read Filter │ │
-│ │ │ │ Columns │ │
-│ │ │ └────────────┘ │
-│ │ │ │ │
-└──────────────┘ │ ▼ │
- │ ┌────────────┐ │
- │ │Apply Filter│ │
- │ └────────────┘ │
- │ │ │
- │ ▼ │
- │ ┌────────────┐ │
- │ │Read Select │ │
- │ │ Columns │ │
- │ └────────────┘ │
- │ │
- │ │
- └────────────────────────┘
-```
-
-* **SArg to Filter**: Converts Search Arguments passed down into filters for efficient application during scans.
-* **Read**: Performs the lazy read using the filters.
- * **Read Filter Columns**: Read the filter columns from the file.
- * **Apply Filter**: Apply the filter on the read filter columns.
-  * **Read Select Columns**: If the filter selects at least one row, then read the remaining columns.
-
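The components above can be sketched end to end. The following is an illustrative, self-contained sketch in which plain arrays stand in for ORC column streams; all of the names are hypothetical and none of this is ORC API:

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.function.Supplier;

// Illustrative only: arrays stand in for ORC column streams.
final class LazyFilterSketch {

  // SArg to Filter has produced 'match'; apply it to the filter column and
  // return the selected row indices.
  static int[] applyFilter(long[] filterColumn, LongPredicate match) {
    int[] selected = new int[filterColumn.length];
    int size = 0;
    for (int row = 0; row < filterColumn.length; row++) {
      if (match.test(filterColumn[row])) {
        selected[size++] = row;
      }
    }
    return Arrays.copyOf(selected, size);
  }

  // The select columns are materialized (the supplier is invoked) only when
  // the filter kept at least one row; otherwise their read is skipped.
  static long[][] readSelectColumnsIfNeeded(int[] selected,
                                            Supplier<long[][]> selectColumns) {
    return selected.length == 0 ? null : selectColumns.get();
  }

  public static void main(String[] args) {
    long[] filterCol = {5, 7, 5, 9};
    int[] selected = applyFilter(filterCol, v -> v == 5);
    System.out.println(Arrays.toString(selected));
  }
}
```

The key property is in `readSelectColumnsIfNeeded`: the cost of the non-search columns is paid only on a match.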
-### SArg to Filter <a id="SArgtoFilter"></a>
-
-SArg to Filter converts the passed SArg into a filter. This enables automatic compatibility with both Spark and Hive,
-as they already push Search Arguments down to ORC.
-
-The SArg is automatically converted into a [Vector Filter][vfilter], which is applied during the read process. Two
-filter types were evaluated:
-
-* [Row Filter][rfilter] that evaluates each row across all the predicates once.
-* [Vector Filter][vfilter] that evaluates each filter across the entire vector and adjusts the subsequent evaluation.
-
-While a row based filter is easier to code, it is much [slower][rowvvector] to process. We also see a significant
-[performance gain][rowvvector] in the absence of normalization.
-
-The Search Argument builder should allow normalization to be skipped during the [build][build]. This has been added
-with [HIVE-24458][HIVE-24458].
-
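One way to picture the result of the conversion is as a tree of composable filter objects, where AND narrows the selection that later children evaluate. The sketch below only mirrors the idea behind the vector filter; every type is hypothetical and, for brevity, operates on a single column:

```java
import java.util.Arrays;

// Hypothetical types sketching SArg -> filter composition; not ORC API.
final class SargFilterSketch {

  interface VectorFilter {
    // Returns the subset of 'selected' row indices that pass this filter.
    int[] filter(long[] column, int[] selected);
  }

  // Leaf filter for an IN predicate, using binary search over sorted values.
  static final class InFilter implements VectorFilter {
    private final long[] sortedValues;
    InFilter(long... values) {
      sortedValues = values.clone();
      Arrays.sort(sortedValues);
    }
    public int[] filter(long[] column, int[] selected) {
      int[] out = new int[selected.length];
      int size = 0;
      for (int row : selected) {
        if (Arrays.binarySearch(sortedValues, column[row]) >= 0) {
          out[size++] = row;
        }
      }
      return Arrays.copyOf(out, size);
    }
  }

  // AND narrows the selection: each child evaluates only the rows that are
  // still selected, i.e. it "adjusts the subsequent evaluation".
  static final class AndFilter implements VectorFilter {
    private final VectorFilter[] children;
    AndFilter(VectorFilter... children) {
      this.children = children;
    }
    public int[] filter(long[] column, int[] selected) {
      for (VectorFilter child : children) {
        selected = child.filter(column, selected);
      }
      return selected;
    }
  }
}
```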
-### Read <a id="Read"></a>
-
-The read process has the following changes:
-
-```text
- │
- │
- │
-┌────────────────────────▼────────────────────────┐
-│ ┏━━━━━━━━━━━━━━━━┓ │
-│ ┃Plan ++Search++ ┃ │
-│ ┃ Columns ┃ │
-│ ┗━━━━━━━━━━━━━━━━┛ │
-│ Read │Stripe │
-└────────────────────────┼────────────────────────┘
- │
- ▼
-
-
- │
- │
-┌────────────────────────▼────────────────────────┐
-│ ┏━━━━━━━━━━━━━━━━┓ │
-│ ┃Read ++Search++ ┃ │
-│ ┃ Columns ┃◀─────────┐ │
-│ ┗━━━━━━━━━━━━━━━━┛ │ │
-│ │ Size = 0 │
-│ ▼ │ │
-│ ┏━━━━━━━━━━━━━━━━┓ │ │
-│ ┃ Apply Filter ┃──────────┘ │
-│ ┗━━━━━━━━━━━━━━━━┛ │
-│ Size > 0 │
-│ │ │
-│ ▼ │
-│ ┏━━━━━━━━━━━━━━━━┓ │
-│ ┃ Plan Select ┃ │
-│ ┃ Columns ┃ │
-│ ┗━━━━━━━━━━━━━━━━┛ │
-│ │ │
-│ ▼ │
-│ ┏━━━━━━━━━━━━━━━━┓ │
-│ ┃ Read Select ┃ │
-│ ┃ Columns ┃ │
-│ ┗━━━━━━━━━━━━━━━━┛ │
-│ Next │Batch │
-└────────────────────────┼────────────────────────┘
- │
- ▼
-```
-
-In detail, the read process changes are:
-
-* **Read Stripe** used to plan the read of all (search + select) columns. This is enhanced to plan and fetch only the
-  search columns. The rest of the stripe planning optimizations remain unchanged, e.g. partial read planning of the
-  stripe based on RowGroup statistics.
-* **Next Batch** identifies the processing that takes place when `RecordReader.nextBatch` is invoked.
- * **Read Search Columns** takes place instead of reading all the selected columns. This is in sync with the planning
- that has taken place during **Read Stripe** where only the search columns have been planned.
-  * **Apply Filter** on the batch, which at this point includes only the search columns. Evaluate the result of the
-    filter:
-    * **Size = 0** indicates that all records have been filtered out, so we proceed to the next batch of search
-      columns.
-    * **Size > 0** indicates that at least one record was accepted by the filter. This record needs to be substantiated
-      with the other columns.
-  * **Plan Select Columns** is invoked to perform the read of the select columns. The planning happens as follows:
-    * Determine the current position of the read within the stripe and plan the read for the select columns from this
-      point forward to the end of the stripe.
-    * The read planning of the select columns respects the row groups filtered out as a result of the stripe planning.
-    * Fetch the select columns using the above plan.
-  * **Read Select Columns** into the vectorized row batch.
- * Return this batch.
-
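The batch loop above can be condensed into a control-flow sketch. Names are hypothetical and arrays stand in for column streams; the point is only that a batch whose filter result is Size = 0 never triggers a read of the select columns:

```java
import java.util.function.LongPredicate;

// Hypothetical control-flow sketch of nextBatch; not real ORC API.
final class NextBatchSketch {

  // Returns {matched row count, number of select-column reads}.
  static int[] readLoop(long[] searchCol, int batchSize, LongPredicate match) {
    int matched = 0;
    int selectReads = 0;
    for (int start = 0; start < searchCol.length; start += batchSize) {
      int end = Math.min(start + batchSize, searchCol.length);
      // Read Search Columns + Apply Filter.
      int size = 0;
      for (int row = start; row < end; row++) {
        if (match.test(searchCol[row])) {
          size++;
        }
      }
      if (size == 0) {
        continue;          // Size = 0: move on to the next batch
      }
      selectReads++;       // Size > 0: Plan + Read Select Columns
      matched += size;
    }
    return new int[]{matched, selectReads};
  }
}
```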
-The current implementation performs a single read for the select columns in a stripe.
-
-```text
-┌──────────────────────────────────────────────────┐
-│ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │
-│ │RG0 │ │RG1 │ │RG2■│ │RG3 │ │RG4 │ │RG5■│ │RG6 │ │
-│ └────┘ └────┘ └────┘ └────┘ └────┘ └────┘ └────┘ │
-│ Stripe │
-└──────────────────────────────────────────────────┘
-```
-
-The above diagram depicts a stripe with 7 Row Groups out of which **RG2** and **RG5** are selected by the filter. The
-current implementation does the following:
-
-* Start the read planning process from the first matched row group, RG2.
-* Read to the end of the stripe, which includes RG6.
-* The resulting fetch skips RG0 and RG1, subject to compression block boundaries.
-
-The above logic could be enhanced to perform **n** partial reads before reading to the end of the stripe. The current
-implementation allows 0 partial reads before reading to the end of the stripe. The value of **n** could be made
-configurable, but should avoid too many short reads.
-
-The read behavior changes as follows with multiple reads being allowed within a stripe for select columns:
-
-```text
-┌──────────────────────────────────────────────────┐
-│ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │
-│ │ │ │ │ │■■■■│ │■■■■│ │■■■■│ │■■■■│ │■■■■│ │
-│ └────┘ └────┘ └────┘ └────┘ └────┘ └────┘ └────┘ │
-│ Current implementation │
-└──────────────────────────────────────────────────┘
-┌──────────────────────────────────────────────────┐
-│ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │
-│ │ │ │ │ │■■■■│ │ │ │ │ │■■■■│ │■■■■│ │
-│ └────┘ └────┘ └────┘ └────┘ └────┘ └────┘ └────┘ │
-│ Allow 1 partial read │
-└──────────────────────────────────────────────────┘
-```
-
-The figure shows that we could read significantly fewer bytes by performing an additional read before reading to the
-end of the stripe. This shall be included as a subsequent enhancement to this patch.
-
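A possible shape for such a planner, sketched with hypothetical names (not the actual ORC planning code): given which row groups the filter selected, emit up to **n** standalone partial reads and, once that budget is used, a final read to the end of the stripe.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative planner producing [start, end) row-group read ranges.
final class SelectReadPlanner {

  static List<int[]> plan(boolean[] selectedRg, int n) {
    List<int[]> ranges = new ArrayList<>();
    int i = 0;
    // Skip the leading unselected row groups entirely.
    while (i < selectedRg.length && !selectedRg[i]) {
      i++;
    }
    while (i < selectedRg.length) {
      if (ranges.size() == n) {
        // Partial-read budget used up: read from here to the end of stripe.
        ranges.add(new int[]{i, selectedRg.length});
        break;
      }
      int start = i;
      while (i < selectedRg.length && selectedRg[i]) {
        i++;
      }
      ranges.add(new int[]{start, i});  // one contiguous run of selected RGs
      while (i < selectedRg.length && !selectedRg[i]) {
        i++;                            // skip the gap to the next run
      }
    }
    return ranges;
  }
}
```

With the stripe from the figure (RG2 and RG5 selected out of 7), `n = 0` reproduces the current single read [2, 7), and `n = 1` yields the two reads [2, 3) and [5, 7).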
-## Configuration <a id="Configuration"></a>
-
-The following configuration options are exposed that control the filter behavior:
-
-|Property |Type |Default|
-|:--- |:--- |:--- |
-|orc.sarg.to.filter |boolean|false |
-|orc.filter.use.selected|boolean|false |
-
-* `orc.sarg.to.filter` can be used to turn off the SArg to filter conversion. This might be particularly relevant in
-  cases where the filter is expensive and does not eliminate a lot of records. This option will become unnecessary once
-  the calling engines can skip reapplying filters that have already been fully applied by the ORC layer.
-* `orc.filter.use.selected` is a boolean flag that indicates whether the reading application honors the selected
-  vector. Enabling it incorrectly results in wrong output: if the application does not honor the selected vector, the
-  output of the ORC reader must have the filter reapplied to avoid using unset values in the unselected rows. If
-  unsure, leave this as false.
-
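To see why reapplying the filter matters, consider a toy model of a batch with a selected vector (hypothetical types, not the real vectorized row batch): the slots outside the selection still hold values, and a consumer that ignores the selected vector will aggregate rows the filter rejected.

```java
// Toy batch model illustrating the orc.filter.use.selected hazard.
final class SelectedVectorSketch {

  // Wrong for a filtered batch: consumes every slot up to the batch size,
  // including rows the filter did not select.
  static long sumIgnoringSelected(long[] values, int batchSize) {
    long sum = 0;
    for (int i = 0; i < batchSize; i++) {
      sum += values[i];
    }
    return sum;
  }

  // Correct: consumes only the rows listed in the selected vector.
  static long sumUsingSelected(long[] values, int[] selected, int size) {
    long sum = 0;
    for (int i = 0; i < size; i++) {
      sum += values[selected[i]];
    }
    return sum;
  }
}
```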
-## Tests <a id="Tests"></a>
-
-We evaluated this patch against a search job with the following stats:
-
-* Table
- * Size: ~**420 TB**
- * Data fields: ~**120**
- * Partition fields: **3**
-* Scan
- * Search fields: 3 data fields with large (~ 1000 value) IN clauses compounded by **OR**.
- * Select fields: 16 data fields (includes the 3 search fields), 1 partition field
- * Search:
- * Size: ~**180 TB**
- * Records: **3.99 T**
- * Selected:
- * Size: ~**100 MB**
- * Records: **1 M**
-
-We have observed the following reductions:
-
-|Test |IO Reduction %|CPU Reduction %|
-|:--- | ---:| ---:|
-|SELECT 16 cols| 45| 47|
-|SELECT * | 70| 87|
-
-* The savings become more significant as the number of select columns increases relative to the search columns.
-* When the filter selects most of the data, no significant penalty is observed as a result of 2 IOs compared with a
-  single IO.
-  * We do incur a penalty from the double filter application, both in ORC and in the calling engine.
-
-## Appendix <a id="Appendix"></a>
-
-### Benchmarks <a id="Benchmarks"></a>
-
-#### Row vs Vector <a id="RowvsVector"></a>
-
-We start with the decision between a row filter and a vector filter. The row filter has the advantage of simpler code
-compared with the vector filter.
-
-```bash
-java -jar java/bench/core/target/orc-benchmarks-core-*-uber.jar filter simple
-```
-
-|Benchmark |(fInSize)|(fType)|Mode| Cnt| Score|Error |Units|
-|:--- | ---:|:--- |:---|---:| ---:|:--- |:--- |
-|FilterBench.SimpleFilter.filter| 4|row |avgt| 20|38.207|± 0.178|us/op|
-|FilterBench.SimpleFilter.filter| 4|vector |avgt| 20|18.663|± 0.117|us/op|
-|FilterBench.SimpleFilter.filter| 8|row |avgt| 20|50.694|± 0.313|us/op|
-|FilterBench.SimpleFilter.filter| 8|vector |avgt| 20|35.532|± 0.190|us/op|
-|FilterBench.SimpleFilter.filter| 16|row |avgt| 20|52.443|± 0.268|us/op|
-|FilterBench.SimpleFilter.filter| 16|vector |avgt| 20|33.966|± 0.204|us/op|
-|FilterBench.SimpleFilter.filter| 32|row |avgt| 20|68.504|± 0.318|us/op|
-|FilterBench.SimpleFilter.filter| 32|vector |avgt| 20|51.707|± 0.302|us/op|
-|FilterBench.SimpleFilter.filter| 256|row |avgt| 20|88.348|± 0.793|us/op|
-|FilterBench.SimpleFilter.filter| 256|vector |avgt| 20|72.602|± 0.282|us/op|
-
-Explanation:
-
-* **fInSize** is the number of values in the IN clause.
-* **fType** identifies whether the filter is row based or vector based.
-
-Observations:
-
-* The vector based filter is significantly faster than the row based filter.
- * At best, vector was faster by **51.15%**
- * At worst, vector was faster by **17.82%**
-* Filter performance deteriorates as the number of IN values increases; however, even then the vector filter remains
-  much faster than the row filter. The current `IN` filter employs a binary search on a sorted array instead of a hash
-  lookup.
-
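The two strategies can be contrasted with a small sketch (illustrative, not the benchmark code): the row filter evaluates every predicate on each row, while the vector filter runs one predicate over the whole vector and hands only the still-unselected rows to the next predicate.

```java
import java.util.Arrays;

// Both strategies answer: which rows match predicate 1 OR predicate 2?
// IN-value arrays must be sorted, since both use binary search.
final class RowVsVectorSketch {

  // Row filter: visit each row once and evaluate both predicates on it.
  static int[] rowStyle(long[] c1, long[] c2, long[] in1, long[] in2) {
    int[] out = new int[c1.length];
    int n = 0;
    for (int r = 0; r < c1.length; r++) {
      if (Arrays.binarySearch(in1, c1[r]) >= 0
          || Arrays.binarySearch(in2, c2[r]) >= 0) {
        out[n++] = r;
      }
    }
    return Arrays.copyOf(out, n);
  }

  // Vector filter: run the first predicate over the whole vector, then run
  // the second predicate only over the rows the first did not select.
  static int[] vectorStyle(long[] c1, long[] c2, long[] in1, long[] in2) {
    int[] out = new int[c1.length];
    int n = 0;
    int[] rest = new int[c1.length];
    int m = 0;
    for (int r = 0; r < c1.length; r++) {
      if (Arrays.binarySearch(in1, c1[r]) >= 0) {
        out[n++] = r;
      } else {
        rest[m++] = r;
      }
    }
    for (int i = 0; i < m; i++) {
      if (Arrays.binarySearch(in2, c2[rest[i]]) >= 0) {
        out[n++] = rest[i];
      }
    }
    int[] result = Arrays.copyOf(out, n);
    Arrays.sort(result);  // restore row order
    return result;
  }
}
```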
-#### Normalization vs Compact <a id="NormalizationvsCompact"></a>
-
-In this test we use a complex filter with both AND and OR to understand the impact of Conjunctive Normal Form (CNF) on
-filter performance. The Search Argument builder performs CNF normalization by default. The advantage of CNF would again
-be a simpler code base.
-
-```bash
-java -jar java/bench/core/target/orc-benchmarks-core-*-uber.jar filter complex
-```
-
-|Benchmark |(fSize)|(fType)|(normalize)|Mode| Cnt| Score|Error |Units|
-|:--- | ---:|:--- |:--- |:---|---:| ---:|:--- |:--- |
-|FilterBench.ComplexFilter.filter| 2|row |true |avgt| 20| 91.922|± 0.301 |us/op|
-|FilterBench.ComplexFilter.filter| 2|row |false |avgt| 20| 90.741|± 0.556 |us/op|
-|FilterBench.ComplexFilter.filter| 2|vector |true |avgt| 20| 61.137|± 0.398 |us/op|
-|FilterBench.ComplexFilter.filter| 2|vector |false |avgt| 20| 54.829|± 0.431 |us/op|
-|FilterBench.ComplexFilter.filter| 4|row |true |avgt| 20| 284.956|± 1.237 |us/op|
-|FilterBench.ComplexFilter.filter| 4|row |false |avgt| 20| 130.526|± 0.767 |us/op|
-|FilterBench.ComplexFilter.filter| 4|vector |true |avgt| 20| 242.387|± 1.053 |us/op|
-|FilterBench.ComplexFilter.filter| 4|vector |false |avgt| 20| 98.530|± 0.423 |us/op|
-|FilterBench.ComplexFilter.filter| 8|row |true |avgt| 20|8007.101|± 54.912|us/op|
-|FilterBench.ComplexFilter.filter| 8|row |false |avgt| 20| 234.943|± 4.713 |us/op|
-|FilterBench.ComplexFilter.filter| 8|vector |true |avgt| 20|7013.758|± 33.701|us/op|
-|FilterBench.ComplexFilter.filter| 8|vector |false |avgt| 20| 190.442|± 0.881 |us/op|
-
-Explanation:
-
-* **fSize** identifies the number of children in the OR clause that will be normalized.
-* **normalize** identifies whether normalization was carried out on the Search Argument.
-
-Observations:
-
-* The vector filter is better than the row filter, as demonstrated by the [Row vs Vector test][rowvvector].
-* Normalizing the search argument results in a significant performance penalty, given the explosion of the operator
-  tree.
-  * In the case where an AND includes 8 ORs, the compact (non-normalized) version is faster by **97.29%**.
-
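The penalty is easy to quantify: distributing an OR over its AND children during normalization multiplies clause counts, so the operator tree grows exponentially in the number of OR children. The exact shape of the benchmark filter is an assumption here; this only illustrates the growth.

```java
// CNF blow-up in one formula: an OR over k AND-children with m leaves each
// normalizes to m^k clauses (one clause per way of picking a leaf from each
// AND child).
final class CnfBlowup {

  static long cnfClauses(int orChildren, int leavesPerAnd) {
    long clauses = 1;
    for (int i = 0; i < orChildren; i++) {
      clauses *= leavesPerAnd;
    }
    return clauses;
  }
}
```

For binary AND children, going from 2 to 8 OR children grows the clause count from 4 to 256, which is consistent with the sharp degradation of the normalized rows in the table above.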
-#### Summary <a id="Summary"></a>
-
-Based on the benchmarks we have the following conclusions:
-
-* The vector based filter is significantly better than the row based filter and justifies the more complex code.
-* The compact filter is significantly faster than the normalized filter.
-
-[ORC-577]: https://issues.apache.org/jira/browse/ORC-577
-
-[HIVE-24458]: https://issues.apache.org/jira/browse/HIVE-24458
-
-[vfilter]: ../../../java/core/src/java/org/apache/orc/impl/filter/VectorFilter.java
-
-[rowvvector]: #RowvsVector
-
-[build]: https://github.com/apache/hive/blob/storage-branch-2.7/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java#L491
\ No newline at end of file