Posted to issues@orc.apache.org by "Panagiotis Garefalakis (Jira)" <ji...@apache.org> on 2020/02/21 15:56:00 UTC
[jira] [Commented] (ORC-597) Row-level Filtering bench
[ https://issues.apache.org/jira/browse/ORC-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041980#comment-17041980 ]
Panagiotis Garefalakis commented on ORC-597:
--------------------------------------------
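The row-filter benchmark uses the existing datasets (github, sales, taxi) with configurable filter_percentage and projected_columns parameters.
It seems that even filtering out 10% of the rows can cut runtime by about a second, while filtering out as little as 20% performs on par with no filtering at all. In the sales results below (Score in us/op), orcRowFilter takes about 10.6 s per operation versus 11.7 s for orcNoFilter at filter_percentage 0.1, and about 11.5 s versus 11.3 s at 0.8.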
{code:java}
Benchmark                                                 (compression)  (dataset)  (filter_percentage)  (projected_columns)  Mode  Cnt          Score          Error  Units
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                 0.01                  all  avgt    5   11475225.464  ± 1623255.254  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                 0.01                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                 0.01                  all  avgt    5          0.459  ±       0.065  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                 0.01                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                 0.01                  all  avgt    5  125000000.000                     #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.1                  all  avgt    5   11675996.797  ± 2018888.900  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.1                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.1                  all  avgt    5          0.467  ±       0.081  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.1                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.1                  all  avgt    5  125000000.000                     #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.4                  all  avgt    5   11435162.159  ± 2618968.876  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.4                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.4                  all  avgt    5          0.457  ±       0.105  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.4                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.4                  all  avgt    5  125000000.000                     #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.8                  all  avgt    5   11310452.698  ±  716395.472  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.8                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.8                  all  avgt    5          0.452  ±       0.029  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.8                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.8                  all  avgt    5  125000000.000                     #
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                 0.01                  all  avgt    5   10555379.527  ± 2636332.098  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                 0.01                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                 0.01                  all  avgt    5          0.422  ±       0.105  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                 0.01                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                 0.01                  all  avgt    5  125000000.000                     #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.1                  all  avgt    5   10568755.756  ± 2958742.985  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.1                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.1                  all  avgt    5          0.423  ±       0.118  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.1                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.1                  all  avgt    5  125000000.000                     #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.4                  all  avgt    5   10775518.795  ±  807832.612  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.4                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.4                  all  avgt    5          0.431  ±       0.032  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.4                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.4                  all  avgt    5  125000000.000                     #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.8                  all  avgt    5   11479177.704  ±  957484.991  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.8                  all  avgt    5        623.538                     #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.8                  all  avgt    5          0.459  ±       0.038  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.8                  all  avgt    5        895.000                     #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.8                  all  avgt    5  125000000.000                     #
{code}
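To make the comparison in the results above easier to check, the following is a minimal sketch that computes the per-operation difference between orcNoFilter and orcRowFilter for each filter_percentage. The class and method names (RowFilterSpeedup, savedSeconds, savedPercent) are hypothetical and are not part of orc-benchmarks; the scores are copied by hand from the sales rows of the table.

```java
import java.util.Locale;

// Hypothetical helper, not part of orc-benchmarks: compares the Score column
// (average time per operation, in microseconds) of the two benchmark methods.
final class RowFilterSpeedup {
    // Values copied from the sales results above; index i in each array
    // refers to the same filter_percentage setting.
    static final double[] FILTER_PERCENTAGES = {0.01, 0.1, 0.4, 0.8};
    static final double[] NO_FILTER_US = {11475225.464, 11675996.797, 11435162.159, 11310452.698};
    static final double[] ROW_FILTER_US = {10555379.527, 10568755.756, 10775518.795, 11479177.704};

    // Seconds saved per operation by orcRowFilter; negative means it was slower.
    static double savedSeconds(double noFilterUs, double rowFilterUs) {
        return (noFilterUs - rowFilterUs) / 1_000_000.0;
    }

    // Same difference, relative to the orcNoFilter score, in percent.
    static double savedPercent(double noFilterUs, double rowFilterUs) {
        return 100.0 * (noFilterUs - rowFilterUs) / noFilterUs;
    }

    public static void main(String[] args) {
        for (int i = 0; i < FILTER_PERCENTAGES.length; i++) {
            System.out.printf(Locale.ROOT,
                "filter_percentage=%.2f saved=%.2f s (%.1f%%)%n",
                FILTER_PERCENTAGES[i],
                savedSeconds(NO_FILTER_US[i], ROW_FILTER_US[i]),
                savedPercent(NO_FILTER_US[i], ROW_FILTER_US[i]));
        }
    }
}
```

With these inputs the difference is roughly 0.9 s, 1.1 s and 0.7 s in favor of orcRowFilter at 0.01, 0.1 and 0.4, and about 0.2 s against it at 0.8. The reported error intervals (about ±0.7 s to ±3.0 s) are of the same order as these differences, so the comparison is indicative rather than conclusive.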
> Row-level Filtering bench
> -------------------------
>
> Key: ORC-597
> URL: https://issues.apache.org/jira/browse/ORC-597
> Project: ORC
> Issue Type: Sub-task
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Extend orc-benchmarks for row-level filtering