Posted to issues@orc.apache.org by "Panagiotis Garefalakis (Jira)" <ji...@apache.org> on 2020/02/21 15:56:00 UTC

[jira] [Commented] (ORC-597) Row-level Filtering bench

    [ https://issues.apache.org/jira/browse/ORC-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041980#comment-17041980 ] 

Panagiotis Garefalakis commented on ORC-597:
--------------------------------------------

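The row-filter benchmark uses the existing datasets (github, sales, taxi) with a configurable filter_percentage and a configurable set of projected columns.
In the sales run below, it seems that at filter_percentage 0.1 the row filter drops the average runtime by about a second (roughly 10.6 s vs. 11.7 s per op), while at filter_percentage 0.8 it performs on par with no filtering at all (roughly 11.5 s vs. 11.3 s per op).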

 
{code:java}
Benchmark                                                 (compression)  (dataset)  (filter_percentage)  (projected_columns)  Mode  Cnt          Score         Error  Units
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                 0.01                  all  avgt    5   11475225.464 ± 1623255.254  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                 0.01                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                 0.01                  all  avgt    5          0.459 ±       0.065  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                 0.01                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                 0.01                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.1                  all  avgt    5   11675996.797 ± 2018888.900  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.1                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.1                  all  avgt    5          0.467 ±       0.081  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.1                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.1                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.4                  all  avgt    5   11435162.159 ± 2618968.876  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.4                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.4                  all  avgt    5          0.457 ±       0.105  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.4                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.4                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.8                  all  avgt    5   11310452.698 ±  716395.472  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.8                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.8                  all  avgt    5          0.452 ±       0.029  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.8                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.8                  all  avgt    5  125000000.000                    #

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

RowFilterProjectionBenchmark.orcRowFilter                          none      sales                 0.01                  all  avgt    5   10555379.527 ± 2636332.098  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                 0.01                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                 0.01                  all  avgt    5          0.422 ±       0.105  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                 0.01                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                 0.01                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.1                  all  avgt    5   10568755.756 ± 2958742.985  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.1                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.1                  all  avgt    5          0.423 ±       0.118  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.1                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.1                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.4                  all  avgt    5   10775518.795 ±  807832.612  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.4                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.4                  all  avgt    5          0.431 ±       0.032  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.4                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.4                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.8                  all  avgt    5   11479177.704 ±  957484.991  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.8                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.8                  all  avgt    5          0.459 ±       0.038  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.8                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.8                  all  avgt    5  125000000.000                    #

{code}
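
As a rough cross-check of the comparison above, the following sketch recomputes the difference between orcNoFilter and orcRowFilter for each filter_percentage. The mean scores are copied from the JMH output; the function and variable names are illustrative only and are not part of orc-benchmarks. It ignores the reported error intervals, which are wide (about ±0.7 s to ±3.0 s), so the differences should be read as indicative only.

```python
# Mean scores in us/op, copied from the JMH output above (sales, no compression,
# all columns projected), keyed by filter_percentage.
NO_FILTER = {0.01: 11475225.464, 0.1: 11675996.797, 0.4: 11435162.159, 0.8: 11310452.698}
ROW_FILTER = {0.01: 10555379.527, 0.1: 10568755.756, 0.4: 10775518.795, 0.8: 11479177.704}

US_PER_SECOND = 1_000_000


def saved_seconds(filter_percentage):
    """Seconds per op saved by orcRowFilter vs. orcNoFilter (negative means slower)."""
    return (NO_FILTER[filter_percentage] - ROW_FILTER[filter_percentage]) / US_PER_SECOND


def relative_change(filter_percentage):
    """Fractional runtime change of orcRowFilter relative to orcNoFilter."""
    base = NO_FILTER[filter_percentage]
    return (ROW_FILTER[filter_percentage] - base) / base


if __name__ == "__main__":
    for fp in sorted(NO_FILTER):
        print(f"filter_percentage={fp}: "
              f"{saved_seconds(fp):+.2f} s saved, {relative_change(fp):+.1%} runtime change")
```

Only the mean scores are compared here; a fuller analysis would also take the ± error column into account before drawing conclusions.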

> Row-level Filtering bench
> -------------------------
>
>                 Key: ORC-597
>                 URL: https://issues.apache.org/jira/browse/ORC-597
>             Project: ORC
>          Issue Type: Sub-task
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Extend orc-benchmarks for row-level filtering


