Posted to issues@calcite.apache.org by "Vladimir Sitnikov (JIRA)" <ji...@apache.org> on 2019/01/09 19:09:00 UTC
[jira] [Comment Edited] (CALCITE-2635) getMonotonocity is slow on wide tables
[ https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738551#comment-16738551 ]
Vladimir Sitnikov edited comment on CALCITE-2635 at 1/9/19 7:08 PM:
--------------------------------------------------------------------
{quote}@PerformanceTest(expectedDuration = "2s", variance = "5%"){quote}
The expected duration depends on the hardware: a notebook, a virtual machine, a desktop, or a VPS can all have very different raw performance.
I think it is much better to invest time in something like https://arewefastyet.com.
In other words, we could have a set of "standard" benchmarks, a consistent machine to run them on, and scheduled executions, so that we can track regressions.
**I'm inclined to merge this fix with no extra tests.**
Note: the change is a clear win.
An alternative option is to use a HashMap to speed up {{org.apache.calcite.rel.type.RelDataType#getField(String fieldName, boolean caseSensitive, boolean elideRecord)}}. We do have {{org.apache.calcite.rel.type.RelDataTypeFactoryImpl#canonize(org.apache.calcite.rel.type.RelDataType)}}, so a lazily initialized cache of field positions, built once per canonical row type, might help.
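A lazily initialized cache of field positions could look roughly like the following. This is a minimal sketch, assuming the field names of a canonized row type never change once the cache is built. {{FieldPositionCache}} and its methods are hypothetical names, not Calcite API; the sketch ignores the {{elideRecord}} flag of {{getField}} and approximates case-insensitive matching by upper-casing with {{Locale.ROOT}}.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not Calcite API): caches field ordinals by name so that
 * repeated lookups are O(1) on average instead of a linear scan of the fields.
 */
final class FieldPositionCache {
  private final List<String> fieldNames;
  // Built lazily on first use. volatile publishes a fully built map safely; if two
  // threads race, both build equal maps, which is harmless.
  private volatile Map<String, Integer> exact;
  private volatile Map<String, Integer> ignoreCase;

  FieldPositionCache(List<String> fieldNames) {
    this.fieldNames = List.copyOf(fieldNames);
  }

  /** Returns the ordinal of the first matching field, or -1 if there is none. */
  int indexOf(String name, boolean caseSensitive) {
    Map<String, Integer> positions;
    String key;
    if (caseSensitive) {
      positions = exactPositions();
      key = name;
    } else {
      positions = ignoreCasePositions();
      // Approximation of case-insensitive matching (assumption of this sketch).
      key = name.toUpperCase(Locale.ROOT);
    }
    Integer ordinal = positions.get(key);
    return ordinal == null ? -1 : ordinal;
  }

  private Map<String, Integer> exactPositions() {
    Map<String, Integer> m = exact;
    if (m == null) {
      m = build(false);
      exact = m;
    }
    return m;
  }

  private Map<String, Integer> ignoreCasePositions() {
    Map<String, Integer> m = ignoreCase;
    if (m == null) {
      m = build(true);
      ignoreCase = m;
    }
    return m;
  }

  private Map<String, Integer> build(boolean upperCase) {
    Map<String, Integer> m = new HashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      String name = fieldNames.get(i);
      String key = upperCase ? name.toUpperCase(Locale.ROOT) : name;
      // putIfAbsent keeps the first ordinal, mirroring a left-to-right scan.
      m.putIfAbsent(key, i);
    }
    return Collections.unmodifiableMap(m);
  }
}
```

The map costs O(N) to build once, after which each lookup is O(1) on average, so calling it once per field is O(N) overall rather than O(N^2).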
However, we don't really expect a single table to have lots of collations, so we could just go with PR#891.
On top of that, we might add a hard limit like "try no more than the first 50 collations of the table", so that even a table with an extreme number of collations won't create a problem for {{getMonotonicity}}.
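The hard limit could be sketched as follows: instead of searching the field list for the column, walk the table's collations (at most 50 of them) and check whether the column is the leading key of one. All types here ({{Monotonicity}}, {{FieldCollation}}, {{Collation}}, {{MonotonicityLookup}}) are hypothetical stand-ins written with Java 16+ records, not Calcite classes, and this is not necessarily what PR#891 does.

```java
import java.util.List;

/** Hypothetical monotonicity values for this sketch (Calcite has its own SqlMonotonicity). */
enum Monotonicity { INCREASING, DECREASING, NOT_MONOTONIC }

/** Hypothetical stand-in for one sort key of a collation: field ordinal plus direction. */
record FieldCollation(int fieldIndex, boolean descending) {}

/** Hypothetical stand-in for a table collation: sort keys, leading key first. */
record Collation(List<FieldCollation> keys) {}

final class MonotonicityLookup {
  /** Hard cap from the comment: inspect no more than this many collations. */
  static final int MAX_COLLATIONS = 50;

  /**
   * A column is treated as monotonic if it is the leading key of one of the first
   * MAX_COLLATIONS collations. The field list is never searched, so the cost does not
   * grow with the number of fields.
   */
  static Monotonicity getMonotonicity(
      List<String> fieldNames, List<Collation> collations, String columnName) {
    int limit = Math.min(collations.size(), MAX_COLLATIONS);
    for (int i = 0; i < limit; i++) {
      List<FieldCollation> keys = collations.get(i).keys();
      if (keys.isEmpty()) {
        continue;
      }
      FieldCollation leading = keys.get(0);
      // O(1) per collation when fieldNames supports random access (e.g. ArrayList).
      if (fieldNames.get(leading.fieldIndex()).equals(columnName)) {
        return leading.descending() ? Monotonicity.DECREASING : Monotonicity.INCREASING;
      }
    }
    return Monotonicity.NOT_MONOTONIC;
  }
}
```

The cap makes the worst case independent of both the field count and the collation count; the trade-off is that a column whose only matching collation lies beyond the first 50 is reported as not monotonic.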
> getMonotonocity is slow on wide tables
> --------------------------------------
>
> Key: CALCITE-2635
> URL: https://issues.apache.org/jira/browse/CALCITE-2635
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Gian Merlino
> Assignee: Gian Merlino
> Priority: Major
> Labels: performance
>
> RelOptTableImpl's getMonotonicity does an indexOf on {{rowType.getFieldNames()}}, which is O(N) in the number of fields. IdentifierNamespace calls getMonotonicity once for every field in the table namespace, so it becomes O(N^2) in the number of fields. We observed 2-4 second query planning times with a table that had 18,000 columns, reduced to about 150ms after patching getMonotonicity to be O(1) in the number of fields.
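To make the quadratic cost concrete, the sketch below counts the work done when a name lookup is performed once per field, as the description says IdentifierNamespace does. It is a hypothetical illustration ({{PerFieldLookupCost}} and its methods are made-up names, not Calcite code), and it counts comparisons rather than measuring time, so it says nothing about the reported 2-4 second figures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the per-field lookup cost; not Calcite code. */
final class PerFieldLookupCost {
  /** One indexOf-style linear scan per field; returns the number of string comparisons. */
  static long linearComparisons(List<String> fieldNames) {
    long comparisons = 0;
    for (String name : fieldNames) {
      // Like List.indexOf: scan from the start and stop at the first match.
      for (String candidate : fieldNames) {
        comparisons++;
        if (candidate.equals(name)) {
          break;
        }
      }
    }
    return comparisons;
  }

  /** Builds a name -> ordinal map once, then counts successful O(1) hash lookups. */
  static long hashedLookups(List<String> fieldNames) {
    Map<String, Integer> positions = new HashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      positions.putIfAbsent(fieldNames.get(i), i);
    }
    long lookups = 0;
    for (String name : fieldNames) {
      Integer ordinal = positions.get(name); // O(1) on average
      if (ordinal != null) {
        lookups++;
      }
    }
    return lookups;
  }
}
```

With N distinct field names, the field at position i (0-based) costs i + 1 comparisons, so linearComparisons returns N(N + 1)/2. For the 18,000-column table that is 18,000 × 18,001 / 2 = 162,009,000 comparisons, versus 18,000 lookups from hashedLookups.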
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)