You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Liya Fan (Jira)" <ji...@apache.org> on 2020/06/11 02:55:00 UTC

[jira] [Comment Edited] (CALCITE-4049) Reduce the time complexity of getting shortest distances

    [ https://issues.apache.org/jira/browse/CALCITE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132852#comment-17132852 ] 

Liya Fan edited comment on CALCITE-4049 at 6/11/20, 2:54 AM:
-------------------------------------------------------------

[~julianhyde] Thanks a lot for your comments. 
The down-side of adding the size field is that it makes the object size 4 bytes larger (I am not sure if it is 50% larger, as an object has other meta-data to store). The up-side is reducing the time complexity of getting size from O( n ) to O(1). 

IMO, a general trend for performance optimization is to reduce time complexity by taking more spaces, as memory/disk spaces are becoming less expensive today and modern computers tend to have larger memory/disk spaces. 

For example, we build indices for relation databases to reduce the query time, at the expense of extra memory/disk spaces. We can also find examples from Calcite code base. For example, in class [RexCall|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexCall.java], we have the {{nodeCount}} field, which introduces an extra 4 byte. However, since it reduces the time complexity of getting node count, it should be considered a good optimization. 



was (Author: fan_li_ya):
[~julianhyde] Thanks a lot for your comments. 
The down-side of adding the size field is that it makes the object size 4 bytes larger (I am not sure if it is 50% larger, as an object has other meta-data to store). The up-side is reducing the time complexity of getting size from O(n) to O(1). 

IMO, a general trend for performance optimization is to reduce time complexity by taking more spaces, as memory/disk spaces are becoming less expensive today and modern computers tend to have larger memory/disk spaces. 

For example, we build indices for relation databases to reduce the query time, at the expense of extra memory/disk spaces. We can also find examples from Calcite code base. For example, in class [RexCall|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexCall.java], we have the {{nodeCount}} field, which introduces an extra 4 byte. However, since it reduces the time complexity of getting node count, it should be considered a good optimization. 


> Reduce the time complexity of getting shortest distances
> --------------------------------------------------------
>
>                 Key: CALCITE-4049
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4049
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>             Fix For: 1.24.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we have {{Graphs#makeImmutable}} to compute the shortest paths between all pairs of nodes. For many scenarios, however, we do not need the exact paths between nodes. Instead, we are only interested in the lengths of the shortest paths. 
> To get the path length, we need to get the shortest path first, which is returned as a {{List}}, then we call the {{List#size()}} method. According to the current implementation, the returned list is of type {{ConsList}}. The time complexity of {{ConsList#size}} is O(p) (p is the number of vertices on the path), which is inefficient. 
> In this issue, we revise the implementation of {{ConsList#size}} so that it takes O(1) time. In addition, we also give a utiltiy to get the shortest distances between nodes. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)