Posted to issues@geode.apache.org by "Jacob S. Barrett (Jira)" <ji...@apache.org> on 2019/08/30 23:05:00 UTC

[jira] [Comment Edited] (GEODE-2793) Look into reducing the amount of PDX deserializations in OQL query intermediate result sets for indexed OR queries containing PdxInstanceImpls

    [ https://issues.apache.org/jira/browse/GEODE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919949#comment-16919949 ] 

Jacob S. Barrett edited comment on GEODE-2793 at 8/30/19 11:04 PM:
-------------------------------------------------------------------

The biggest overhead we see when profiling queries is the repeated hash code calculation in {{PdxInstanceImpl.hashCode()}}. In the current query benchmark it accounts for ~60% of the CPU time and a significant number of transient object allocations.

We really should investigate a way of stashing the hash code in the PDX byte stream. Care must be taken to avoid upgrade and backward-compatibility issues. Also, if the PDX entry is updated in place, its hash code must be recalculated.
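As a rough illustration of the idea only, the sketch below stores a 4-byte hash ahead of a payload so that the hash can be read back without scanning the payload. The class {{HashPrefixedBytes}}, its methods, and the layout are hypothetical and are not the PDX wire format. The sketch also ignores the compatibility concerns mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical framing, NOT the PDX wire format: [4-byte hash][payload]. */
final class HashPrefixedBytes {
  private HashPrefixedBytes() {}

  /** Computes the hash once at write time and stores it ahead of the payload. */
  static byte[] wrap(byte[] payload) {
    return ByteBuffer.allocate(Integer.BYTES + payload.length)
        .putInt(Arrays.hashCode(payload))
        .put(payload)
        .array();
  }

  /** Reads the stashed hash without touching the payload bytes. */
  static int storedHash(byte[] framed) {
    return ByteBuffer.wrap(framed).getInt();
  }

  /** Returns a copy of the payload that follows the hash prefix. */
  static byte[] payload(byte[] framed) {
    return Arrays.copyOfRange(framed, Integer.BYTES, framed.length);
  }
}
```

In this sketch an in-place update would simply call {{wrap}} again on the new payload, which is where the recalculation would happen.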

Measure any change using PartitionedIndexQueryBenchmark.

Alternatively, a single PdxInstanceImpl object could be stored in the entry, either alongside the PDX stream or in place of it. The instance already caches the hash code after calculating it from the stream. We could then also optimize {{PdxInstanceImpl.equals(Object)}} by caching the equality field values locally, using weak references.
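
To make the caching behaviour concrete, here is a minimal sketch of an object that computes its hash code from a byte array once, reuses it on later calls, and invalidates it when the bytes are replaced. {{CachedHashValue}} and its fields are hypothetical stand-ins, not Geode classes. The sketch is not thread-safe and does not attempt the weak-reference caching of equality fields.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a serialized entry; not a Geode class. */
final class CachedHashValue {
  private byte[] bytes;         // stand-in for the serialized byte stream
  private int cachedHash;       // meaningful only when hashComputed is true
  private boolean hashComputed;
  int computations;             // counts full hash calculations, for illustration

  CachedHashValue(byte[] bytes) {
    this.bytes = bytes.clone();
  }

  /** Replacing the bytes in place invalidates the cached hash code. */
  void update(byte[] newBytes) {
    this.bytes = newBytes.clone();
    this.hashComputed = false;
  }

  @Override
  public int hashCode() {
    if (!hashComputed) {
      cachedHash = Arrays.hashCode(bytes); // full pass over the bytes, done once
      hashComputed = true;
      computations++;
    }
    return cachedHash;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof CachedHashValue)) return false;
    CachedHashValue other = (CachedHashValue) o;
    // Cheap rejection: different hash codes imply different content.
    if (hashCode() != other.hashCode()) return false;
    return Arrays.equals(bytes, other.bytes);
  }
}
```

With this shape, adding the same object to several hash-based collections costs one full hash calculation rather than one per insertion.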



> Look into reducing the amount of PDX deserializations in OQL query intermediate result sets for indexed OR queries containing PdxInstanceImpls
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-2793
>                 URL: https://issues.apache.org/jira/browse/GEODE-2793
>             Project: Geode
>          Issue Type: Bug
>          Components: querying
>            Reporter: Barry Oglesby
>            Assignee: Jacob S. Barrett
>            Priority: Major
>              Labels: perfomance
>
> Intermediate result sets for each of the indexed OR clauses are represented by ResultsBags. Each index is sorted and iterated in AbstractGroupOrRangeJunction auxFilterEvaluate. When an entry in the index is added to a ResultsBag, hashCode is invoked on it. In the case of a PdxInstanceImpl, this causes all of its identity fields to be deserialized so that hashCode can be invoked on them.
> Then, when each ResultsBag is sorted during QueryUtils union and sizeSortedUnion by invoking occurrences on each entry, equals is invoked on each entry. In the case of a PdxInstanceImpl, this causes all of its identity fields to be deserialized so that equals can be invoked on them.
> Here is an example query that shows the PDX deserializations:
> {noformat}
> select * from /region this where ((map['entry1']='value1' OR map['entry2']='value2' OR map['entry3']='value3' OR map['entry4']='value4' OR map['entry5']='value5' OR map['entry6']='value6' OR map['entry7']='value7' OR map['entry8']='value8' OR map['entry9']='value9' OR map['entry10']='value10')) ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)