Posted to derby-dev@db.apache.org by Katherine Marsden <km...@sbcglobal.net> on 2013/02/27 21:47:39 UTC

How Much Memory for hash joins

I was wondering what is the default maximum memory for hash joins.

Looking at OptimizerFactoryImpl I see
protected int maxMemoryPerTable = 1048576 unless overridden by 
"derby.language.maxMemoryPerTable";
Is it actually intended per table or per active query?  I don't see the 
property in the documentation.
If I set this to zero, will it turn off hash joins altogether?
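
For reference, I assume it is picked up like other derby.* properties, 
i.e. from derby.properties or as a JVM system property set before the 
engine boots.  A minimal sketch of the latter (the property name is from 
OptimizerFactoryImpl; the value here just mirrors the default):

    // Sketch only: set before the embedded driver is loaded.
    System.setProperty("derby.language.maxMemoryPerTable", "1048576");
    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");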


In BackingStoreHashTableFromScan I see.

       this.max_inmemory_rowcnt = max_inmemory_rowcnt;
         if( max_inmemory_rowcnt > 0)
             max_inmemory_size = Long.MAX_VALUE;
         else
             max_inmemory_size = Runtime.getRuntime().totalMemory()/100;

So what is the intent and actual behavior of 
"derby.language.maxMemoryPerTable" and its default, and do they match? 
Are there other factors that go into setting the ceiling for memory 
usage for hash joins?

Thanks

Kathey

P.S.
In actual practice, on a *very* old Derby version (Apache Derby - 
10.1.2.1 - (330608)) I am looking at an hprof dump which shows almost 2GB 
of Blob and Clob objects that trace back to hash joins, and 
BackingStoreHashTableFromScan objects that have values as below, with 
max_inmemory_size as Long.MAX_VALUE, as I would expect from the above code.

e.g.
instance of 
org.apache.derby.impl.store.access.BackingStoreHashTableFromScan@0xa59f7d08 
(63 bytes)
Class:
class org.apache.derby.impl.store.access.BackingStoreHashTableFromScan
Instance data members:
auxillary_runtimestats (L) : <null>
diskHashtable (L) : <null>
hash_table (L) : java.util.Hashtable@0xa59f7d48 (40 bytes)
inmemory_rowcnt (J) : 8686
keepAfterCommit (Z) : false
key_column_numbers (L) : [I@0xa33b6110 (12 bytes)
max_inmemory_rowcnt (J) : 58254
max_inmemory_size (J) : 9223372036854775807
open_scan (L) : 
org.apache.derby.impl.store.access.heap.HeapScan@0xa59f7f10 (64 bytes)
remove_duplicates (Z) : false
row_source (L) : <null>
skipNullKeyColumns (Z) : true
tc (L) : org.apache.derby.impl.store.access.RAMTransaction@0x8f960428 
(57 bytes)
References to this object:
org.apache.derby.impl.sql.execute.HashScanResultSet@0xa33b5fc8 (321 
bytes) : field hashtable

Re: How Much Memory for hash joins

Posted by Knut Anders Hatlen <kn...@oracle.com>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> I have always assumed that the intent of the option is per opened
> "table" in store.  This is not user friendly at all, since the user
> does not really know how this matches up to their query, which is
> likely why the option was never made public for a zero-admin db.

There's a brief discussion in DERBY-4620 on how this setting could be
auto-tuned.

Also, the memory estimates we use for the hash tables are inaccurate
(they are too low), so maxMemoryPerTable is effectively higher than its
nominal value. There's a patch attached to the issue that fixes the
estimates, but I've been hesitant to check it in, since it changes a lot
of query plans in the regression tests (wisconsin & co).

Re: How Much Memory for hash joins

Posted by Katherine Marsden <km...@sbcglobal.net>.
On 3/4/2013 5:21 AM, Knut Anders Hatlen wrote:
> Mike Matrigali <mi...@sbcglobal.net> writes:
>
>> Also note that these are all estimates within the system.  As Knut
>> pointed out there are some known problems with the estimates.  And
>> even with fixes he has suggested, the code is probably just guessing
>> with things like blobs/clobs.
> I haven't verified, but I think HashJoinStrategy uses
> DataTypeDescriptor.estimatedMemoryUsage() to estimate how much memory
> the hash table will consume. That method has no case for BLOB or CLOB,
> so it looks as if it will return zero for LOB columns. If that's so, it
> will definitely overestimate how many rows fit in maxMemoryPerTable
> kilobytes if the rows contain LOBs.
>
Thanks Knut for finding that, I filed DERBY-6096 for the issue.


Re: How Much Memory for hash joins

Posted by Knut Anders Hatlen <kn...@oracle.com>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> Also note that these are all estimates within the system.  As Knut
> pointed out there are some known problems with the estimates.  And
> even with fixes he has suggested, the code is probably just guessing
> with things like blobs/clobs.

I haven't verified, but I think HashJoinStrategy uses
DataTypeDescriptor.estimatedMemoryUsage() to estimate how much memory
the hash table will consume. That method has no case for BLOB or CLOB,
so it looks as if it will return zero for LOB columns. If that's so, it
will definitely overestimate how many rows fit in maxMemoryPerTable
kilobytes if the rows contain LOBs.
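
A back-of-the-envelope sketch of the effect (the numbers and the per-row
figure are made up; this is not the actual HashJoinStrategy code):

    // Hypothetical: a row with one INT column plus one BLOB whose
    // memory usage is estimated as 0.
    double perRowEstimate = 24.0 + 0.0;
    long maxMemoryPerTable = 1048576;   // default from OptimizerFactoryImpl
    long rowsThatFit = (long) (maxMemoryPerTable / perRowEstimate);
    // ~43,000 rows are assumed to fit, even if each row actually drags
    // a multi-megabyte BLOB into the in-memory hash table.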

-- 
Knut Anders

Re: How Much Memory for hash joins

Posted by Mike Matrigali <mi...@sbcglobal.net>.
The goal was to use larger memory if it was available.  At the time Java
did not provide much access to this info, so only totalMemory() was 
available to use.  I think this translated to the currently allocated
memory.  So if you start the JVM with a lot of memory (even if you are
not using it), then Derby will try to use 1% of this larger value.  That
is the intent.

If you don't start with bigger memory, even if you have an -Xmx that
allows for bigger memory, we won't use it for the 1% calculation.  This
was probably coded when 1.4.2 was the latest JVM, so there may be better
interfaces now; not sure.
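
For what it's worth, Runtime also exposes maxMemory(), which does reflect
-Xmx, so a sketch of an alternative calculation (not what the current
code does) would be:

    Runtime rt = Runtime.getRuntime();
    long allocatedNow = rt.totalMemory();  // what the 1% calculation looks at today
    long xmxCeiling   = rt.maxMemory();    // reflects -Xmx
    // 1% of the -Xmx ceiling rather than 1% of the currently allocated heap:
    long candidateCap = xmxCeiling / 100;
    System.out.println(allocatedNow + " vs " + candidateCap);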

Also note that these are all estimates within the system.  As Knut 
pointed out there are some known problems with the estimates.  And
even with fixes he has suggested, the code is probably just guessing
with things like blobs/clobs.


On 3/1/2013 9:28 AM, Katherine Marsden wrote:
> On 2/28/2013 9:11 AM, Mike Matrigali wrote:
>> In BackingStoreHashTableFromScan I see.
>>>
>>>        this.max_inmemory_rowcnt = max_inmemory_rowcnt;
>>>          if( max_inmemory_rowcnt > 0)
>>>              max_inmemory_size = Long.MAX_VALUE;
>>>          else
>>>              max_inmemory_size = Runtime.getRuntime().totalMemory()/100;
>>>
>
>
> I have been reading the comments and trying to make sense of the logic
> and understand all that is happening with max_inmemory_size. I don't
> have a test case that goes through the else part of the condition above.
>
> One thing I did notice is that Runtime.getRuntime().totalMemory()
> returns really different things if -Xms is set large. For example, with
> nothing else going on, with -Xms1048m -Xmx1048m I get:
>
> Total Memory:1098907648 Free Memory:1088915600
>
> With just    -Xmx1048m
> Total Memory:4194304 Free Memory:2750304
>
> Two questions
> 1) Might hash joins use an unexpectedly large amount of memory if -Xms
> is set large?  I know at the user site where this was being set, they
> were setting -Xms in the hopes of optimizing memory usage but I wonder
> if it actually increased the amount of memory used by hash joins.
I don't think it is unexpected; the intent is to use more memory if more
memory is available.  See above.  I think the code would have used the
-Xmx value if, at the time, it could have gotten at that through JVM
interfaces.
> 2) Is there a test case that goes through the else clause above that I
> could use for my experimentation?
>
> Thanks
>
> Kathey
>


Re: How Much Memory for hash joins

Posted by Knut Anders Hatlen <kn...@oracle.com>.
Katherine Marsden <km...@sbcglobal.net> writes:

> On 2/28/2013 9:11 AM, Mike Matrigali wrote:
>> In BackingStoreHashTableFromScan I see.
>>>
>>>        this.max_inmemory_rowcnt = max_inmemory_rowcnt;
>>>          if( max_inmemory_rowcnt > 0)
>>>              max_inmemory_size = Long.MAX_VALUE;
>>>          else
>>>              max_inmemory_size = Runtime.getRuntime().totalMemory()/100;
>>>
>
>
> I have been reading the comments and trying to make sense of the logic
> and understand all that is happening with max_inmemory_size. I don't
> have a test case that goes through the else part of the condition
> above.
>
> One thing I did notice is that Runtime.getRuntime().totalMemory()
> returns really different things if -Xms is set large. For example, with
> nothing else going on, with -Xms1048m -Xmx1048m I get:
>
> Total Memory:1098907648 Free Memory:1088915600
>
> With just    -Xmx1048m
> Total Memory:4194304 Free Memory:2750304
>
> Two questions
> 1) Might hash joins use an unexpectedly large amount of memory if -Xms
> is set large?  I know at the user site where this was being set, they
> were setting -Xms in the hopes of optimizing memory usage but I wonder
> if it actually increased the amount of memory used by hash joins.

I think hash scans pass in a maximum row count to the
BackingStoreHashTable, so the totalMemory() calculation won't be used by
hash joins. The only use of the totalMemory() calculation I'm aware of
is for scrollable result sets.

> 2) Is there a test case that goes through the else clause above that I
> could use for my experimentation?

Executing this command in ij exercises the else clause:

ij> get scroll insensitive cursor c as 'select * from sys.systables';
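
Or the JDBC equivalent, if it's easier to drive from a test case (the
connection URL is just an example; uses java.sql.Connection,
DriverManager, Statement, ResultSet):

    // Scroll-insensitive result set over embedded Derby; per the above,
    // this should be the path that uses the totalMemory()/100 limit.
    Connection conn = DriverManager.getConnection("jdbc:derby:testdb;create=true");
    Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                          ResultSet.CONCUR_READ_ONLY);
    ResultSet rs = stmt.executeQuery("SELECT * FROM SYS.SYSTABLES");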

-- 
Knut Anders

Re: How Much Memory for hash joins

Posted by Katherine Marsden <km...@sbcglobal.net>.
On 2/28/2013 9:11 AM, Mike Matrigali wrote:
> In BackingStoreHashTableFromScan I see.
>>
>>        this.max_inmemory_rowcnt = max_inmemory_rowcnt;
>>          if( max_inmemory_rowcnt > 0)
>>              max_inmemory_size = Long.MAX_VALUE;
>>          else
>>              max_inmemory_size = Runtime.getRuntime().totalMemory()/100;
>>


I have been reading the comments and trying to make sense of the logic 
and understand all that is happening with max_inmemory_size. I don't 
have a test case that goes through the else part of the condition above.

One thing I did notice is that Runtime.getRuntime().totalMemory() 
returns really different things if -Xms is set large. For example, with 
nothing else going on, with -Xms1048m -Xmx1048m I get:

Total Memory:1098907648 Free Memory:1088915600

With just    -Xmx1048m
Total Memory:4194304 Free Memory:2750304
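
(Numbers like the above can be reproduced with a trivial standalone
check; the class name is just for illustration:)

    public class MemCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // Run with e.g. -Xms1048m -Xmx1048m and then with only
            // -Xmx1048m to see how much totalMemory() varies.
            System.out.println("Total Memory:" + rt.totalMemory()
                    + " Free Memory:" + rt.freeMemory());
        }
    }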

Two questions
1) Might hash joins use an unexpectedly large amount of memory if -Xms 
is set large?  I know at the user site where this was being set, they 
were setting -Xms in the hopes of optimizing memory usage but I wonder 
if it actually increased the amount of memory used by hash joins.
2) Is there a test case that goes through the else clause above that I 
could use for my experimentation?

Thanks

Kathey

Re: How Much Memory for hash joins

Posted by Mike Matrigali <mi...@sbcglobal.net>.
There are some good comments in
java/engine/org/apache/derby/iapi/store/access/BackingStoreHashTable.java which 
this class inherits from.

I am not sure what parameters are usually passed down to this class from
the optimizer/execution layer.

I have always assumed that the intent of the option is per opened 
"table" in store.  This is not user friendly at all, since the user does 
not really know how this matches up to their query, which is likely
why the option was never made public for a zero-admin db.

Internally, one of these backing store things can be created for any
index or table access as part of a query.  A single query with joins
could have a number of these, depending on how many terms are in
the joins.


On 2/27/2013 12:47 PM, Katherine Marsden wrote:
> I was wondering what is the default maximum memory for hash joins.
>
> Looking at OptimizerFactoryImpl I see
> protected int maxMemoryPerTable = 1048576 unless overridden by
> "derby.language.maxMemoryPerTable";
> Is it actually intended per table or per active query?  I don't see the
> property in the documentation.
> If I set this to zero, will it turn off hash joins altogether?
>
>
> In BackingStoreHashTableFromScan I see.
>
>        this.max_inmemory_rowcnt = max_inmemory_rowcnt;
>          if( max_inmemory_rowcnt > 0)
>              max_inmemory_size = Long.MAX_VALUE;
>          else
>              max_inmemory_size = Runtime.getRuntime().totalMemory()/100;
>
> So what is the intent and actual behavior of
> "derby.language.maxMemoryPerTable" and its default, and do they match?
> Are there other factors that go into setting the ceiling for memory
> usage for hash joins?
>
> Thanks
>
> Kathey
>
> P.S.
> In actual practice, on a *very* old Derby version (Apache Derby -
> 10.1.2.1 - (330608)) I am looking at an hprof dump which shows almost 2GB
> of Blob and Clob objects that trace back to hash joins, and
> BackingStoreHashTableFromScan objects that have values as below, with
> max_inmemory_size as Long.MAX_VALUE, as I would expect from the above code.
>
> e.g.
> instance of
> org.apache.derby.impl.store.access.BackingStoreHashTableFromScan@0xa59f7d08
> (63 bytes)
> Class:
> class org.apache.derby.impl.store.access.BackingStoreHashTableFromScan
> Instance data members:
> auxillary_runtimestats (L) : <null>
> diskHashtable (L) : <null>
> hash_table (L) : java.util.Hashtable@0xa59f7d48 (40 bytes)
> inmemory_rowcnt (J) : 8686
> keepAfterCommit (Z) : false
> key_column_numbers (L) : [I@0xa33b6110 (12 bytes)
> max_inmemory_rowcnt (J) : 58254
> max_inmemory_size (J) : 9223372036854775807
> open_scan (L) :
> org.apache.derby.impl.store.access.heap.HeapScan@0xa59f7f10 (64 bytes)
> remove_duplicates (Z) : false
> row_source (L) : <null>
> skipNullKeyColumns (Z) : true
> tc (L) : org.apache.derby.impl.store.access.RAMTransaction@0x8f960428
> (57 bytes)
> References to this object:
> org.apache.derby.impl.sql.execute.HashScanResultSet@0xa33b5fc8 (321
> bytes) : field hashtable
>