Posted to notifications@asterixdb.apache.org by "Taewoo Kim (JIRA)" <ji...@apache.org> on 2016/08/02 17:02:20 UTC
[jira] [Comment Edited] (ASTERIXDB-1556) Prefix-based multi-way Fuzzy-join generates an exception.
[ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403898#comment-15403898 ]
Taewoo Kim edited comment on ASTERIXDB-1556 at 8/2/16 5:01 PM:
---------------------------------------------------------------
It seems that the compiler doesn't set the hash table size for external-group-by, in-memory hash join, and hash-group-by during APIFramework.compileQuery(); only the following settings are applied.
{code:title=APIFramework.java|borderStyle=solid}
AsterixCompilerProperties compilerProperties = AsterixAppContextInfo.getInstance().getCompilerProperties();
int frameSize = compilerProperties.getFrameSize();
int sortFrameLimit = (int) (compilerProperties.getSortMemorySize() / frameSize);
int groupFrameLimit = (int) (compilerProperties.getGroupMemorySize() / frameSize);
int joinFrameLimit = (int) (compilerProperties.getJoinMemorySize() / frameSize);
OptimizationConfUtil.getPhysicalOptimizationConfig().setFrameSize(frameSize);
OptimizationConfUtil.getPhysicalOptimizationConfig().setMaxFramesExternalSort(sortFrameLimit);
OptimizationConfUtil.getPhysicalOptimizationConfig().setMaxFramesExternalGroupBy(groupFrameLimit);
OptimizationConfUtil.getPhysicalOptimizationConfig().setMaxFramesForJoin(joinFrameLimit);
{code}
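For reference, the frame-limit arithmetic above works out as follows. This is a standalone sketch, not AsterixDB code; the constants mirror the defaults quoted in this comment, and the budget-based table size at the end is purely hypothetical, illustrating the derivation the compiler currently does not perform:

{code:title=FrameLimitSketch.java (illustrative sketch)|borderStyle=solid}
public class FrameLimitSketch {
    static final long MB = 1024 * 1024;

    public static void main(String[] args) {
        int frameSize = 32768;                    // default frame size
        long groupMemorySize = 32 * MB;           // e.g. compiler group-memory budget
        // Same computation as APIFramework: budget divided by frame size.
        int groupFrameLimit = (int) (groupMemorySize / frameSize);
        // Hypothetical: derive a table size from the budget, given that each
        // bucket costs INT_SIZE * 2 = 8 bytes of header space.
        int tableSize = (int) (groupMemorySize / 8);
        System.out.println(groupFrameLimit);      // 1024 frames
        System.out.println(tableSize);            // 4194304 buckets
    }
}
{code}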
Here, only the frame limits are set. The hash table size, however, is always 10,485,767, based on the following defaults in the PhysicalOptimizationConfig constructor.
{code:title=PhysicalOptimizationConfig.java|borderStyle=solid}
public PhysicalOptimizationConfig() {
int frameSize = 32768;
setInt(FRAMESIZE, frameSize);
setInt(MAX_FRAMES_EXTERNAL_SORT, (int) (((long) 32 * MB) / frameSize));
setInt(MAX_FRAMES_EXTERNAL_GROUP_BY, (int) (((long) 32 * MB) / frameSize));
// use http://www.rsok.com/~jrm/printprimes.html to find prime numbers
setInt(DEFAULT_HASH_GROUP_TABLE_SIZE, 10485767);
setInt(DEFAULT_EXTERNAL_GROUP_TABLE_SIZE, 10485767);
setInt(DEFAULT_IN_MEM_HASH_JOIN_TABLE_SIZE, 10485767);
}
{code}
Although there are three methods that can change the default table sizes, none of them has any callers.
{code:title=PhysicalOptimizationConfig.java|borderStyle=solid}
public void setExternalGroupByTableSize(int tableSize) {
setInt(DEFAULT_EXTERNAL_GROUP_TABLE_SIZE, tableSize);
}
public void setInMemHashJoinTableSize(int tableSize) {
setInt(DEFAULT_IN_MEM_HASH_JOIN_TABLE_SIZE, tableSize);
}
public void setHashGroupByTableSize(int tableSize) {
setInt(DEFAULT_HASH_GROUP_TABLE_SIZE, tableSize);
}
{code}
I checked the hybrid-hash-join path, and the callers that create a hash table there adjust the table size based on the number of tuples (file sizes). For group-by, however, there is no such adjustment, so HashSpillableTableFactory.buildSpillableTable() always allocates 8 bytes (INT_SIZE * 2) per bucket for the default table size (10,485,767), i.e. 8 * 10,485,767 = 83,886,136 bytes, roughly 80 MB. So, for example, if we have 8 partitions, the engine always assigns about 80 * 8 = 640 MB for each group-by operator.
{code:title=SerializableHashTable.java|borderStyle=solid}
public SerializableHashTable(int tableSize, final IHyracksFrameMgrContext ctx) throws HyracksDataException {
    this.ctx = ctx;
    int frameSize = ctx.getInitialFrameSize();
    // Each bucket needs INT_SIZE * 2 = 8 bytes of header space; round the
    // header frame count up when the headers don't fill a whole frame.
    int residual = tableSize * INT_SIZE * 2 % frameSize == 0 ? 0 : 1;
    int headerSize = tableSize * INT_SIZE * 2 / frameSize + residual;
    headers = new IntSerDeBuffer[headerSize];
    IntSerDeBuffer frame = new IntSerDeBuffer(ctx.allocateFrame().array());
    contents.add(frame);
    frameCurrentIndex.add(0);
    frameCapacity = frame.capacity();
}
{code}
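The header-memory arithmetic implied by that constructor can be checked directly. This is a self-contained sketch (not AsterixDB code) that plugs in the default table size and a 32 KB frame size:

{code:title=HeaderMemorySketch.java (illustrative sketch)|borderStyle=solid}
public class HeaderMemorySketch {
    public static void main(String[] args) {
        final int INT_SIZE = 4;
        int tableSize = 10485767;   // DEFAULT_*_TABLE_SIZE from PhysicalOptimizationConfig
        int frameSize = 32768;      // default frame size

        // 2 ints of header space per bucket, as in SerializableHashTable.
        long headerBytes = (long) tableSize * INT_SIZE * 2;
        int residual = headerBytes % frameSize == 0 ? 0 : 1;
        int headerFrames = (int) (headerBytes / frameSize) + residual;

        System.out.println(headerBytes);      // 83886136 bytes, i.e. ~80 MB
        System.out.println(headerFrames);     // 2561 header frames
        System.out.println(headerBytes * 8);  // 671089088 bytes, ~640 MB for 8 partitions
    }
}
{code}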
> Prefix-based multi-way Fuzzy-join generates an exception.
> ---------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy join and apply a multi-way fuzzy join ( > 2 ways), the system generates an out-of-memory exception.
> Since a fuzzy join is expressed in 30-40 lines of AQL, and this AQL is translated into a massive number of operators (more than 200 operators in the plan for a 3-way fuzzy join), it can generate an out-of-memory exception.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)