You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hawq.apache.org by weinan003 <gi...@git.apache.org> on 2018/03/27 02:43:07 UTC

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

GitHub user weinan003 opened a pull request:

    https://github.com/apache/incubator-hawq/pull/1349

    HAWQ-1598. Vectorized Scan Node Framework initialization

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weinan003/incubator-hawq 1598

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1349
    
----
commit 4126df19e498da2764358a4913c9c0a37c0af64b
Author: Weinan Wang <we...@...>
Date:   2018-03-26T10:07:21Z

    HAWQ-1598. Vectorized Scan Node Framework initialization

----


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by weinan003 <gi...@git.apache.org>.
Github user weinan003 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177944185
  
    --- Diff: contrib/vexecutor/vexecutor.c ---
    @@ -66,12 +75,47 @@ _PG_fini(void)
     	vmthd.ExecEndNode_Hook = NULL;
     }
     
    -static PlanState* VExecInitNode(PlanState *node,EState *eState,int eflags)
    +static void backportTupleDescriptor(PlanState* ps,TupleDesc td)
    --- End diff --
    
    Due to Vectorized type unrecognizable in normal execution node, the TupleDesc structure should backport metadata to normal type before we do V->N result pop up. At present, we only support scan node vectorized execution, so we force this backport function invoked. In the future, we will check if its parent node is "N" type. I will comment the function


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by weinan003 <gi...@git.apache.org>.
Github user weinan003 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177632783
  
    --- Diff: contrib/vexecutor/vadt.c ---
    @@ -39,10 +39,11 @@ size_t v##type##Size(vheader *vh) \
     vheader* buildv##type(int n) \
     { \
     	vheader* result; \
    -	result = (vheader*) palloc0(offsetof(v##type,values) + n * sizeof(type)); \
    +	result = (vheader*) malloc(offsetof(v##type,values) + n * sizeof(type)); \
    +	memset(result,0,offsetof(v##type,values) + n * sizeof(type)); \
     	result->dim = n; \
     	result->elemtype = typeoid; \
    -	result->isnull = palloc(sizeof(bool) * n); \
    +	result->isnull = malloc(sizeof(bool) * n); \
    --- End diff --
    
    It does make sense, I will attach another commit change it


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by interma <gi...@git.apache.org>.
Github user interma commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177326032
  
    --- Diff: src/test/feature/vexecutor/test_vexecutor.cpp ---
    @@ -60,11 +59,47 @@ TEST_F(TestVexecutor, vadt)
     
     	// run sql file to get ans file and then diff it with out file
     	util.execSQLFile("vexecutor/sql/create_type.sql",
    -	                 "vexecutor/ans/create_type.ans");
    +					 "vexecutor/ans/create_type.ans");
     
     	util.execSQLFile("vexecutor/sql/vadt.sql",
    -	                 "vexecutor/ans/vadt.ans");
    +					 "vexecutor/ans/vadt.ans");
     
     	util.execSQLFile("vexecutor/sql/drop_type.sql",
    -	                 "vexecutor/ans/drop_type.ans");
    +					 "vexecutor/ans/drop_type.ans");
     }
    +
    +TEST_F(TestVexecutor, scanframework)
    +{
    +	hawq::test::SQLUtility util;
    +
    +	util.execute("drop table if exists test1");
    +	util.execute("create table test1 ("
    +				 "	unique1		int4,"
    +				 "	unique2		int4,"
    +				 "	two			int4,"
    +				 "	four		int4,"
    +				 "	ten			int4,"
    +				 "	twenty		int4,"
    +				 "	hundred		int4,"
    +				 "	thousand	int4,"
    +				 "	twothousand	int4,"
    +				 "	fivethous	int4,"
    +				 "	tenthous	int4,"
    +				 "	odd			int4,"
    +				 "	even		int4,"
    +				 "	stringu1	name,"
    +				 "	stringu2	name,"
    +				 "	string4		name) WITH (appendonly = true, orientation = PARQUET, pagesize = 1048576, rowgroupsize = 8388608, compresstype = SNAPPY) DISTRIBUTED RANDOMLY;");
    +
    +  std::string pwd = util.getTestRootPath();
    +  std::string cmd = "COPY test1 FROM '" + pwd + "/vexecutor/data/tenk.data'";
    --- End diff --
    
    why need so many data in tenk.data?


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by weinan003 <gi...@git.apache.org>.
Github user weinan003 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177948142
  
    --- Diff: contrib/vexecutor/vexecutor.c ---
    @@ -96,20 +140,70 @@ static PlanState* VExecInitNode(PlanState *node,EState *eState,int eflags)
     		((VectorizedState*)(subState->vectorized))->parent = node;
     	}
     
    -	//***TODO:process the vectorized execution operators here
    +	if(Gp_role != GP_ROLE_DISPATCH)
    +	{
    +		switch (nodeTag(node) )
    +		{
    +			case T_AppendOnlyScan:
    +			case T_ParquetScan:
    +			case T_TableScanState:
    +				START_MEMORY_ACCOUNT(curMemoryAccount);
    +					{
    +						TupleDesc td = ((TableScanState *)node)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
    +						((TableScanState *)node)->ss.ss_ScanTupleSlot->PRIVATE_tb = PointerGetDatum(tbGenerate(td->natts,BATCHSIZE));
    +						node->ps_ResultTupleSlot->PRIVATE_tb = PointerGetDatum(tbGenerate(td->natts,BATCHSIZE));
    +						/* if V->N */
    +						backportTupleDescriptor(node,node->ps_ResultTupleSlot->tts_tupleDescriptor);
    +					}
    +						END_MEMORY_ACCOUNT();
    +				break;
    +			default:
    +				((VectorizedState *)node->vectorized)->vectorized = false;
    +				break;
    +		}
    +	}
     
    -	elog(DEBUG3, "PG VEXEC INIT NODE");
     	return node;
     }
     static TupleTableSlot* VExecProcNode(PlanState *node)
     {
    -	return NULL;
    +    TupleTableSlot* result = NULL;
    +    switch(nodeTag(node))
    +    {
    +        case T_ParquetScanState:
    +        case T_AppendOnlyScanState:
    +        case T_TableScanState:
    +            result = ExecTableVScan((TableScanState*)node);
    +            break;
    +        default:
    +            break;
    +    }
    +    return result;
     }
     
     static bool VExecEndNode(PlanState *node)
     {
    +    if(Gp_role == GP_ROLE_DISPATCH)
    +		return false;
    +
     	elog(DEBUG3, "PG VEXEC END NODE");
    -	return false;
    +	bool ret = false;
    +	switch (nodeTag(node))
    +	{
    +		case T_AppendOnlyScanState:
    +		case T_ParquetScanState:
    +            tbDestroy((TupleBatch *) (&node->ps_ResultTupleSlot->PRIVATE_tb));
    +            tbDestroy((TupleBatch *) (&((TableScanState *) node)->ss.ss_ScanTupleSlot->PRIVATE_tb));
    +			ret = true;
    +			break;
    +		case T_TableScanState:
    --- End diff --
    
    In Init process, we prefer to replace some metadata after normal init process done, because both normal and vectorized init process most part are same. In the End step, in the current, I think there is no more we need to do. Tuplebatch buffer which is reusable buffer can be recycled by memory context, others data destruct in the normal source release function.


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by wengyanqing <gi...@git.apache.org>.
Github user wengyanqing commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177941676
  
    --- Diff: contrib/vexecutor/vexecutor.c ---
    @@ -96,20 +140,70 @@ static PlanState* VExecInitNode(PlanState *node,EState *eState,int eflags)
     		((VectorizedState*)(subState->vectorized))->parent = node;
     	}
     
    -	//***TODO:process the vectorized execution operators here
    +	if(Gp_role != GP_ROLE_DISPATCH)
    +	{
    +		switch (nodeTag(node) )
    +		{
    +			case T_AppendOnlyScan:
    +			case T_ParquetScan:
    +			case T_TableScanState:
    +				START_MEMORY_ACCOUNT(curMemoryAccount);
    +					{
    +						TupleDesc td = ((TableScanState *)node)->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
    +						((TableScanState *)node)->ss.ss_ScanTupleSlot->PRIVATE_tb = PointerGetDatum(tbGenerate(td->natts,BATCHSIZE));
    +						node->ps_ResultTupleSlot->PRIVATE_tb = PointerGetDatum(tbGenerate(td->natts,BATCHSIZE));
    +						/* if V->N */
    +						backportTupleDescriptor(node,node->ps_ResultTupleSlot->tts_tupleDescriptor);
    +					}
    +						END_MEMORY_ACCOUNT();
    +				break;
    +			default:
    +				((VectorizedState *)node->vectorized)->vectorized = false;
    +				break;
    +		}
    +	}
     
    -	elog(DEBUG3, "PG VEXEC INIT NODE");
     	return node;
     }
     static TupleTableSlot* VExecProcNode(PlanState *node)
     {
    -	return NULL;
    +    TupleTableSlot* result = NULL;
    +    switch(nodeTag(node))
    +    {
    +        case T_ParquetScanState:
    +        case T_AppendOnlyScanState:
    +        case T_TableScanState:
    +            result = ExecTableVScan((TableScanState*)node);
    +            break;
    +        default:
    +            break;
    +    }
    +    return result;
     }
     
     static bool VExecEndNode(PlanState *node)
     {
    +    if(Gp_role == GP_ROLE_DISPATCH)
    +		return false;
    +
     	elog(DEBUG3, "PG VEXEC END NODE");
    -	return false;
    +	bool ret = false;
    +	switch (nodeTag(node))
    +	{
    +		case T_AppendOnlyScanState:
    +		case T_ParquetScanState:
    +            tbDestroy((TupleBatch *) (&node->ps_ResultTupleSlot->PRIVATE_tb));
    +            tbDestroy((TupleBatch *) (&((TableScanState *) node)->ss.ss_ScanTupleSlot->PRIVATE_tb));
    +			ret = true;
    +			break;
    +		case T_TableScanState:
    --- End diff --
    
    The logic of case T_TableScanState in VExecEndNode is not the same as VExecInitNode, what's the difference ?


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by weinan003 <gi...@git.apache.org>.
Github user weinan003 closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/1349


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by wengyanqing <gi...@git.apache.org>.
Github user wengyanqing commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177940619
  
    --- Diff: contrib/vexecutor/vexecutor.c ---
    @@ -21,15 +21,23 @@
     #include "vexecutor.h"
     #include "utils/guc.h"
     #include "vcheck.h"
    +#include "miscadmin.h"
    +#include "tuplebatch.h"
    +#include "executor/nodeTableScan.h"
    +#include "catalog/catquery.h"
    +#include "cdb/cdbvars.h"
    +#include "execVScan.h"
     
     PG_MODULE_MAGIC;
    -
    +#define BATCHSIZE 1024
    --- End diff --
    
    It's better to make BATCHSIZE as GUC so that we could tune this value easily.


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by weinan003 <gi...@git.apache.org>.
Github user weinan003 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177945087
  
    --- Diff: contrib/vexecutor/execVScan.c ---
    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +#include "execVScan.h"
    +#include "miscadmin.h"
    +
    +static TupleTableSlot*
    +ExecVScan(ScanState *node, ExecScanAccessMtd accessMtd);
    +
    +static const ScanMethod *
    +getVScanMethod(int tableType)
    +{
    +    static const ScanMethod scanMethods[] =
    +            {
    +                    //HEAPSCAN
    +                    {
    +                    },
    +                    //APPENDONLYSCAN
    +                    {
    +                            &AppendOnlyScanNext, &BeginScanAppendOnlyRelation, &EndScanAppendOnlyRelation,
    --- End diff --
    
    Well, this PR only provide a vectorized scan framework, AO and Parquet vectorized scan function will implement later. For easy to do feature test for this PR. I use normal AO and Parquet function as the tuple scanner under the framework


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by wengyanqing <gi...@git.apache.org>.
Github user wengyanqing commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177942413
  
    --- Diff: contrib/vexecutor/vexecutor.c ---
    @@ -66,12 +75,47 @@ _PG_fini(void)
     	vmthd.ExecEndNode_Hook = NULL;
     }
     
    -static PlanState* VExecInitNode(PlanState *node,EState *eState,int eflags)
    +static void backportTupleDescriptor(PlanState* ps,TupleDesc td)
    --- End diff --
    
    What's the purpose of this function ? Would you give the comment ?


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by weinan003 <gi...@git.apache.org>.
Github user weinan003 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177366916
  
    --- Diff: src/test/feature/vexecutor/test_vexecutor.cpp ---
    @@ -60,11 +59,47 @@ TEST_F(TestVexecutor, vadt)
     
     	// run sql file to get ans file and then diff it with out file
     	util.execSQLFile("vexecutor/sql/create_type.sql",
    -	                 "vexecutor/ans/create_type.ans");
    +					 "vexecutor/ans/create_type.ans");
     
     	util.execSQLFile("vexecutor/sql/vadt.sql",
    -	                 "vexecutor/ans/vadt.ans");
    +					 "vexecutor/ans/vadt.ans");
     
     	util.execSQLFile("vexecutor/sql/drop_type.sql",
    -	                 "vexecutor/ans/drop_type.ans");
    +					 "vexecutor/ans/drop_type.ans");
     }
    +
    +TEST_F(TestVexecutor, scanframework)
    +{
    +	hawq::test::SQLUtility util;
    +
    +	util.execute("drop table if exists test1");
    +	util.execute("create table test1 ("
    +				 "	unique1		int4,"
    +				 "	unique2		int4,"
    +				 "	two			int4,"
    +				 "	four		int4,"
    +				 "	ten			int4,"
    +				 "	twenty		int4,"
    +				 "	hundred		int4,"
    +				 "	thousand	int4,"
    +				 "	twothousand	int4,"
    +				 "	fivethous	int4,"
    +				 "	tenthous	int4,"
    +				 "	odd			int4,"
    +				 "	even		int4,"
    +				 "	stringu1	name,"
    +				 "	stringu2	name,"
    +				 "	string4		name) WITH (appendonly = true, orientation = PARQUET, pagesize = 1048576, rowgroupsize = 8388608, compresstype = SNAPPY) DISTRIBUTED RANDOMLY;");
    +
    +  std::string pwd = util.getTestRootPath();
    +  std::string cmd = "COPY test1 FROM '" + pwd + "/vexecutor/data/tenk.data'";
    --- End diff --
    
    I believe that it can relieve future test workload.


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by interma <gi...@git.apache.org>.
Github user interma commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177327550
  
    --- Diff: contrib/vexecutor/vadt.c ---
    @@ -39,10 +39,11 @@ size_t v##type##Size(vheader *vh) \
     vheader* buildv##type(int n) \
     { \
     	vheader* result; \
    -	result = (vheader*) palloc0(offsetof(v##type,values) + n * sizeof(type)); \
    +	result = (vheader*) malloc(offsetof(v##type,values) + n * sizeof(type)); \
    +	memset(result,0,offsetof(v##type,values) + n * sizeof(type)); \
     	result->dim = n; \
     	result->elemtype = typeoid; \
    -	result->isnull = palloc(sizeof(bool) * n); \
    +	result->isnull = malloc(sizeof(bool) * n); \
    --- End diff --
    
    why use malloc here? and the below code is still palloc()


---

[GitHub] incubator-hawq issue #1349: HAWQ-1598. Vectorized Scan Node Framework initia...

Posted by amyrazz44 <gi...@git.apache.org>.
Github user amyrazz44 commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1349
  
    LGTM +1


---

[GitHub] incubator-hawq pull request #1349: HAWQ-1598. Vectorized Scan Node Framework...

Posted by wengyanqing <gi...@git.apache.org>.
Github user wengyanqing commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1349#discussion_r177941951
  
    --- Diff: contrib/vexecutor/execVScan.c ---
    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +#include "execVScan.h"
    +#include "miscadmin.h"
    +
    +static TupleTableSlot*
    +ExecVScan(ScanState *node, ExecScanAccessMtd accessMtd);
    +
    +static const ScanMethod *
    +getVScanMethod(int tableType)
    +{
    +    static const ScanMethod scanMethods[] =
    +            {
    +                    //HEAPSCAN
    +                    {
    +                    },
    +                    //APPENDONLYSCAN
    +                    {
    +                            &AppendOnlyScanNext, &BeginScanAppendOnlyRelation, &EndScanAppendOnlyRelation,
    --- End diff --
    
    Are these scan functions placeholders?


---