You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hawq.apache.org by chiyang10000 <gi...@git.apache.org> on 2017/11/28 04:44:33 UTC

[GitHub] incubator-hawq pull request #1314: HAWQ-1555. Add access interfaces for prot...

GitHub user chiyang10000 opened a pull request:

    https://github.com/apache/incubator-hawq/pull/1314

    HAWQ-1555. Add access interfaces for protocol and format in pluggable storage framework

    Please help to review the code.
    It will not affect the original HAWQ internal process, just serving as a invoker wrappers for the following work in the pluggable storage framework. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chiyang10000/incubator-hawq HAWQ-786

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1314
    
----
commit 4ac6d67ebe7cb4a6bee2b032181d12125a7b4b67
Author: Chiyang Wan <ch...@gmail.com>
Date:   2017-11-28T01:16:48Z

    HAWQ-1555. Add access interfaces for protocol and format in pluggable storage framework

----


---

[GitHub] incubator-hawq pull request #1314: HAWQ-1555. Add access interfaces for prot...

Posted by chiyang10000 <gi...@git.apache.org>.
Github user chiyang10000 closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/1314


---

[GitHub] incubator-hawq issue #1314: HAWQ-1555. Add access interfaces for protocol an...

Posted by stanlyxiang <gi...@git.apache.org>.
Github user stanlyxiang commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1314
  
    +1


---

[GitHub] incubator-hawq issue #1314: HAWQ-1555. Add access interfaces for protocol an...

Posted by jiny2 <gi...@git.apache.org>.
Github user jiny2 commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1314
  
    +1 
    
    I think it makes sense to have partial facility code checked in for simplifying the following code merge and delivery work. Therefore, per my understanding, it is not necessary to provide feature test cases in current phase, as all existing HAWQ test cases target end-to-end process. We can do it later when some functions work. Of course test case is a must do.
    
    @chiyang10000 for this delivery please make sure current existing HAWQ feature test cases pass without unexpected errors raised. 


---

[GitHub] incubator-hawq pull request #1314: HAWQ-1555. Add access interfaces for prot...

Posted by jiny2 <gi...@git.apache.org>.
Github user jiny2 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1314#discussion_r154278719
  
    --- Diff: src/include/access/plugstorage.h ---
    @@ -0,0 +1,221 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +/*-------------------------------------------------------------------------
    + *
    + * plugstorage.h
    + *
    + *    Pluggable storage definitions. Support external table for now.
    + *
    + *-------------------------------------------------------------------------
    + */
    +
    +#ifndef PLUGSTORAGE_H
    +#define PLUGSTORAGE_H
    +
    +#include "postgres.h"
    +#include "postgres_ext.h"
    +#include "fmgr.h"
    +#include "nodes/pg_list.h"
    +#include "nodes/plannodes.h"
    +#include "nodes/execnodes.h"
    +#include "access/sdir.h"
    +#include "access/relscan.h"
    +#include "access/extprotocol.h"
    +#include "access/tupdesc.h"
    +#include "access/fileam.h"
    +#include "utils/relcache.h"
    +#include "executor/tuptable.h"
    +
    +/* From src/include/access/fileam.h */
    +extern char *getExtTblCategoryInFmtOptsStr(char *fmtStr);
    +extern char *getExtTblFormatterTypeInFmtOptsStr(char *fmtStr);
    +extern char *getExtTblFormatterTypeInFmtOptsList(List *fmtOpts);
    +
    +
    +/*
    + * ExternalTableType
    + *
    + *    enum for different types of external tables.
    + *
    + *    The different types of external tables can be combinations of different
    + *    protocols and formats. To be specific:
    + *    {GPFDIST, GPFDISTS, HTTP, COMMAND, HDFS} X {TEXT, CSV, ORC}
    + *
    + *    NOTE:
    + *
    + *    The GENERIC external table type is used to simplify the call back
    + *    implementation for different combination of external table formats
    + *    and protocols. It need to be improved so that each external table
    + *    type should get its access method with minimum cost during runtime.
    + *
    + *    The fact is that for custom protocol (HDFS), the format types for
    + *    TEXT, CSV, and ORC external tables are all 'b' in pg_exttable catalog
    + *    table. The formatter information is stored in format options which is
    + *    a list strings. Thus, during read and write access of these tables,
    + *    there will be performance issue if we get the format type and access
    + *    method for them for each tuple.
    + *
    + */
    +typedef enum
    +{
    +	ExternalTableType_GENERIC,     /* GENERIC external table format and protocol */
    +	ExternalTableType_PLUG,        /* Pluggable format with hdfs protocol, i.e., ORC */
    +	ExternalTableType_Invalid
    +} ExternalTableType;
    +
    +/*
    + * PlugStorageValidatorData is the node type that is passed as fmgr "context"
    + * info when a function is called by the Pluggable Storage Framework.
    + */
    +typedef struct PlugStorageValidatorData
    +{
    +	/* see T_PlugStorageValidatorData */
    +	NodeTag		type;
    +
    +	/* check if format interfaces are implemented in pg_proc */
    +	char		*format_name;
    +
    +	/* check if format options and their values are valid */
    +	List		*format_opts;
    +	char		*format_str;
    +	char		*encoding_name;
    +	bool		is_writable;
    +
    +	/* check if format data types are supported in table definition */
    +	TupleDesc	tuple_desc;
    +
    +} PlugStorageValidatorData;
    +
    +typedef struct PlugStorageValidatorData *PlugStorageValidator;
    +
    +/*
    + * PlugStorageData is used to pass args between hawq and pluggable data formats.
    + */
    +typedef struct PlugStorageData
    +{
    +	NodeTag                type;           /* see T_PlugStorageData */
    +	int                    ps_table_type;
    +	int                    ps_formatter_type;
    +	char                  *ps_formatter_name;
    +	int                    ps_error_ao_seg_no;
    +	Relation               ps_relation;
    +	PlanState             *ps_plan_state;
    +	ExternalScan          *ps_ext_scan;
    +	ScanState             *ps_scan_state;
    +	ScanDirection          ps_scan_direction;
    +	FileScanDesc           ps_file_scan_desc;
    +	ExternalScanState     *ps_ext_scan_state;
    +	ResultRelSegFileInfo  *ps_result_seg_file_info;
    +	ExternalInsertDesc     ps_ext_insert_desc;
    +	ExternalSelectDesc     ps_ext_select_desc;
    +	bool                   ps_has_tuple;
    +	Oid                    ps_tuple_oid;
    +	TupleTableSlot        *ps_tuple_table_slot;
    +
    +} PlugStorageData;
    +
    +typedef PlugStorageData *PlugStorage;
    +
    +
    +/*
    + * getExternalTableTypeList
    + * getExternalTableTypeStr
    + *
    + *    Return the table type for a given external table
    + *
    + *    Currently it is an implementation with performance cost due to the
    + *    fact that the format types for TEXT, CSV, and ORC external tables
    + *    with customer protocol (HDFS) are all 'b' in pg_exttable catalog
    + *    table. It needs to visit the format options to get the table type.
    + */
    +void getExternalTableTypeList(const char formatType,
    +                              List *formatOptions,
    +                              int *formatterType,
    +                              char **formatterName);
    +
    +void getExternalTableTypeStr(const char formatType,
    +                             char *formatOptions,
    +							 int *formatterType,
    --- End diff --
    
    please double check this format


---

[GitHub] incubator-hawq pull request #1314: HAWQ-1555. Add access interfaces for prot...

Posted by stanlyxiang <gi...@git.apache.org>.
Github user stanlyxiang commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1314#discussion_r154280568
  
    --- Diff: src/backend/access/external/plugstorage.c ---
    @@ -0,0 +1,533 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +/*-------------------------------------------------------------------------
    + *
    + * plugstorage.c
    + *
    + *    Pluggable storage implementation. Support external table for now.
    + *
    + *-------------------------------------------------------------------------
    + */
    +#include "access/plugstorage.h"
    +#include "catalog/pg_type.h"
    +#include "catalog/pg_proc.h"
    +#include "catalog/pg_exttable.h"
    +#include "nodes/value.h"
    +#include "parser/parse_func.h"
    +
    +/*
    + * getExternalTableTypeList
    + *
    + *    Return the table type for a given external table
    + *
    + *    Currently it is an implementation with performance cost due to the
    + *    fact that the format types for TEXT, CSV, and ORC external tables
    --- End diff --
    
    currently there is no type "TEXT" "CSV" and "ORC".


---

[GitHub] incubator-hawq pull request #1314: HAWQ-1555. Add access interfaces for prot...

Posted by jiny2 <gi...@git.apache.org>.
Github user jiny2 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1314#discussion_r154278905
  
    --- Diff: src/backend/access/external/Makefile ---
    @@ -26,7 +26,7 @@ subdir = src/backend/access/external
     top_builddir = ../../../..
     include $(top_builddir)/src/Makefile.global
     
    -OBJS = fileam.o url.o libchurl.o hd_work_mgr.o pxfuriparser.o pxfheaders.o \
    +OBJS = fileam.o plugstorage.o url.o libchurl.o hd_work_mgr.o pxfuriparser.o pxfheaders.o \
    --- End diff --
    
    I think it is better to add this new object to the tail of OBJS list


---

[GitHub] incubator-hawq issue #1314: HAWQ-1555. Add access interfaces for protocol an...

Posted by chiyang10000 <gi...@git.apache.org>.
Github user chiyang10000 commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1314
  
    Thanks for your guys detailed reviews and comments. I have updated my code.


---

[GitHub] incubator-hawq pull request #1314: HAWQ-1555. Add access interfaces for prot...

Posted by jiny2 <gi...@git.apache.org>.
Github user jiny2 commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1314#discussion_r154279848
  
    --- Diff: src/backend/access/external/plugstorage.c ---
    @@ -0,0 +1,533 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +/*-------------------------------------------------------------------------
    + *
    + * plugstorage.c
    + *
    + *    Pluggable storage implementation. Support external table for now.
    + *
    + *-------------------------------------------------------------------------
    + */
    +#include "access/plugstorage.h"
    +#include "catalog/pg_type.h"
    +#include "catalog/pg_proc.h"
    +#include "catalog/pg_exttable.h"
    +#include "nodes/value.h"
    +#include "parser/parse_func.h"
    +
    +/*
    + * getExternalTableTypeList
    + *
    + *    Return the table type for a given external table
    + *
    + *    Currently it is an implementation with performance cost due to the
    + *    fact that the format types for TEXT, CSV, and ORC external tables
    + *    with customer protocol (HDFS) are all 'b' in pg_exttable catalog
    + *    table. It needs to visit the format options to get the table type.
    + */
    +void getExternalTableTypeList(const char formatType,
    +                              List *formatOptions,
    +                              int *formatterType,
    +                              char **formatterName)
    +{
    +	if (fmttype_is_custom(formatType))
    +	{
    +		Assert(formatOptions);
    +
    +		*formatterName = getExtTblFormatterTypeInFmtOptsList(formatOptions);
    +
    +		*formatterType = ExternalTableType_PLUG;
    +	}
    +	else
    +	{
    +		*formatterType = ExternalTableType_Invalid;
    +		*formatterName = NULL;
    +		elog(ERROR, "undefined external table format type: %c", formatType);
    +	}
    +}
    +
    +void getExternalTableTypeStr(const char formatType,
    --- End diff --
    
    We need some descriptions like the function above, please add more comments before delivery


---

[GitHub] incubator-hawq issue #1314: HAWQ-1555. Add access interfaces for protocol an...

Posted by huor <gi...@git.apache.org>.
Github user huor commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1314
  
    Great to wrap up the pluggable storage framework work and start merging the work! +1


---

[GitHub] incubator-hawq issue #1314: HAWQ-1555. Add access interfaces for protocol an...

Posted by interma <gi...@git.apache.org>.
Github user interma commented on the issue:

    https://github.com/apache/incubator-hawq/pull/1314
  
    Hi chiyang
    Can you provide some unit/feature tests with you PR?
    You can refer the existed tests in here: 
    https://github.com/apache/incubator-hawq/tree/master/src/test/feature
    
    Thanks!



---