You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hawq.apache.org by 来熊 <yi...@163.com> on 2016/09/21 07:25:27 UTC

回复:Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)


My environment is >>>> : 1 master 3 segments
SQL >>>>>>: 
CREATE TABLE call_center (
    cc_call_center_sk integer,
    cc_call_center_id character varying(16),
    cc_rec_start_date date,
    cc_rec_end_date date,
    cc_closed_date_sk integer,
    cc_open_date_sk integer,
    cc_name character varying(50),
    cc_class character varying(50),
    cc_employees integer,
    cc_sq_ft integer,
    cc_hours character varying(20),
    cc_manager character varying(40),
    cc_mkt_id integer,
    cc_mkt_class character varying(50),
    cc_mkt_desc character varying(100),
    cc_market_manager character varying(40),
    cc_division text,
    cc_division_name character varying(50),
    cc_company text,
    cc_company_name character varying(50),
    cc_street_number character varying(10),
    cc_street_name character varying(60),
    cc_street_type character varying(15),
    cc_suite_number character varying(10),
    cc_city character varying(60),
    cc_county character varying(30),
    cc_state text,
    cc_zip character varying(10),
    cc_country character varying(20),
    cc_gmt_offset numeric(5,2),
    cc_tax_percentage numeric(5,2)
)
WITH (appendonly=true, orientation=parquet)
DISTRIBUTED BY (cc_call_center_sk);


CREATE EXTERNAL TABLE ext_call_center (like call_center)
LOCATION (
'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');


insert into call_center select * from ext_call_center;


ERROR:  External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)


hawq config >>>>:


GUC: add_missing_fromValue: off
GUC: application_nameValue: 
GUC: array_nullsValue: on
GUC: authentication_timeoutValue: 60
GUC: backslash_quoteValue: safe_encoding
GUC: block_sizeValue: 32768
GUC: bonjour_nameValue: 
GUC: check_function_bodiesValue: on
GUC: client_encodingValue: UTF8
GUC: client_min_messagesValue: ERROR
GUC: cpu_index_tuple_costValue: 0.005
GUC: cpu_operator_costValue: 0.0025
GUC: cpu_tuple_costValue: 0.01
GUC: cursor_tuple_fractionValue: 1
GUC: custom_variable_classesValue: 
GUC: DateStyleValue: ISO, MDY
GUC: db_user_namespaceValue: off
GUC: deadlock_timeoutValue: 1000
GUC: debug_assertionsValue: off
GUC: debug_pretty_printValue: off
GUC: debug_print_parseValue: off
GUC: debug_print_planValue: off
GUC: debug_print_prelim_planValue: off
GUC: debug_print_rewrittenValue: off
GUC: debug_print_slice_tableValue: off
GUC: default_hash_table_bucket_numberValue: 6
GUC: default_statement_memValue: 128000
GUC: default_statistics_targetValue: 25
GUC: default_tablespaceValue: 
GUC: default_transaction_isolationValue: read committed
GUC: default_transaction_read_onlyValue: off
GUC: dfs_urlValue: localhost:8020/hawq
GUC: dynamic_library_pathValue: $libdir
GUC: effective_cache_sizeValue: 16384
GUC: enable_bitmapscanValue: on
GUC: enable_groupaggValue: on
GUC: enable_hashaggValue: on
GUC: enable_hashjoinValue: on
GUC: enable_indexscanValue: on
GUC: enable_mergejoinValue: off
GUC: enable_nestloopValue: off
GUC: enable_secure_filesystemValue: off
GUC: enable_seqscanValue: on
GUC: enable_sortValue: on
GUC: enable_tidscanValue: on
GUC: escape_string_warningValue: on
GUC: explain_memory_verbosityValue: suppress
GUC: explain_pretty_printValue: on
GUC: extra_float_digitsValue: 0
GUC: filesystem_support_truncateValue: on
GUC: from_collapse_limitValue: 20
GUC: gp_adjust_selectivity_for_outerjoinsValue: on
GUC: gp_analyze_relative_errorValue: 0.25
GUC: gp_autostats_modeValue: NONE
GUC: gp_autostats_on_change_thresholdValue: 2147483647
GUC: gp_backup_directIOValue: off
GUC: gp_backup_directIO_read_chunk_mbValue: 20
GUC: gp_cached_segworkers_thresholdValue: 5
GUC: gp_command_countValue: 6
GUC: gp_connections_per_threadValue: 64
GUC: gp_debug_lingerValue: 0
GUC: gp_dynamic_partition_pruningValue: on
GUC: gp_email_connect_avoid_durationValue: 7200
GUC: gp_email_connect_failuresValue: 5
GUC: gp_email_connect_timeoutValue: 15
GUC: gp_email_fromValue: 
GUC: gp_email_smtp_passwordValue: 
GUC: gp_email_smtp_serverValue: localhost:25
GUC: gp_email_smtp_useridValue: 
GUC: gp_email_toValue: 
GUC: gp_enable_agg_distinctValue: on
GUC: gp_enable_agg_distinct_pruningValue: on
GUC: gp_enable_direct_dispatchValue: on
GUC: gp_enable_fallback_planValue: on
GUC: gp_enable_fast_sriValue: on
GUC: gp_enable_gpperfmonValue: off
GUC: gp_enable_groupext_distinct_gatherValue: on
GUC: gp_enable_groupext_distinct_pruningValue: on
GUC: gp_enable_multiphase_aggValue: on
GUC: gp_enable_predicate_propagationValue: on
GUC: gp_enable_preuniqueValue: on
GUC: gp_enable_sequential_window_plansValue: on
GUC: gp_enable_sort_distinctValue: on
GUC: gp_enable_sort_limitValue: on
GUC: gp_external_enable_execValue: on
GUC: gp_external_grant_privilegesValue: off
GUC: gp_external_max_segsValue: 64
GUC: gp_filerep_tcp_keepalives_countValue: 2
GUC: gp_filerep_tcp_keepalives_idleValue: 60
GUC: gp_filerep_tcp_keepalives_intervalValue: 30
GUC: gp_force_use_default_temporary_directoryValue: off
GUC: gp_gpperfmon_send_intervalValue: 1
GUC: gp_hashjoin_tuples_per_bucketValue: 5
GUC: gp_idf_deduplicateValue: auto
GUC: gp_interconnect_cache_future_packetsValue: on
GUC: gp_interconnect_default_rttValue: 20
GUC: gp_interconnect_fc_methodValue: LOSS
GUC: gp_interconnect_hash_multiplierValue: 2
GUC: gp_interconnect_min_retries_before_timeoutValue: 100
GUC: gp_interconnect_min_rtoValue: 20
GUC: gp_interconnect_queue_depthValue: 4
GUC: gp_interconnect_setup_timeoutValue: 7200
GUC: gp_interconnect_snd_queue_depthValue: 2
GUC: gp_interconnect_timer_checking_periodValue: 20
GUC: gp_interconnect_timer_periodValue: 5
GUC: gp_interconnect_transmit_timeoutValue: 3600
GUC: gp_interconnect_typeValue: UDP
GUC: gp_log_formatValue: csv
GUC: gp_max_csv_line_lengthValue: 1048576
GUC: gp_max_databasesValue: 16
GUC: gp_max_filespacesValue: 8
GUC: gp_max_packet_sizeValue: 8192
GUC: gp_max_partition_levelValue: 0
GUC: gp_max_plan_sizeValue: 0
GUC: gp_max_relationsValue: 65536
GUC: gp_max_tablespacesValue: 16
GUC: gp_motion_cost_per_rowValue: 0
GUC: gp_num_contents_in_clusterValue: -10000
GUC: gp_query_context_mem_limitValue: 102400
GUC: gp_reject_percent_thresholdValue: 300
GUC: gp_reraise_signalValue: on
GUC: gp_roleValue: utility
GUC: gp_safefswritesizeValue: 0
GUC: gp_segment_connect_timeoutValue: 600
GUC: gp_segments_for_plannerValue: 0
GUC: gp_session_idValue: -1
GUC: gp_set_proc_affinityValue: off
GUC: gp_snmp_communityValue: public
GUC: gp_snmp_monitor_addressValue: 
GUC: gp_snmp_use_inform_or_trapValue: trap
GUC: gp_statistics_pullup_from_child_partitionValue: on
GUC: gp_statistics_use_fkeysValue: on
GUC: gp_subtrans_warn_limitValue: 16777216
GUC: gp_temporary_directory_mark_errorValue: 0
GUC: gp_udp_bufsize_kValue: 0
GUC: gp_vmem_protect_segworker_cache_limitValue: 500
GUC: gp_workfile_checksummingValue: on
GUC: gp_workfile_compress_algorithmValue: none
GUC: gp_workfile_limit_per_queryValue: 0
GUC: gp_workfile_limit_per_segmentValue: 0
GUC: gpperfmon_portValue: 8888
GUC: hawq_global_rm_typeValue: yarn
GUC: hawq_master_address_hostValue: master.bigdata
GUC: hawq_master_address_portValue: 6432
GUC: hawq_master_temp_directoryValue: /tmp
GUC: hawq_metadata_cache_block_capacityValue: 2097152
GUC: hawq_metadata_cache_check_intervalValue: 30
GUC: hawq_metadata_cache_flush_ratioValue: 0.85
GUC: hawq_metadata_cache_free_block_max_ratioValue: 0.05
GUC: hawq_metadata_cache_free_block_normal_ratioValue: 0.2
GUC: hawq_metadata_cache_max_hdfs_file_numValue: 524288
GUC: hawq_metadata_cache_reduce_ratioValue: 0.7
GUC: hawq_metadata_cache_refresh_intervalValue: 3600
GUC: hawq_metadata_cache_refresh_max_numValue: 1000
GUC: hawq_metadata_cache_refresh_timeoutValue: 3600
GUC: hawq_re_cgroup_hierarchy_nameValue: hadoop-yarn
GUC: hawq_re_cgroup_mount_pointValue: /sys/fs/cgroup
GUC: hawq_re_cpu_enableValue: off
GUC: hawq_re_cpu_weightValue: 1024
GUC: hawq_re_memory_overcommit_maxValue: 8192
GUC: hawq_re_vcore_pcore_ratioValue: 1
GUC: hawq_rm_cluster_report_periodValue: 60
GUC: hawq_rm_clusterratio_core_to_memorygb_factorValue: 5
GUC: hawq_rm_connpool_sameaddr_buffersizeValue: 2
GUC: hawq_rm_container_batch_limitValue: 1000
GUC: hawq_rm_enable_connpoolValue: on
GUC: hawq_rm_force_alterqueue_cancel_queued_requestValue: on
GUC: hawq_rm_force_fifo_queuingValue: on
GUC: hawq_rm_master_domain_portValue: 5436
GUC: hawq_rm_master_portValue: 5437
GUC: hawq_rm_memory_limit_persegValue: 480GB
GUC: hawq_rm_min_resource_persegValue: 2
GUC: hawq_rm_nocluster_timeoutValue: 60
GUC: hawq_rm_nresqueue_limitValue: 128
GUC: hawq_rm_nslice_perseg_limitValue: 5000
GUC: hawq_rm_nvcore_limit_persegValue: 16
GUC: hawq_rm_nvseg_for_analyze_nopart_perquery_limitValue: 512
GUC: hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limitValue: 8
GUC: hawq_rm_nvseg_for_analyze_part_perquery_limitValue: 256
GUC: hawq_rm_nvseg_for_analyze_part_perquery_perseg_limitValue: 4
GUC: hawq_rm_nvseg_for_copy_from_perqueryValue: 6
GUC: hawq_rm_nvseg_perquery_limitValue: 512
GUC: hawq_rm_nvseg_perquery_perseg_limitValue: 6
GUC: hawq_rm_nvseg_variance_amon_seg_limitValue: 1
GUC: hawq_rm_nvseg_variance_amon_seg_respool_limitValue: 2
GUC: hawq_rm_regularize_io_factorValue: 1
GUC: hawq_rm_regularize_io_maxValue: 1.37439e+11
GUC: hawq_rm_regularize_nvseg_factorValue: 1
GUC: hawq_rm_regularize_nvseg_maxValue: 300
GUC: hawq_rm_regularize_usage_factorValue: 1
GUC: hawq_rm_rejectrequest_nseg_limitValue: 0.25
GUC: hawq_rm_request_timeoutcheck_intervalValue: 1
GUC: hawq_rm_resource_allocation_timeoutValue: 600
GUC: hawq_rm_resource_idle_timeoutValue: 300
GUC: hawq_rm_respool_test_fileValue: 
GUC: hawq_rm_return_percent_on_overcommitValue: 10
GUC: hawq_rm_segment_config_refresh_intervalValue: 30
GUC: hawq_rm_segment_heartbeat_intervalValue: 30
GUC: hawq_rm_segment_heartbeat_timeoutValue: 300
GUC: hawq_rm_segment_portValue: 5438
GUC: hawq_rm_segment_tmpdir_detect_intervalValue: 300
GUC: hawq_rm_session_lease_heartbeat_enableValue: on
GUC: hawq_rm_session_lease_heartbeat_intervalValue: 10
GUC: hawq_rm_session_lease_timeoutValue: 180
GUC: hawq_rm_stmt_nvsegValue: 0
GUC: hawq_rm_stmt_vseg_memoryValue: 128mb
GUC: hawq_rm_tolerate_nseg_limitValue: 0.25
GUC: hawq_rm_yarn_addressValue: worker1.bigdata:8050
GUC: hawq_rm_yarn_app_nameValue: hawq
GUC: hawq_rm_yarn_queue_nameValue: default
GUC: hawq_rm_yarn_scheduler_addressValue: worker1.bigdata:8030
GUC: hawq_segment_address_portValue: 40000
GUC: hawq_segment_history_keep_periodValue: 365
GUC: hawq_segment_temp_directoryValue: /tmp
GUC: integer_datetimesValue: on
GUC: IntervalStyleValue: postgres
GUC: join_collapse_limitValue: 20
GUC: krb5_ccnameValue: /tmp/postgres.ccname
GUC: krb_caseins_usersValue: off
GUC: krb_server_keyfileValue: FILE:/data/pulse2-agent/agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab
GUC: krb_srvnameValue: postgres
GUC: lc_collateValue: C
GUC: lc_ctypeValue: C
GUC: lc_messagesValue: en_US.utf8
GUC: lc_monetaryValue: en_US.utf8
GUC: lc_numericValue: en_US.utf8
GUC: lc_timeValue: en_US.utf8
GUC: listen_addressesValue: *
GUC: local_preload_librariesValue: 
GUC: log_autostatsValue: off
GUC: log_connectionsValue: off
GUC: log_disconnectionsValue: off
GUC: log_dispatch_statsValue: off
GUC: log_durationValue: off
GUC: log_error_verbosityValue: default
GUC: log_executor_statsValue: off
GUC: log_hostnameValue: off
GUC: log_min_duration_statementValue: -1
GUC: log_min_error_statementValue: error
GUC: log_min_messagesValue: warning
GUC: log_parser_statsValue: off
GUC: log_planner_statsValue: off
GUC: log_rotation_ageValue: 1440
GUC: log_rotation_sizeValue: 0
GUC: log_statementValue: none
GUC: log_statement_statsValue: off
GUC: log_timezoneValue: PRC
GUC: log_truncate_on_rotationValue: off
GUC: maintenance_work_memValue: 65536
GUC: master_directoryValue: 
GUC: max_appendonly_segfilesValue: 262144
GUC: max_appendonly_tablesValue: 10000
GUC: max_connectionsValue: 1280
GUC: max_files_per_processValue: 150
GUC: max_fsm_pagesValue: 200000
GUC: max_fsm_relationsValue: 1000
GUC: max_function_argsValue: 100
GUC: max_identifier_lengthValue: 63
GUC: max_index_keysValue: 32
GUC: max_locks_per_transactionValue: 128
GUC: max_prepared_transactionsValue: 250
GUC: max_stack_depthValue: 2048
GUC: max_work_memValue: 1024000
GUC: metadata_cache_testfileValue: 
GUC: optimizerValue: on
GUC: optimizer_analyze_root_partitionValue: on
GUC: optimizer_minidumpValue: onerror
GUC: optimizer_parts_to_force_sort_on_insertValue: 160
GUC: password_encryptionValue: on
GUC: password_hash_algorithmValue: MD5
GUC: pljava_classpathValue: 
GUC: pljava_release_lingering_savepointsValue: off
GUC: pljava_statement_cache_sizeValue: 0
GUC: pljava_vmoptionsValue: 
GUC: portValue: 6432
GUC: pxf_enable_filter_pushdownValue: on
GUC: pxf_enable_locality_optimizationsValue: on
GUC: pxf_enable_stat_collectionValue: on
GUC: pxf_remote_service_loginValue: 
GUC: pxf_remote_service_secretValue: 
GUC: pxf_service_addressValue: localhost:51200
GUC: pxf_stat_max_fragmentsValue: 100
GUC: random_page_costValue: 100
GUC: regex_flavorValue: advanced
GUC: runaway_detector_activation_percentValue: 95
GUC: search_pathValue: "$user",public
GUC: seg_max_connectionsValue: 3000
GUC: segment_directoryValue: 
GUC: seq_page_costValue: 1
GUC: server_encodingValue: UTF8
GUC: server_ticket_renew_intervalValue: 43200000
GUC: server_versionValue: 8.2.15
GUC: server_version_numValue: 80215
GUC: shared_buffersValue: 4000
GUC: shared_preload_librariesValue: 
GUC: sslValue: off
GUC: ssl_ciphersValue: ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
GUC: standard_conforming_stringsValue: off
GUC: standby_address_hostValue: localhost
GUC: statement_timeoutValue: 0
GUC: superuser_reserved_connectionsValue: 3
GUC: tcp_keepalives_countValue: 9
GUC: tcp_keepalives_idleValue: 7200
GUC: tcp_keepalives_intervalValue: 75
GUC: temp_buffersValue: 1024
GUC: TimeZoneValue: PRC
GUC: timezone_abbreviationsValue: Default
GUC: track_activitiesValue: on
GUC: track_countsValue: off
GUC: transaction_isolationValue: read committed
GUC: transaction_read_onlyValue: off
GUC: transform_null_equalsValue: off
GUC: unix_socket_directoryValue: 
GUC: unix_socket_groupValue: 
GUC: unix_socket_permissionsValue: 511
GUC: update_process_titleValue: on
GUC: vacuum_cost_delayValue: 0
GUC: vacuum_cost_limitValue: 200
GUC: vacuum_cost_page_dirtyValue: 20
GUC: vacuum_cost_page_missValue: 10
GUC: vacuum_freeze_min_ageValue: 100000000
GUC: work_memValue: 51200






At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:

Could you please post your SQL DDL statement? How many URLs do you have in your external table? Also, your HASH dist table - how many buckets are defined, if any? Are the # of URLs more than the # of buckets or default_hash_table_bucket_number value? Perhaps you can attach your hawq-site.xml file as well.


Also see: 
http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html



Thanks
Vineet



On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:

Hi,all:
    I am testing hawq 2.0.0 , and I find a problem like this:
 I load data from an external table (created using "like target_table" statement) ,
if the target table was distributed by some column(s), it raise this error:
 External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
if the target table was distributed randomly, it works well,
I don't set any parameter special,does anybody know how to resolve this problem?
thanks a lot.

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Lirong Jian <ji...@gmail.com>.
As it is known, the number of segments should be equal to or greater than
the number of URLs in gpfdist external tables. This is a legacy issue
inherited from GPDB. We encountered this issue too when working on a cloud
data warehouse based on GPDB. We finally needed to find a workaround to
address this issue.

This restriction maybe be fine with GPDB, since the number of segments is
fixed once the system is initialized. However, IMO, this restriction looks
an overkill with HAWQ, since one of the significant features of HAWQ is its
elastic execution runtime. The number of segments (virtual segments) to run
a query is totally dynamic, while the definition of an external table is
usually static.

I spent some time checking the code related to this restriction, and found
that logic of assigning URLs to segments requires one segment taking at
most one URL.

In HAWQ, Can we eliminate this restriction by relaxing the URL-to-Segment
assignment logic? Looking forwards to your guys' comment and advice.

Lirong

Lirong Jian
HashData Inc.

2016-09-23 8:01 GMT+08:00 Hubert Zhang <hz...@pivotal.io>:

> +1 with Vineet's comment.
> in your statement "insert into call_center select * from ext_call_center;"
> There are two strict restriction of how many virtual segments need to be
> started:
> 1 the bucket number of hash distributed result relation call_center (must
> equal to #vseg)
> 2 the number of locations in  gpfdist ext table ext_call_center (<= #vseg)
>
> You must set the the bucket number of call_center bigger than the the
> number of locations in ext_call_center
>
> Thanks
> Hubert
>
>
> On Thu, Sep 22, 2016 at 1:19 AM, Luis Macedo <lm...@pivotal.io> wrote:
>
>> Also on your location clause you should not reference the same file more
>> than one time.
>>
>> If you want to scale gpfdist process you need to use different range on
>> each port for a same server. (Not sure if I explain myself :))
>>
>> If you use one gpfdist per server performance should be fine. One gpfdist
>> process usually can do 250MB/s if underlying infra allows.
>>
>> Rgds
>>
>> --- Sent from my Nexus 5x
>>
>> Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vv...@apache.org>
>> escreveu:
>>
>>> Your default_hash_table_bucket_number value is set to 6. Typically,
>>> this should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments,
>>> you should set this value to 18. Any time you change this parameter, you
>>> should redistribute your HASH distributed tables, if you have any (unless
>>> the table DDL has # of buckets defined, I think).
>>>
>>> Increase default_hash_table_bucket_number to 18 and retry the insert.
>>> Since you have 18 ext table URLs, it should work with the change.
>>>
>>> Thanks
>>> Vineet
>>>
>>>
>>> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:
>>>
>>>>
>>>> My environment is >>>> : 1 master 3 segments
>>>> SQL >>>>>>:
>>>> CREATE TABLE call_center (
>>>>     cc_call_center_sk integer,
>>>>     cc_call_center_id character varying(16),
>>>>     cc_rec_start_date date,
>>>>     cc_rec_end_date date,
>>>>     cc_closed_date_sk integer,
>>>>     cc_open_date_sk integer,
>>>>     cc_name character varying(50),
>>>>     cc_class character varying(50),
>>>>     cc_employees integer,
>>>>     cc_sq_ft integer,
>>>>     cc_hours character varying(20),
>>>>     cc_manager character varying(40),
>>>>     cc_mkt_id integer,
>>>>     cc_mkt_class character varying(50),
>>>>     cc_mkt_desc character varying(100),
>>>>     cc_market_manager character varying(40),
>>>>     cc_division text,
>>>>     cc_division_name character varying(50),
>>>>     cc_company text,
>>>>     cc_company_name character varying(50),
>>>>     cc_street_number character varying(10),
>>>>     cc_street_name character varying(60),
>>>>     cc_street_type character varying(15),
>>>>     cc_suite_number character varying(10),
>>>>     cc_city character varying(60),
>>>>     cc_county character varying(30),
>>>>     cc_state text,
>>>>     cc_zip character varying(10),
>>>>     cc_country character varying(20),
>>>>     cc_gmt_offset numeric(5,2),
>>>>     cc_tax_percentage numeric(5,2)
>>>> )
>>>> WITH (appendonly=true, orientation=parquet)
>>>> DISTRIBUTED BY (cc_call_center_sk);
>>>>
>>>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>>>> LOCATION (
>>>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>>>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>>>
>>>> insert into call_center select * from ext_call_center;
>>>>
>>>> ERROR:  External scan error: There are more external files (URLs) than
>>>> primary segments that can read them (COptTasks.cpp:1756)
>>>>
>>>> hawq config >>>>:
>>>>
>>>> GUC : add_missing_from Value : off
>>>> GUC : application_name Value :
>>>> GUC : array_nulls Value : on
>>>> GUC : authentication_timeout Value : 60
>>>> GUC : backslash_quote Value : safe_encoding
>>>> GUC : block_size Value : 32768
>>>> GUC : bonjour_name Value :
>>>> GUC : check_function_bodies Value : on
>>>> GUC : client_encoding Value : UTF8
>>>> GUC : client_min_messages Value : ERROR
>>>> GUC : cpu_index_tuple_cost Value : 0.005
>>>> GUC : cpu_operator_cost Value : 0.0025
>>>> GUC : cpu_tuple_cost Value : 0.01
>>>> GUC : cursor_tuple_fraction Value : 1
>>>> GUC : custom_variable_classes Value :
>>>> GUC : DateStyle Value : ISO, MDY
>>>> GUC : db_user_namespace Value : off
>>>> GUC : deadlock_timeout Value : 1000
>>>> GUC : debug_assertions Value : off
>>>> GUC : debug_pretty_print Value : off
>>>> GUC : debug_print_parse Value : off
>>>> GUC : debug_print_plan Value : off
>>>> GUC : debug_print_prelim_plan Value : off
>>>> GUC : debug_print_rewritten Value : off
>>>> GUC : debug_print_slice_table Value : off
>>>> GUC : default_hash_table_bucket_number Value : 6
>>>> GUC : default_statement_mem Value : 128000
>>>> GUC : default_statistics_target Value : 25
>>>> GUC : default_tablespace Value :
>>>> GUC : default_transaction_isolation Value : read committed
>>>> GUC : default_transaction_read_only Value : off
>>>> GUC : dfs_url Value : localhost:8020/hawq
>>>> GUC : dynamic_library_path Value : $libdir
>>>> GUC : effective_cache_size Value : 16384
>>>> GUC : enable_bitmapscan Value : on
>>>> GUC : enable_groupagg Value : on
>>>> GUC : enable_hashagg Value : on
>>>> GUC : enable_hashjoin Value : on
>>>> GUC : enable_indexscan Value : on
>>>> GUC : enable_mergejoin Value : off
>>>> GUC : enable_nestloop Value : off
>>>> GUC : enable_secure_filesystem Value : off
>>>> GUC : enable_seqscan Value : on
>>>> GUC : enable_sort Value : on
>>>> GUC : enable_tidscan Value : on
>>>> GUC : escape_string_warning Value : on
>>>> GUC : explain_memory_verbosity Value : suppress
>>>> GUC : explain_pretty_print Value : on
>>>> GUC : extra_float_digits Value : 0
>>>> GUC : filesystem_support_truncate Value : on
>>>> GUC : from_collapse_limit Value : 20
>>>> GUC : gp_adjust_selectivity_for_outerjoins Value : on
>>>> GUC : gp_analyze_relative_error Value : 0.25
>>>> GUC : gp_autostats_mode Value : NONE
>>>> GUC : gp_autostats_on_change_threshold Value : 2147483647
>>>> GUC : gp_backup_directIO Value : off
>>>> GUC : gp_backup_directIO_read_chunk_mb Value : 20
>>>> GUC : gp_cached_segworkers_threshold Value : 5
>>>> GUC : gp_command_count Value : 6
>>>> GUC : gp_connections_per_thread Value : 64
>>>> GUC : gp_debug_linger Value : 0
>>>> GUC : gp_dynamic_partition_pruning Value : on
>>>> GUC : gp_email_connect_avoid_duration Value : 7200
>>>> GUC : gp_email_connect_failures Value : 5
>>>> GUC : gp_email_connect_timeout Value : 15
>>>> GUC : gp_email_from Value :
>>>> GUC : gp_email_smtp_password Value :
>>>> GUC : gp_email_smtp_server Value : localhost:25
>>>> GUC : gp_email_smtp_userid Value :
>>>> GUC : gp_email_to Value :
>>>> GUC : gp_enable_agg_distinct Value : on
>>>> GUC : gp_enable_agg_distinct_pruning Value : on
>>>> GUC : gp_enable_direct_dispatch Value : on
>>>> GUC : gp_enable_fallback_plan Value : on
>>>> GUC : gp_enable_fast_sri Value : on
>>>> GUC : gp_enable_gpperfmon Value : off
>>>> GUC : gp_enable_groupext_distinct_gather Value : on
>>>> GUC : gp_enable_groupext_distinct_pruning Value : on
>>>> GUC : gp_enable_multiphase_agg Value : on
>>>> GUC : gp_enable_predicate_propagation Value : on
>>>> GUC : gp_enable_preunique Value : on
>>>> GUC : gp_enable_sequential_window_plans Value : on
>>>> GUC : gp_enable_sort_distinct Value : on
>>>> GUC : gp_enable_sort_limit Value : on
>>>> GUC : gp_external_enable_exec Value : on
>>>> GUC : gp_external_grant_privileges Value : off
>>>> GUC : gp_external_max_segs Value : 64
>>>> GUC : gp_filerep_tcp_keepalives_count Value : 2
>>>> GUC : gp_filerep_tcp_keepalives_idle Value : 60
>>>> GUC : gp_filerep_tcp_keepalives_interval Value : 30
>>>> GUC : gp_force_use_default_temporary_directory Value : off
>>>> GUC : gp_gpperfmon_send_interval Value : 1
>>>> GUC : gp_hashjoin_tuples_per_bucket Value : 5
>>>> GUC : gp_idf_deduplicate Value : auto
>>>> GUC : gp_interconnect_cache_future_packets Value : on
>>>> GUC : gp_interconnect_default_rtt Value : 20
>>>> GUC : gp_interconnect_fc_method Value : LOSS
>>>> GUC : gp_interconnect_hash_multiplier Value : 2
>>>> GUC : gp_interconnect_min_retries_before_timeout Value : 100
>>>> GUC : gp_interconnect_min_rto Value : 20
>>>> GUC : gp_interconnect_queue_depth Value : 4
>>>> GUC : gp_interconnect_setup_timeout Value : 7200
>>>> GUC : gp_interconnect_snd_queue_depth Value : 2
>>>> GUC : gp_interconnect_timer_checking_period Value : 20
>>>> GUC : gp_interconnect_timer_period Value : 5
>>>> GUC : gp_interconnect_transmit_timeout Value : 3600
>>>> GUC : gp_interconnect_type Value : UDP
>>>> GUC : gp_log_format Value : csv
>>>> GUC : gp_max_csv_line_length Value : 1048576
>>>> GUC : gp_max_databases Value : 16
>>>> GUC : gp_max_filespaces Value : 8
>>>> GUC : gp_max_packet_size Value : 8192
>>>> GUC : gp_max_partition_level Value : 0
>>>> GUC : gp_max_plan_size Value : 0
>>>> GUC : gp_max_relations Value : 65536
>>>> GUC : gp_max_tablespaces Value : 16
>>>> GUC : gp_motion_cost_per_row Value : 0
>>>> GUC : gp_num_contents_in_cluster Value : -10000
>>>> GUC : gp_query_context_mem_limit Value : 102400
>>>> GUC : gp_reject_percent_threshold Value : 300
>>>> GUC : gp_reraise_signal Value : on
>>>> GUC : gp_role Value : utility
>>>> GUC : gp_safefswritesize Value : 0
>>>> GUC : gp_segment_connect_timeout Value : 600
>>>> GUC : gp_segments_for_planner Value : 0
>>>> GUC : gp_session_id Value : -1
>>>> GUC : gp_set_proc_affinity Value : off
>>>> GUC : gp_snmp_community Value : public
>>>> GUC : gp_snmp_monitor_address Value :
>>>> GUC : gp_snmp_use_inform_or_trap Value : trap
>>>> GUC : gp_statistics_pullup_from_child_partition Value : on
>>>> GUC : gp_statistics_use_fkeys Value : on
>>>> GUC : gp_subtrans_warn_limit Value : 16777216
>>>> GUC : gp_temporary_directory_mark_error Value : 0
>>>> GUC : gp_udp_bufsize_k Value : 0
>>>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
>>>> GUC : gp_workfile_checksumming Value : on
>>>> GUC : gp_workfile_compress_algorithm Value : none
>>>> GUC : gp_workfile_limit_per_query Value : 0
>>>> GUC : gp_workfile_limit_per_segment Value : 0
>>>> GUC : gpperfmon_port Value : 8888
>>>> GUC : hawq_global_rm_type Value : yarn
>>>> GUC : hawq_master_address_host Value : master.bigdata
>>>> GUC : hawq_master_address_port Value : 6432
>>>> GUC : hawq_master_temp_directory Value : /tmp
>>>> GUC : hawq_metadata_cache_block_capacity Value : 2097152
>>>> GUC : hawq_metadata_cache_check_interval Value : 30
>>>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
>>>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
>>>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
>>>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
>>>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
>>>> GUC : hawq_metadata_cache_refresh_interval Value : 3600
>>>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
>>>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
>>>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
>>>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
>>>> GUC : hawq_re_cpu_enable Value : off
>>>> GUC : hawq_re_cpu_weight Value : 1024
>>>> GUC : hawq_re_memory_overcommit_max Value : 8192
>>>> GUC : hawq_re_vcore_pcore_ratio Value : 1
>>>> GUC : hawq_rm_cluster_report_period Value : 60
>>>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
>>>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
>>>> GUC : hawq_rm_container_batch_limit Value : 1000
>>>> GUC : hawq_rm_enable_connpool Value : on
>>>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
>>>> GUC : hawq_rm_force_fifo_queuing Value : on
>>>> GUC : hawq_rm_master_domain_port Value : 5436
>>>> GUC : hawq_rm_master_port Value : 5437
>>>> GUC : hawq_rm_memory_limit_perseg Value : 480GB
>>>> GUC : hawq_rm_min_resource_perseg Value : 2
>>>> GUC : hawq_rm_nocluster_timeout Value : 60
>>>> GUC : hawq_rm_nresqueue_limit Value : 128
>>>> GUC : hawq_rm_nslice_perseg_limit Value : 5000
>>>> GUC : hawq_rm_nvcore_limit_perseg Value : 16
>>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
>>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
>>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
>>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
>>>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
>>>> GUC : hawq_rm_nvseg_perquery_limit Value : 512
>>>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
>>>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
>>>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
>>>> GUC : hawq_rm_regularize_io_factor Value : 1
>>>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
>>>> GUC : hawq_rm_regularize_nvseg_factor Value : 1
>>>> GUC : hawq_rm_regularize_nvseg_max Value : 300
>>>> GUC : hawq_rm_regularize_usage_factor Value : 1
>>>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
>>>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
>>>> GUC : hawq_rm_resource_allocation_timeout Value : 600
>>>> GUC : hawq_rm_resource_idle_timeout Value : 300
>>>> GUC : hawq_rm_respool_test_file Value :
>>>> GUC : hawq_rm_return_percent_on_overcommit Value : 10
>>>> GUC : hawq_rm_segment_config_refresh_interval Value : 30
>>>> GUC : hawq_rm_segment_heartbeat_interval Value : 30
>>>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
>>>> GUC : hawq_rm_segment_port Value : 5438
>>>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
>>>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
>>>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
>>>> GUC : hawq_rm_session_lease_timeout Value : 180
>>>> GUC : hawq_rm_stmt_nvseg Value : 0
>>>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
>>>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
>>>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
>>>> GUC : hawq_rm_yarn_app_name Value : hawq
>>>> GUC : hawq_rm_yarn_queue_name Value : default
>>>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
>>>> GUC : hawq_segment_address_port Value : 40000
>>>> GUC : hawq_segment_history_keep_period Value : 365
>>>> GUC : hawq_segment_temp_directory Value : /tmp
>>>> GUC : integer_datetimes Value : on
>>>> GUC : IntervalStyle Value : postgres
>>>> GUC : join_collapse_limit Value : 20
>>>> GUC : krb5_ccname Value : /tmp/postgres.ccname
>>>> GUC : krb_caseins_users Value : off
>>>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/agents
>>>> /agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc
>>>> /krb5.keytab
>>>> GUC : krb_srvname Value : postgres
>>>> GUC : lc_collate Value : C
>>>> GUC : lc_ctype Value : C
>>>> GUC : lc_messages Value : en_US.utf8
>>>> GUC : lc_monetary Value : en_US.utf8
>>>> GUC : lc_numeric Value : en_US.utf8
>>>> GUC : lc_time Value : en_US.utf8
>>>> GUC : listen_addresses Value : *
>>>> GUC : local_preload_libraries Value :
>>>> GUC : log_autostats Value : off
>>>> GUC : log_connections Value : off
>>>> GUC : log_disconnections Value : off
>>>> GUC : log_dispatch_stats Value : off
>>>> GUC : log_duration Value : off
>>>> GUC : log_error_verbosity Value : default
>>>> GUC : log_executor_stats Value : off
>>>> GUC : log_hostname Value : off
>>>> GUC : log_min_duration_statement Value : -1
>>>> GUC : log_min_error_statement Value : error
>>>> GUC : log_min_messages Value : warning
>>>> GUC : log_parser_stats Value : off
>>>> GUC : log_planner_stats Value : off
>>>> GUC : log_rotation_age Value : 1440
>>>> GUC : log_rotation_size Value : 0
>>>> GUC : log_statement Value : none
>>>> GUC : log_statement_stats Value : off
>>>> GUC : log_timezone Value : PRC
>>>> GUC : log_truncate_on_rotation Value : off
>>>> GUC : maintenance_work_mem Value : 65536
>>>> GUC : master_directory Value :
>>>> GUC : max_appendonly_segfiles Value : 262144
>>>> GUC : max_appendonly_tables Value : 10000
>>>> GUC : max_connections Value : 1280
>>>> GUC : max_files_per_process Value : 150
>>>> GUC : max_fsm_pages Value : 200000
>>>> GUC : max_fsm_relations Value : 1000
>>>> GUC : max_function_args Value : 100
>>>> GUC : max_identifier_length Value : 63
>>>> GUC : max_index_keys Value : 32
>>>> GUC : max_locks_per_transaction Value : 128
>>>> GUC : max_prepared_transactions Value : 250
>>>> GUC : max_stack_depth Value : 2048
>>>> GUC : max_work_mem Value : 1024000
>>>> GUC : metadata_cache_testfile Value :
>>>> GUC : optimizer Value : on
>>>> GUC : optimizer_analyze_root_partition Value : on
>>>> GUC : optimizer_minidump Value : onerror
>>>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
>>>> GUC : password_encryption Value : on
>>>> GUC : password_hash_algorithm Value : MD5
>>>> GUC : pljava_classpath Value :
>>>> GUC : pljava_release_lingering_savepoints Value : off
>>>> GUC : pljava_statement_cache_size Value : 0
>>>> GUC : pljava_vmoptions Value :
>>>> GUC : port Value : 6432
>>>> GUC : pxf_enable_filter_pushdown Value : on
>>>> GUC : pxf_enable_locality_optimizations Value : on
>>>> GUC : pxf_enable_stat_collection Value : on
>>>> GUC : pxf_remote_service_login Value :
>>>> GUC : pxf_remote_service_secret Value :
>>>> GUC : pxf_service_address Value : localhost:51200
>>>> GUC : pxf_stat_max_fragments Value : 100
>>>> GUC : random_page_cost Value : 100
>>>> GUC : regex_flavor Value : advanced
>>>> GUC : runaway_detector_activation_percent Value : 95
>>>> GUC : search_path Value : "$user",public
>>>> GUC : seg_max_connections Value : 3000
>>>> GUC : segment_directory Value :
>>>> GUC : seq_page_cost Value : 1
>>>> GUC : server_encoding Value : UTF8
>>>> GUC : server_ticket_renew_interval Value : 43200000
>>>> GUC : server_version Value : 8.2.15
>>>> GUC : server_version_num Value : 80215
>>>> GUC : shared_buffers Value : 4000
>>>> GUC : shared_preload_libraries Value :
>>>> GUC : ssl Value : off
>>>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
>>>> GUC : standard_conforming_strings Value : off
>>>> GUC : standby_address_host Value : localhost
>>>> GUC : statement_timeout Value : 0
>>>> GUC : superuser_reserved_connections Value : 3
>>>> GUC : tcp_keepalives_count Value : 9
>>>> GUC : tcp_keepalives_idle Value : 7200
>>>> GUC : tcp_keepalives_interval Value : 75
>>>> GUC : temp_buffers Value : 1024
>>>> GUC : TimeZone Value : PRC
>>>> GUC : timezone_abbreviations Value : Default
>>>> GUC : track_activities Value : on
>>>> GUC : track_counts Value : off
>>>> GUC : transaction_isolation Value : read committed
>>>> GUC : transaction_read_only Value : off
>>>> GUC : transform_null_equals Value : off
>>>> GUC : unix_socket_directory Value :
>>>> GUC : unix_socket_group Value :
>>>> GUC : unix_socket_permissions Value : 511
>>>> GUC : update_process_title Value : on
>>>> GUC : vacuum_cost_delay Value : 0
>>>> GUC : vacuum_cost_limit Value : 200
>>>> GUC : vacuum_cost_page_dirty Value : 20
>>>> GUC : vacuum_cost_page_miss Value : 10
>>>> GUC : vacuum_freeze_min_age Value : 100000000
>>>> GUC : work_mem Value : 51200
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>>>>
>>>> Could you please post your SQL DDL statement? How many URLs do you have
>>>> in your external table? Also, your HASH dist table - how many buckets are
>>>> defined, if any? Are the # of URLs more than the # of buckets or
>>>> default_hash_table_bucket_number value? Perhaps you can attach your
>>>> hawq-site.xml file as well.
>>>>
>>>> Also see:
>>>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>>>>
>>>> Thanks
>>>> Vineet
>>>>
>>>>
>>>> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>>>>
>>>>> Hi,all:
>>>>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>>>>  I load data from an external table (created using "like target_table"
>>>>> statement) ,
>>>>> if the target table was distributed by some column(s), it raise this
>>>>> error:
>>>>>  External scan error: There are more external files (URLs) than
>>>>> primary segments that can read them (COptTasks.cpp:1756)
>>>>> if the target table was distributed randomly, it works well,
>>>>> I don't set any parameter special,does anybody know how to resolve
>>>>> this problem?
>>>>> thanks a lot.
>>>>>
>>>>
>
>
> --
> Thanks
>
> Hubert Zhang
>

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Lirong Jian <ji...@gmail.com>.
As it is known, the number of segments should be equal to or greater than
the number of URLs in gpfdist external tables. This is a legacy issue
inherited from GPDB. We encountered this issue too when working on a cloud
data warehouse based on GPDB. We finally needed to find a workaround to
address this issue.

This restriction maybe be fine with GPDB, since the number of segments is
fixed once the system is initialized. However, IMO, this restriction looks
an overkill with HAWQ, since one of the significant features of HAWQ is its
elastic execution runtime. The number of segments (virtual segments) to run
a query is totally dynamic, while the definition of an external table is
usually static.

I spent some time checking the code related to this restriction, and found
that logic of assigning URLs to segments requires one segment taking at
most one URL.

In HAWQ, Can we eliminate this restriction by relaxing the URL-to-Segment
assignment logic? Looking forwards to your guys' comment and advice.

Lirong

Lirong Jian
HashData Inc.

2016-09-23 8:01 GMT+08:00 Hubert Zhang <hz...@pivotal.io>:

> +1 with Vineet's comment.
> in your statement "insert into call_center select * from ext_call_center;"
> There are two strict restriction of how many virtual segments need to be
> started:
> 1 the bucket number of hash distributed result relation call_center (must
> equal to #vseg)
> 2 the number of locations in  gpfdist ext table ext_call_center (<= #vseg)
>
> You must set the the bucket number of call_center bigger than the the
> number of locations in ext_call_center
>
> Thanks
> Hubert
>
>
> On Thu, Sep 22, 2016 at 1:19 AM, Luis Macedo <lm...@pivotal.io> wrote:
>
>> Also on your location clause you should not reference the same file more
>> than one time.
>>
>> If you want to scale gpfdist process you need to use different range on
>> each port for a same server. (Not sure if I explain myself :))
>>
>> If you use one gpfdist per server performance should be fine. One gpfdist
>> process usually can do 250MB/s if underlying infra allows.
>>
>> Rgds
>>
>> --- Sent from my Nexus 5x
>>
>> Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vv...@apache.org>
>> escreveu:
>>
>>> Your default_hash_table_bucket_number value is set to 6. Typically,
>>> this should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments,
>>> you should set this value to 18. Any time you change this parameter, you
>>> should redistribute your HASH distributed tables, if you have any (unless
>>> the table DDL has # of buckets defined, I think).
>>>
>>> Increase default_hash_table_bucket_number to 18 and retry the insert.
>>> Since you have 18 ext table URLs, it should work with the change.
>>>
>>> Thanks
>>> Vineet
>>>
>>>
>>> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:
>>>
>>>>
>>>> My environment is >>>> : 1 master 3 segments
>>>> SQL >>>>>>:
>>>> CREATE TABLE call_center (
>>>>     cc_call_center_sk integer,
>>>>     cc_call_center_id character varying(16),
>>>>     cc_rec_start_date date,
>>>>     cc_rec_end_date date,
>>>>     cc_closed_date_sk integer,
>>>>     cc_open_date_sk integer,
>>>>     cc_name character varying(50),
>>>>     cc_class character varying(50),
>>>>     cc_employees integer,
>>>>     cc_sq_ft integer,
>>>>     cc_hours character varying(20),
>>>>     cc_manager character varying(40),
>>>>     cc_mkt_id integer,
>>>>     cc_mkt_class character varying(50),
>>>>     cc_mkt_desc character varying(100),
>>>>     cc_market_manager character varying(40),
>>>>     cc_division text,
>>>>     cc_division_name character varying(50),
>>>>     cc_company text,
>>>>     cc_company_name character varying(50),
>>>>     cc_street_number character varying(10),
>>>>     cc_street_name character varying(60),
>>>>     cc_street_type character varying(15),
>>>>     cc_suite_number character varying(10),
>>>>     cc_city character varying(60),
>>>>     cc_county character varying(30),
>>>>     cc_state text,
>>>>     cc_zip character varying(10),
>>>>     cc_country character varying(20),
>>>>     cc_gmt_offset numeric(5,2),
>>>>     cc_tax_percentage numeric(5,2)
>>>> )
>>>> WITH (appendonly=true, orientation=parquet)
>>>> DISTRIBUTED BY (cc_call_center_sk);
>>>>
>>>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>>>> LOCATION (
>>>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>>>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>>>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>>>
>>>> insert into call_center select * from ext_call_center;
>>>>
>>>> ERROR:  External scan error: There are more external files (URLs) than
>>>> primary segments that can read them (COptTasks.cpp:1756)
>>>>
>>>> hawq config >>>>:
>>>>
>>>> GUC : add_missing_from Value : off
>>>> GUC : application_name Value :
>>>> GUC : array_nulls Value : on
>>>> GUC : authentication_timeout Value : 60
>>>> GUC : backslash_quote Value : safe_encoding
>>>> GUC : block_size Value : 32768
>>>> GUC : bonjour_name Value :
>>>> GUC : check_function_bodies Value : on
>>>> GUC : client_encoding Value : UTF8
>>>> GUC : client_min_messages Value : ERROR
>>>> GUC : cpu_index_tuple_cost Value : 0.005
>>>> GUC : cpu_operator_cost Value : 0.0025
>>>> GUC : cpu_tuple_cost Value : 0.01
>>>> GUC : cursor_tuple_fraction Value : 1
>>>> GUC : custom_variable_classes Value :
>>>> GUC : DateStyle Value : ISO, MDY
>>>> GUC : db_user_namespace Value : off
>>>> GUC : deadlock_timeout Value : 1000
>>>> GUC : debug_assertions Value : off
>>>> GUC : debug_pretty_print Value : off
>>>> GUC : debug_print_parse Value : off
>>>> GUC : debug_print_plan Value : off
>>>> GUC : debug_print_prelim_plan Value : off
>>>> GUC : debug_print_rewritten Value : off
>>>> GUC : debug_print_slice_table Value : off
>>>> GUC : default_hash_table_bucket_number Value : 6
>>>> GUC : default_statement_mem Value : 128000
>>>> GUC : default_statistics_target Value : 25
>>>> GUC : default_tablespace Value :
>>>> GUC : default_transaction_isolation Value : read committed
>>>> GUC : default_transaction_read_only Value : off
>>>> GUC : dfs_url Value : localhost:8020/hawq
>>>> GUC : dynamic_library_path Value : $libdir
>>>> GUC : effective_cache_size Value : 16384
>>>> GUC : enable_bitmapscan Value : on
>>>> GUC : enable_groupagg Value : on
>>>> GUC : enable_hashagg Value : on
>>>> GUC : enable_hashjoin Value : on
>>>> GUC : enable_indexscan Value : on
>>>> GUC : enable_mergejoin Value : off
>>>> GUC : enable_nestloop Value : off
>>>> GUC : enable_secure_filesystem Value : off
>>>> GUC : enable_seqscan Value : on
>>>> GUC : enable_sort Value : on
>>>> GUC : enable_tidscan Value : on
>>>> GUC : escape_string_warning Value : on
>>>> GUC : explain_memory_verbosity Value : suppress
>>>> GUC : explain_pretty_print Value : on
>>>> GUC : extra_float_digits Value : 0
>>>> GUC : filesystem_support_truncate Value : on
>>>> GUC : from_collapse_limit Value : 20
>>>> GUC : gp_adjust_selectivity_for_outerjoins Value : on
>>>> GUC : gp_analyze_relative_error Value : 0.25
>>>> GUC : gp_autostats_mode Value : NONE
>>>> GUC : gp_autostats_on_change_threshold Value : 2147483647
>>>> GUC : gp_backup_directIO Value : off
>>>> GUC : gp_backup_directIO_read_chunk_mb Value : 20
>>>> GUC : gp_cached_segworkers_threshold Value : 5
>>>> GUC : gp_command_count Value : 6
>>>> GUC : gp_connections_per_thread Value : 64
>>>> GUC : gp_debug_linger Value : 0
>>>> GUC : gp_dynamic_partition_pruning Value : on
>>>> GUC : gp_email_connect_avoid_duration Value : 7200
>>>> GUC : gp_email_connect_failures Value : 5
>>>> GUC : gp_email_connect_timeout Value : 15
>>>> GUC : gp_email_from Value :
>>>> GUC : gp_email_smtp_password Value :
>>>> GUC : gp_email_smtp_server Value : localhost:25
>>>> GUC : gp_email_smtp_userid Value :
>>>> GUC : gp_email_to Value :
>>>> GUC : gp_enable_agg_distinct Value : on
>>>> GUC : gp_enable_agg_distinct_pruning Value : on
>>>> GUC : gp_enable_direct_dispatch Value : on
>>>> GUC : gp_enable_fallback_plan Value : on
>>>> GUC : gp_enable_fast_sri Value : on
>>>> GUC : gp_enable_gpperfmon Value : off
>>>> GUC : gp_enable_groupext_distinct_gather Value : on
>>>> GUC : gp_enable_groupext_distinct_pruning Value : on
>>>> GUC : gp_enable_multiphase_agg Value : on
>>>> GUC : gp_enable_predicate_propagation Value : on
>>>> GUC : gp_enable_preunique Value : on
>>>> GUC : gp_enable_sequential_window_plans Value : on
>>>> GUC : gp_enable_sort_distinct Value : on
>>>> GUC : gp_enable_sort_limit Value : on
>>>> GUC : gp_external_enable_exec Value : on
>>>> GUC : gp_external_grant_privileges Value : off
>>>> GUC : gp_external_max_segs Value : 64
>>>> GUC : gp_filerep_tcp_keepalives_count Value : 2
>>>> GUC : gp_filerep_tcp_keepalives_idle Value : 60
>>>> GUC : gp_filerep_tcp_keepalives_interval Value : 30
>>>> GUC : gp_force_use_default_temporary_directory Value : off
>>>> GUC : gp_gpperfmon_send_interval Value : 1
>>>> GUC : gp_hashjoin_tuples_per_bucket Value : 5
>>>> GUC : gp_idf_deduplicate Value : auto
>>>> GUC : gp_interconnect_cache_future_packets Value : on
>>>> GUC : gp_interconnect_default_rtt Value : 20
>>>> GUC : gp_interconnect_fc_method Value : LOSS
>>>> GUC : gp_interconnect_hash_multiplier Value : 2
>>>> GUC : gp_interconnect_min_retries_before_timeout Value : 100
>>>> GUC : gp_interconnect_min_rto Value : 20
>>>> GUC : gp_interconnect_queue_depth Value : 4
>>>> GUC : gp_interconnect_setup_timeout Value : 7200
>>>> GUC : gp_interconnect_snd_queue_depth Value : 2
>>>> GUC : gp_interconnect_timer_checking_period Value : 20
>>>> GUC : gp_interconnect_timer_period Value : 5
>>>> GUC : gp_interconnect_transmit_timeout Value : 3600
>>>> GUC : gp_interconnect_type Value : UDP
>>>> GUC : gp_log_format Value : csv
>>>> GUC : gp_max_csv_line_length Value : 1048576
>>>> GUC : gp_max_databases Value : 16
>>>> GUC : gp_max_filespaces Value : 8
>>>> GUC : gp_max_packet_size Value : 8192
>>>> GUC : gp_max_partition_level Value : 0
>>>> GUC : gp_max_plan_size Value : 0
>>>> GUC : gp_max_relations Value : 65536
>>>> GUC : gp_max_tablespaces Value : 16
>>>> GUC : gp_motion_cost_per_row Value : 0
>>>> GUC : gp_num_contents_in_cluster Value : -10000
>>>> GUC : gp_query_context_mem_limit Value : 102400
>>>> GUC : gp_reject_percent_threshold Value : 300
>>>> GUC : gp_reraise_signal Value : on
>>>> GUC : gp_role Value : utility
>>>> GUC : gp_safefswritesize Value : 0
>>>> GUC : gp_segment_connect_timeout Value : 600
>>>> GUC : gp_segments_for_planner Value : 0
>>>> GUC : gp_session_id Value : -1
>>>> GUC : gp_set_proc_affinity Value : off
>>>> GUC : gp_snmp_community Value : public
>>>> GUC : gp_snmp_monitor_address Value :
>>>> GUC : gp_snmp_use_inform_or_trap Value : trap
>>>> GUC : gp_statistics_pullup_from_child_partition Value : on
>>>> GUC : gp_statistics_use_fkeys Value : on
>>>> GUC : gp_subtrans_warn_limit Value : 16777216
>>>> GUC : gp_temporary_directory_mark_error Value : 0
>>>> GUC : gp_udp_bufsize_k Value : 0
>>>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
>>>> GUC : gp_workfile_checksumming Value : on
>>>> GUC : gp_workfile_compress_algorithm Value : none
>>>> GUC : gp_workfile_limit_per_query Value : 0
>>>> GUC : gp_workfile_limit_per_segment Value : 0
>>>> GUC : gpperfmon_port Value : 8888
>>>> GUC : hawq_global_rm_type Value : yarn
>>>> GUC : hawq_master_address_host Value : master.bigdata
>>>> GUC : hawq_master_address_port Value : 6432
>>>> GUC : hawq_master_temp_directory Value : /tmp
>>>> GUC : hawq_metadata_cache_block_capacity Value : 2097152
>>>> GUC : hawq_metadata_cache_check_interval Value : 30
>>>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
>>>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
>>>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
>>>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
>>>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
>>>> GUC : hawq_metadata_cache_refresh_interval Value : 3600
>>>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
>>>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
>>>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
>>>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
>>>> GUC : hawq_re_cpu_enable Value : off
>>>> GUC : hawq_re_cpu_weight Value : 1024
>>>> GUC : hawq_re_memory_overcommit_max Value : 8192
>>>> GUC : hawq_re_vcore_pcore_ratio Value : 1
>>>> GUC : hawq_rm_cluster_report_period Value : 60
>>>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
>>>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
>>>> GUC : hawq_rm_container_batch_limit Value : 1000
>>>> GUC : hawq_rm_enable_connpool Value : on
>>>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
>>>> GUC : hawq_rm_force_fifo_queuing Value : on
>>>> GUC : hawq_rm_master_domain_port Value : 5436
>>>> GUC : hawq_rm_master_port Value : 5437
>>>> GUC : hawq_rm_memory_limit_perseg Value : 480GB
>>>> GUC : hawq_rm_min_resource_perseg Value : 2
>>>> GUC : hawq_rm_nocluster_timeout Value : 60
>>>> GUC : hawq_rm_nresqueue_limit Value : 128
>>>> GUC : hawq_rm_nslice_perseg_limit Value : 5000
>>>> GUC : hawq_rm_nvcore_limit_perseg Value : 16
>>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
>>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
>>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
>>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
>>>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
>>>> GUC : hawq_rm_nvseg_perquery_limit Value : 512
>>>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
>>>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
>>>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
>>>> GUC : hawq_rm_regularize_io_factor Value : 1
>>>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
>>>> GUC : hawq_rm_regularize_nvseg_factor Value : 1
>>>> GUC : hawq_rm_regularize_nvseg_max Value : 300
>>>> GUC : hawq_rm_regularize_usage_factor Value : 1
>>>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
>>>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
>>>> GUC : hawq_rm_resource_allocation_timeout Value : 600
>>>> GUC : hawq_rm_resource_idle_timeout Value : 300
>>>> GUC : hawq_rm_respool_test_file Value :
>>>> GUC : hawq_rm_return_percent_on_overcommit Value : 10
>>>> GUC : hawq_rm_segment_config_refresh_interval Value : 30
>>>> GUC : hawq_rm_segment_heartbeat_interval Value : 30
>>>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
>>>> GUC : hawq_rm_segment_port Value : 5438
>>>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
>>>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
>>>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
>>>> GUC : hawq_rm_session_lease_timeout Value : 180
>>>> GUC : hawq_rm_stmt_nvseg Value : 0
>>>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
>>>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
>>>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
>>>> GUC : hawq_rm_yarn_app_name Value : hawq
>>>> GUC : hawq_rm_yarn_queue_name Value : default
>>>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
>>>> GUC : hawq_segment_address_port Value : 40000
>>>> GUC : hawq_segment_history_keep_period Value : 365
>>>> GUC : hawq_segment_temp_directory Value : /tmp
>>>> GUC : integer_datetimes Value : on
>>>> GUC : IntervalStyle Value : postgres
>>>> GUC : join_collapse_limit Value : 20
>>>> GUC : krb5_ccname Value : /tmp/postgres.ccname
>>>> GUC : krb_caseins_users Value : off
>>>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/agents
>>>> /agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc
>>>> /krb5.keytab
>>>> GUC : krb_srvname Value : postgres
>>>> GUC : lc_collate Value : C
>>>> GUC : lc_ctype Value : C
>>>> GUC : lc_messages Value : en_US.utf8
>>>> GUC : lc_monetary Value : en_US.utf8
>>>> GUC : lc_numeric Value : en_US.utf8
>>>> GUC : lc_time Value : en_US.utf8
>>>> GUC : listen_addresses Value : *
>>>> GUC : local_preload_libraries Value :
>>>> GUC : log_autostats Value : off
>>>> GUC : log_connections Value : off
>>>> GUC : log_disconnections Value : off
>>>> GUC : log_dispatch_stats Value : off
>>>> GUC : log_duration Value : off
>>>> GUC : log_error_verbosity Value : default
>>>> GUC : log_executor_stats Value : off
>>>> GUC : log_hostname Value : off
>>>> GUC : log_min_duration_statement Value : -1
>>>> GUC : log_min_error_statement Value : error
>>>> GUC : log_min_messages Value : warning
>>>> GUC : log_parser_stats Value : off
>>>> GUC : log_planner_stats Value : off
>>>> GUC : log_rotation_age Value : 1440
>>>> GUC : log_rotation_size Value : 0
>>>> GUC : log_statement Value : none
>>>> GUC : log_statement_stats Value : off
>>>> GUC : log_timezone Value : PRC
>>>> GUC : log_truncate_on_rotation Value : off
>>>> GUC : maintenance_work_mem Value : 65536
>>>> GUC : master_directory Value :
>>>> GUC : max_appendonly_segfiles Value : 262144
>>>> GUC : max_appendonly_tables Value : 10000
>>>> GUC : max_connections Value : 1280
>>>> GUC : max_files_per_process Value : 150
>>>> GUC : max_fsm_pages Value : 200000
>>>> GUC : max_fsm_relations Value : 1000
>>>> GUC : max_function_args Value : 100
>>>> GUC : max_identifier_length Value : 63
>>>> GUC : max_index_keys Value : 32
>>>> GUC : max_locks_per_transaction Value : 128
>>>> GUC : max_prepared_transactions Value : 250
>>>> GUC : max_stack_depth Value : 2048
>>>> GUC : max_work_mem Value : 1024000
>>>> GUC : metadata_cache_testfile Value :
>>>> GUC : optimizer Value : on
>>>> GUC : optimizer_analyze_root_partition Value : on
>>>> GUC : optimizer_minidump Value : onerror
>>>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
>>>> GUC : password_encryption Value : on
>>>> GUC : password_hash_algorithm Value : MD5
>>>> GUC : pljava_classpath Value :
>>>> GUC : pljava_release_lingering_savepoints Value : off
>>>> GUC : pljava_statement_cache_size Value : 0
>>>> GUC : pljava_vmoptions Value :
>>>> GUC : port Value : 6432
>>>> GUC : pxf_enable_filter_pushdown Value : on
>>>> GUC : pxf_enable_locality_optimizations Value : on
>>>> GUC : pxf_enable_stat_collection Value : on
>>>> GUC : pxf_remote_service_login Value :
>>>> GUC : pxf_remote_service_secret Value :
>>>> GUC : pxf_service_address Value : localhost:51200
>>>> GUC : pxf_stat_max_fragments Value : 100
>>>> GUC : random_page_cost Value : 100
>>>> GUC : regex_flavor Value : advanced
>>>> GUC : runaway_detector_activation_percent Value : 95
>>>> GUC : search_path Value : "$user",public
>>>> GUC : seg_max_connections Value : 3000
>>>> GUC : segment_directory Value :
>>>> GUC : seq_page_cost Value : 1
>>>> GUC : server_encoding Value : UTF8
>>>> GUC : server_ticket_renew_interval Value : 43200000
>>>> GUC : server_version Value : 8.2.15
>>>> GUC : server_version_num Value : 80215
>>>> GUC : shared_buffers Value : 4000
>>>> GUC : shared_preload_libraries Value :
>>>> GUC : ssl Value : off
>>>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
>>>> GUC : standard_conforming_strings Value : off
>>>> GUC : standby_address_host Value : localhost
>>>> GUC : statement_timeout Value : 0
>>>> GUC : superuser_reserved_connections Value : 3
>>>> GUC : tcp_keepalives_count Value : 9
>>>> GUC : tcp_keepalives_idle Value : 7200
>>>> GUC : tcp_keepalives_interval Value : 75
>>>> GUC : temp_buffers Value : 1024
>>>> GUC : TimeZone Value : PRC
>>>> GUC : timezone_abbreviations Value : Default
>>>> GUC : track_activities Value : on
>>>> GUC : track_counts Value : off
>>>> GUC : transaction_isolation Value : read committed
>>>> GUC : transaction_read_only Value : off
>>>> GUC : transform_null_equals Value : off
>>>> GUC : unix_socket_directory Value :
>>>> GUC : unix_socket_group Value :
>>>> GUC : unix_socket_permissions Value : 511
>>>> GUC : update_process_title Value : on
>>>> GUC : vacuum_cost_delay Value : 0
>>>> GUC : vacuum_cost_limit Value : 200
>>>> GUC : vacuum_cost_page_dirty Value : 20
>>>> GUC : vacuum_cost_page_miss Value : 10
>>>> GUC : vacuum_freeze_min_age Value : 100000000
>>>> GUC : work_mem Value : 51200
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>>>>
>>>> Could you please post your SQL DDL statement? How many URLs do you have
>>>> in your external table? Also, your HASH dist table - how many buckets are
>>>> defined, if any? Are the # of URLs more than the # of buckets or
>>>> default_hash_table_bucket_number value? Perhaps you can attach your
>>>> hawq-site.xml file as well.
>>>>
>>>> Also see:
>>>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>>>>
>>>> Thanks
>>>> Vineet
>>>>
>>>>
>>>> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>>>>
>>>>> Hi,all:
>>>>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>>>>  I load data from an external table (created using "like target_table"
>>>>> statement) ,
>>>>> if the target table was distributed by some column(s), it raise this
>>>>> error:
>>>>>  External scan error: There are more external files (URLs) than
>>>>> primary segments that can read them (COptTasks.cpp:1756)
>>>>> if the target table was distributed randomly, it works well,
>>>>> I don't set any parameter special,does anybody know how to resolve
>>>>> this problem?
>>>>> thanks a lot.
>>>>>
>>>>
>
>
> --
> Thanks
>
> Hubert Zhang
>

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Hubert Zhang <hz...@pivotal.io>.
+1 with Vineet's comment.
in your statement "insert into call_center select * from ext_call_center;"
There are two strict restriction of how many virtual segments need to be
started:
1 the bucket number of hash distributed result relation call_center (must
equal to #vseg)
2 the number of locations in  gpfdist ext table ext_call_center (<= #vseg)

You must set the the bucket number of call_center bigger than the the
number of locations in ext_call_center

Thanks
Hubert


On Thu, Sep 22, 2016 at 1:19 AM, Luis Macedo <lm...@pivotal.io> wrote:

> Also on your location clause you should not reference the same file more
> than one time.
>
> If you want to scale gpfdist process you need to use different range on
> each port for a same server. (Not sure if I explain myself :))
>
> If you use one gpfdist per server performance should be fine. One gpfdist
> process usually can do 250MB/s if underlying infra allows.
>
> Rgds
>
> --- Sent from my Nexus 5x
>
> Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vv...@apache.org>
> escreveu:
>
>> Your default_hash_table_bucket_number value is set to 6. Typically, this
>> should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
>> should set this value to 18. Any time you change this parameter, you should
>> redistribute your HASH distributed tables, if you have any (unless the
>> table DDL has # of buckets defined, I think).
>>
>> Increase default_hash_table_bucket_number to 18 and retry the insert.
>> Since you have 18 ext table URLs, it should work with the change.
>>
>> Thanks
>> Vineet
>>
>>
>> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:
>>
>>>
>>> My environment is >>>> : 1 master 3 segments
>>> SQL >>>>>>:
>>> CREATE TABLE call_center (
>>>     cc_call_center_sk integer,
>>>     cc_call_center_id character varying(16),
>>>     cc_rec_start_date date,
>>>     cc_rec_end_date date,
>>>     cc_closed_date_sk integer,
>>>     cc_open_date_sk integer,
>>>     cc_name character varying(50),
>>>     cc_class character varying(50),
>>>     cc_employees integer,
>>>     cc_sq_ft integer,
>>>     cc_hours character varying(20),
>>>     cc_manager character varying(40),
>>>     cc_mkt_id integer,
>>>     cc_mkt_class character varying(50),
>>>     cc_mkt_desc character varying(100),
>>>     cc_market_manager character varying(40),
>>>     cc_division text,
>>>     cc_division_name character varying(50),
>>>     cc_company text,
>>>     cc_company_name character varying(50),
>>>     cc_street_number character varying(10),
>>>     cc_street_name character varying(60),
>>>     cc_street_type character varying(15),
>>>     cc_suite_number character varying(10),
>>>     cc_city character varying(60),
>>>     cc_county character varying(30),
>>>     cc_state text,
>>>     cc_zip character varying(10),
>>>     cc_country character varying(20),
>>>     cc_gmt_offset numeric(5,2),
>>>     cc_tax_percentage numeric(5,2)
>>> )
>>> WITH (appendonly=true, orientation=parquet)
>>> DISTRIBUTED BY (cc_call_center_sk);
>>>
>>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>>> LOCATION (
>>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>>
>>> insert into call_center select * from ext_call_center;
>>>
>>> ERROR:  External scan error: There are more external files (URLs) than
>>> primary segments that can read them (COptTasks.cpp:1756)
>>>
>>> hawq config >>>>:
>>>
>>> GUC : add_missing_from Value : off
>>> GUC : application_name Value :
>>> GUC : array_nulls Value : on
>>> GUC : authentication_timeout Value : 60
>>> GUC : backslash_quote Value : safe_encoding
>>> GUC : block_size Value : 32768
>>> GUC : bonjour_name Value :
>>> GUC : check_function_bodies Value : on
>>> GUC : client_encoding Value : UTF8
>>> GUC : client_min_messages Value : ERROR
>>> GUC : cpu_index_tuple_cost Value : 0.005
>>> GUC : cpu_operator_cost Value : 0.0025
>>> GUC : cpu_tuple_cost Value : 0.01
>>> GUC : cursor_tuple_fraction Value : 1
>>> GUC : custom_variable_classes Value :
>>> GUC : DateStyle Value : ISO, MDY
>>> GUC : db_user_namespace Value : off
>>> GUC : deadlock_timeout Value : 1000
>>> GUC : debug_assertions Value : off
>>> GUC : debug_pretty_print Value : off
>>> GUC : debug_print_parse Value : off
>>> GUC : debug_print_plan Value : off
>>> GUC : debug_print_prelim_plan Value : off
>>> GUC : debug_print_rewritten Value : off
>>> GUC : debug_print_slice_table Value : off
>>> GUC : default_hash_table_bucket_number Value : 6
>>> GUC : default_statement_mem Value : 128000
>>> GUC : default_statistics_target Value : 25
>>> GUC : default_tablespace Value :
>>> GUC : default_transaction_isolation Value : read committed
>>> GUC : default_transaction_read_only Value : off
>>> GUC : dfs_url Value : localhost:8020/hawq
>>> GUC : dynamic_library_path Value : $libdir
>>> GUC : effective_cache_size Value : 16384
>>> GUC : enable_bitmapscan Value : on
>>> GUC : enable_groupagg Value : on
>>> GUC : enable_hashagg Value : on
>>> GUC : enable_hashjoin Value : on
>>> GUC : enable_indexscan Value : on
>>> GUC : enable_mergejoin Value : off
>>> GUC : enable_nestloop Value : off
>>> GUC : enable_secure_filesystem Value : off
>>> GUC : enable_seqscan Value : on
>>> GUC : enable_sort Value : on
>>> GUC : enable_tidscan Value : on
>>> GUC : escape_string_warning Value : on
>>> GUC : explain_memory_verbosity Value : suppress
>>> GUC : explain_pretty_print Value : on
>>> GUC : extra_float_digits Value : 0
>>> GUC : filesystem_support_truncate Value : on
>>> GUC : from_collapse_limit Value : 20
>>> GUC : gp_adjust_selectivity_for_outerjoins Value : on
>>> GUC : gp_analyze_relative_error Value : 0.25
>>> GUC : gp_autostats_mode Value : NONE
>>> GUC : gp_autostats_on_change_threshold Value : 2147483647
>>> GUC : gp_backup_directIO Value : off
>>> GUC : gp_backup_directIO_read_chunk_mb Value : 20
>>> GUC : gp_cached_segworkers_threshold Value : 5
>>> GUC : gp_command_count Value : 6
>>> GUC : gp_connections_per_thread Value : 64
>>> GUC : gp_debug_linger Value : 0
>>> GUC : gp_dynamic_partition_pruning Value : on
>>> GUC : gp_email_connect_avoid_duration Value : 7200
>>> GUC : gp_email_connect_failures Value : 5
>>> GUC : gp_email_connect_timeout Value : 15
>>> GUC : gp_email_from Value :
>>> GUC : gp_email_smtp_password Value :
>>> GUC : gp_email_smtp_server Value : localhost:25
>>> GUC : gp_email_smtp_userid Value :
>>> GUC : gp_email_to Value :
>>> GUC : gp_enable_agg_distinct Value : on
>>> GUC : gp_enable_agg_distinct_pruning Value : on
>>> GUC : gp_enable_direct_dispatch Value : on
>>> GUC : gp_enable_fallback_plan Value : on
>>> GUC : gp_enable_fast_sri Value : on
>>> GUC : gp_enable_gpperfmon Value : off
>>> GUC : gp_enable_groupext_distinct_gather Value : on
>>> GUC : gp_enable_groupext_distinct_pruning Value : on
>>> GUC : gp_enable_multiphase_agg Value : on
>>> GUC : gp_enable_predicate_propagation Value : on
>>> GUC : gp_enable_preunique Value : on
>>> GUC : gp_enable_sequential_window_plans Value : on
>>> GUC : gp_enable_sort_distinct Value : on
>>> GUC : gp_enable_sort_limit Value : on
>>> GUC : gp_external_enable_exec Value : on
>>> GUC : gp_external_grant_privileges Value : off
>>> GUC : gp_external_max_segs Value : 64
>>> GUC : gp_filerep_tcp_keepalives_count Value : 2
>>> GUC : gp_filerep_tcp_keepalives_idle Value : 60
>>> GUC : gp_filerep_tcp_keepalives_interval Value : 30
>>> GUC : gp_force_use_default_temporary_directory Value : off
>>> GUC : gp_gpperfmon_send_interval Value : 1
>>> GUC : gp_hashjoin_tuples_per_bucket Value : 5
>>> GUC : gp_idf_deduplicate Value : auto
>>> GUC : gp_interconnect_cache_future_packets Value : on
>>> GUC : gp_interconnect_default_rtt Value : 20
>>> GUC : gp_interconnect_fc_method Value : LOSS
>>> GUC : gp_interconnect_hash_multiplier Value : 2
>>> GUC : gp_interconnect_min_retries_before_timeout Value : 100
>>> GUC : gp_interconnect_min_rto Value : 20
>>> GUC : gp_interconnect_queue_depth Value : 4
>>> GUC : gp_interconnect_setup_timeout Value : 7200
>>> GUC : gp_interconnect_snd_queue_depth Value : 2
>>> GUC : gp_interconnect_timer_checking_period Value : 20
>>> GUC : gp_interconnect_timer_period Value : 5
>>> GUC : gp_interconnect_transmit_timeout Value : 3600
>>> GUC : gp_interconnect_type Value : UDP
>>> GUC : gp_log_format Value : csv
>>> GUC : gp_max_csv_line_length Value : 1048576
>>> GUC : gp_max_databases Value : 16
>>> GUC : gp_max_filespaces Value : 8
>>> GUC : gp_max_packet_size Value : 8192
>>> GUC : gp_max_partition_level Value : 0
>>> GUC : gp_max_plan_size Value : 0
>>> GUC : gp_max_relations Value : 65536
>>> GUC : gp_max_tablespaces Value : 16
>>> GUC : gp_motion_cost_per_row Value : 0
>>> GUC : gp_num_contents_in_cluster Value : -10000
>>> GUC : gp_query_context_mem_limit Value : 102400
>>> GUC : gp_reject_percent_threshold Value : 300
>>> GUC : gp_reraise_signal Value : on
>>> GUC : gp_role Value : utility
>>> GUC : gp_safefswritesize Value : 0
>>> GUC : gp_segment_connect_timeout Value : 600
>>> GUC : gp_segments_for_planner Value : 0
>>> GUC : gp_session_id Value : -1
>>> GUC : gp_set_proc_affinity Value : off
>>> GUC : gp_snmp_community Value : public
>>> GUC : gp_snmp_monitor_address Value :
>>> GUC : gp_snmp_use_inform_or_trap Value : trap
>>> GUC : gp_statistics_pullup_from_child_partition Value : on
>>> GUC : gp_statistics_use_fkeys Value : on
>>> GUC : gp_subtrans_warn_limit Value : 16777216
>>> GUC : gp_temporary_directory_mark_error Value : 0
>>> GUC : gp_udp_bufsize_k Value : 0
>>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
>>> GUC : gp_workfile_checksumming Value : on
>>> GUC : gp_workfile_compress_algorithm Value : none
>>> GUC : gp_workfile_limit_per_query Value : 0
>>> GUC : gp_workfile_limit_per_segment Value : 0
>>> GUC : gpperfmon_port Value : 8888
>>> GUC : hawq_global_rm_type Value : yarn
>>> GUC : hawq_master_address_host Value : master.bigdata
>>> GUC : hawq_master_address_port Value : 6432
>>> GUC : hawq_master_temp_directory Value : /tmp
>>> GUC : hawq_metadata_cache_block_capacity Value : 2097152
>>> GUC : hawq_metadata_cache_check_interval Value : 30
>>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
>>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
>>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
>>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
>>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
>>> GUC : hawq_metadata_cache_refresh_interval Value : 3600
>>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
>>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
>>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
>>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
>>> GUC : hawq_re_cpu_enable Value : off
>>> GUC : hawq_re_cpu_weight Value : 1024
>>> GUC : hawq_re_memory_overcommit_max Value : 8192
>>> GUC : hawq_re_vcore_pcore_ratio Value : 1
>>> GUC : hawq_rm_cluster_report_period Value : 60
>>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
>>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
>>> GUC : hawq_rm_container_batch_limit Value : 1000
>>> GUC : hawq_rm_enable_connpool Value : on
>>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
>>> GUC : hawq_rm_force_fifo_queuing Value : on
>>> GUC : hawq_rm_master_domain_port Value : 5436
>>> GUC : hawq_rm_master_port Value : 5437
>>> GUC : hawq_rm_memory_limit_perseg Value : 480GB
>>> GUC : hawq_rm_min_resource_perseg Value : 2
>>> GUC : hawq_rm_nocluster_timeout Value : 60
>>> GUC : hawq_rm_nresqueue_limit Value : 128
>>> GUC : hawq_rm_nslice_perseg_limit Value : 5000
>>> GUC : hawq_rm_nvcore_limit_perseg Value : 16
>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
>>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
>>> GUC : hawq_rm_nvseg_perquery_limit Value : 512
>>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
>>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
>>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
>>> GUC : hawq_rm_regularize_io_factor Value : 1
>>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
>>> GUC : hawq_rm_regularize_nvseg_factor Value : 1
>>> GUC : hawq_rm_regularize_nvseg_max Value : 300
>>> GUC : hawq_rm_regularize_usage_factor Value : 1
>>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
>>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
>>> GUC : hawq_rm_resource_allocation_timeout Value : 600
>>> GUC : hawq_rm_resource_idle_timeout Value : 300
>>> GUC : hawq_rm_respool_test_file Value :
>>> GUC : hawq_rm_return_percent_on_overcommit Value : 10
>>> GUC : hawq_rm_segment_config_refresh_interval Value : 30
>>> GUC : hawq_rm_segment_heartbeat_interval Value : 30
>>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
>>> GUC : hawq_rm_segment_port Value : 5438
>>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
>>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
>>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
>>> GUC : hawq_rm_session_lease_timeout Value : 180
>>> GUC : hawq_rm_stmt_nvseg Value : 0
>>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
>>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
>>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
>>> GUC : hawq_rm_yarn_app_name Value : hawq
>>> GUC : hawq_rm_yarn_queue_name Value : default
>>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
>>> GUC : hawq_segment_address_port Value : 40000
>>> GUC : hawq_segment_history_keep_period Value : 365
>>> GUC : hawq_segment_temp_directory Value : /tmp
>>> GUC : integer_datetimes Value : on
>>> GUC : IntervalStyle Value : postgres
>>> GUC : join_collapse_limit Value : 20
>>> GUC : krb5_ccname Value : /tmp/postgres.ccname
>>> GUC : krb_caseins_users Value : off
>>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/agents
>>> /agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab
>>> GUC : krb_srvname Value : postgres
>>> GUC : lc_collate Value : C
>>> GUC : lc_ctype Value : C
>>> GUC : lc_messages Value : en_US.utf8
>>> GUC : lc_monetary Value : en_US.utf8
>>> GUC : lc_numeric Value : en_US.utf8
>>> GUC : lc_time Value : en_US.utf8
>>> GUC : listen_addresses Value : *
>>> GUC : local_preload_libraries Value :
>>> GUC : log_autostats Value : off
>>> GUC : log_connections Value : off
>>> GUC : log_disconnections Value : off
>>> GUC : log_dispatch_stats Value : off
>>> GUC : log_duration Value : off
>>> GUC : log_error_verbosity Value : default
>>> GUC : log_executor_stats Value : off
>>> GUC : log_hostname Value : off
>>> GUC : log_min_duration_statement Value : -1
>>> GUC : log_min_error_statement Value : error
>>> GUC : log_min_messages Value : warning
>>> GUC : log_parser_stats Value : off
>>> GUC : log_planner_stats Value : off
>>> GUC : log_rotation_age Value : 1440
>>> GUC : log_rotation_size Value : 0
>>> GUC : log_statement Value : none
>>> GUC : log_statement_stats Value : off
>>> GUC : log_timezone Value : PRC
>>> GUC : log_truncate_on_rotation Value : off
>>> GUC : maintenance_work_mem Value : 65536
>>> GUC : master_directory Value :
>>> GUC : max_appendonly_segfiles Value : 262144
>>> GUC : max_appendonly_tables Value : 10000
>>> GUC : max_connections Value : 1280
>>> GUC : max_files_per_process Value : 150
>>> GUC : max_fsm_pages Value : 200000
>>> GUC : max_fsm_relations Value : 1000
>>> GUC : max_function_args Value : 100
>>> GUC : max_identifier_length Value : 63
>>> GUC : max_index_keys Value : 32
>>> GUC : max_locks_per_transaction Value : 128
>>> GUC : max_prepared_transactions Value : 250
>>> GUC : max_stack_depth Value : 2048
>>> GUC : max_work_mem Value : 1024000
>>> GUC : metadata_cache_testfile Value :
>>> GUC : optimizer Value : on
>>> GUC : optimizer_analyze_root_partition Value : on
>>> GUC : optimizer_minidump Value : onerror
>>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
>>> GUC : password_encryption Value : on
>>> GUC : password_hash_algorithm Value : MD5
>>> GUC : pljava_classpath Value :
>>> GUC : pljava_release_lingering_savepoints Value : off
>>> GUC : pljava_statement_cache_size Value : 0
>>> GUC : pljava_vmoptions Value :
>>> GUC : port Value : 6432
>>> GUC : pxf_enable_filter_pushdown Value : on
>>> GUC : pxf_enable_locality_optimizations Value : on
>>> GUC : pxf_enable_stat_collection Value : on
>>> GUC : pxf_remote_service_login Value :
>>> GUC : pxf_remote_service_secret Value :
>>> GUC : pxf_service_address Value : localhost:51200
>>> GUC : pxf_stat_max_fragments Value : 100
>>> GUC : random_page_cost Value : 100
>>> GUC : regex_flavor Value : advanced
>>> GUC : runaway_detector_activation_percent Value : 95
>>> GUC : search_path Value : "$user",public
>>> GUC : seg_max_connections Value : 3000
>>> GUC : segment_directory Value :
>>> GUC : seq_page_cost Value : 1
>>> GUC : server_encoding Value : UTF8
>>> GUC : server_ticket_renew_interval Value : 43200000
>>> GUC : server_version Value : 8.2.15
>>> GUC : server_version_num Value : 80215
>>> GUC : shared_buffers Value : 4000
>>> GUC : shared_preload_libraries Value :
>>> GUC : ssl Value : off
>>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
>>> GUC : standard_conforming_strings Value : off
>>> GUC : standby_address_host Value : localhost
>>> GUC : statement_timeout Value : 0
>>> GUC : superuser_reserved_connections Value : 3
>>> GUC : tcp_keepalives_count Value : 9
>>> GUC : tcp_keepalives_idle Value : 7200
>>> GUC : tcp_keepalives_interval Value : 75
>>> GUC : temp_buffers Value : 1024
>>> GUC : TimeZone Value : PRC
>>> GUC : timezone_abbreviations Value : Default
>>> GUC : track_activities Value : on
>>> GUC : track_counts Value : off
>>> GUC : transaction_isolation Value : read committed
>>> GUC : transaction_read_only Value : off
>>> GUC : transform_null_equals Value : off
>>> GUC : unix_socket_directory Value :
>>> GUC : unix_socket_group Value :
>>> GUC : unix_socket_permissions Value : 511
>>> GUC : update_process_title Value : on
>>> GUC : vacuum_cost_delay Value : 0
>>> GUC : vacuum_cost_limit Value : 200
>>> GUC : vacuum_cost_page_dirty Value : 20
>>> GUC : vacuum_cost_page_miss Value : 10
>>> GUC : vacuum_freeze_min_age Value : 100000000
>>> GUC : work_mem Value : 51200
>>>
>>>
>>>
>>>
>>>
>>> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>>>
>>> Could you please post your SQL DDL statement? How many URLs do you have
>>> in your external table? Also, your HASH dist table - how many buckets are
>>> defined, if any? Are the # of URLs more than the # of buckets or
>>> default_hash_table_bucket_number value? Perhaps you can attach your
>>> hawq-site.xml file as well.
>>>
>>> Also see:
>>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>>>
>>> Thanks
>>> Vineet
>>>
>>>
>>> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>>>
>>>> Hi,all:
>>>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>>>  I load data from an external table (created using "like target_table"
>>>> statement) ,
>>>> if the target table was distributed by some column(s), it raise this
>>>> error:
>>>>  External scan error: There are more external files (URLs) than primary
>>>> segments that can read them (COptTasks.cpp:1756)
>>>> if the target table was distributed randomly, it works well,
>>>> I don't set any parameter special,does anybody know how to resolve this
>>>> problem?
>>>> thanks a lot.
>>>>
>>>


-- 
Thanks

Hubert Zhang

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Hubert Zhang <hz...@pivotal.io>.
+1 with Vineet's comment.
in your statement "insert into call_center select * from ext_call_center;"
There are two strict restriction of how many virtual segments need to be
started:
1 the bucket number of hash distributed result relation call_center (must
equal to #vseg)
2 the number of locations in  gpfdist ext table ext_call_center (<= #vseg)

You must set the the bucket number of call_center bigger than the the
number of locations in ext_call_center

Thanks
Hubert


On Thu, Sep 22, 2016 at 1:19 AM, Luis Macedo <lm...@pivotal.io> wrote:

> Also on your location clause you should not reference the same file more
> than one time.
>
> If you want to scale gpfdist process you need to use different range on
> each port for a same server. (Not sure if I explain myself :))
>
> If you use one gpfdist per server performance should be fine. One gpfdist
> process usually can do 250MB/s if underlying infra allows.
>
> Rgds
>
> --- Sent from my Nexus 5x
>
> Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vv...@apache.org>
> escreveu:
>
>> Your default_hash_table_bucket_number value is set to 6. Typically, this
>> should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
>> should set this value to 18. Any time you change this parameter, you should
>> redistribute your HASH distributed tables, if you have any (unless the
>> table DDL has # of buckets defined, I think).
>>
>> Increase default_hash_table_bucket_number to 18 and retry the insert.
>> Since you have 18 ext table URLs, it should work with the change.
>>
>> Thanks
>> Vineet
>>
>>
>> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:
>>
>>>
>>> My environment is >>>> : 1 master 3 segments
>>> SQL >>>>>>:
>>> CREATE TABLE call_center (
>>>     cc_call_center_sk integer,
>>>     cc_call_center_id character varying(16),
>>>     cc_rec_start_date date,
>>>     cc_rec_end_date date,
>>>     cc_closed_date_sk integer,
>>>     cc_open_date_sk integer,
>>>     cc_name character varying(50),
>>>     cc_class character varying(50),
>>>     cc_employees integer,
>>>     cc_sq_ft integer,
>>>     cc_hours character varying(20),
>>>     cc_manager character varying(40),
>>>     cc_mkt_id integer,
>>>     cc_mkt_class character varying(50),
>>>     cc_mkt_desc character varying(100),
>>>     cc_market_manager character varying(40),
>>>     cc_division text,
>>>     cc_division_name character varying(50),
>>>     cc_company text,
>>>     cc_company_name character varying(50),
>>>     cc_street_number character varying(10),
>>>     cc_street_name character varying(60),
>>>     cc_street_type character varying(15),
>>>     cc_suite_number character varying(10),
>>>     cc_city character varying(60),
>>>     cc_county character varying(30),
>>>     cc_state text,
>>>     cc_zip character varying(10),
>>>     cc_country character varying(20),
>>>     cc_gmt_offset numeric(5,2),
>>>     cc_tax_percentage numeric(5,2)
>>> )
>>> WITH (appendonly=true, orientation=parquet)
>>> DISTRIBUTED BY (cc_call_center_sk);
>>>
>>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>>> LOCATION (
>>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>>
>>> insert into call_center select * from ext_call_center;
>>>
>>> ERROR:  External scan error: There are more external files (URLs) than
>>> primary segments that can read them (COptTasks.cpp:1756)
>>>
>>> hawq config >>>>:
>>>
>>> GUC : add_missing_from Value : off
>>> GUC : application_name Value :
>>> GUC : array_nulls Value : on
>>> GUC : authentication_timeout Value : 60
>>> GUC : backslash_quote Value : safe_encoding
>>> GUC : block_size Value : 32768
>>> GUC : bonjour_name Value :
>>> GUC : check_function_bodies Value : on
>>> GUC : client_encoding Value : UTF8
>>> GUC : client_min_messages Value : ERROR
>>> GUC : cpu_index_tuple_cost Value : 0.005
>>> GUC : cpu_operator_cost Value : 0.0025
>>> GUC : cpu_tuple_cost Value : 0.01
>>> GUC : cursor_tuple_fraction Value : 1
>>> GUC : custom_variable_classes Value :
>>> GUC : DateStyle Value : ISO, MDY
>>> GUC : db_user_namespace Value : off
>>> GUC : deadlock_timeout Value : 1000
>>> GUC : debug_assertions Value : off
>>> GUC : debug_pretty_print Value : off
>>> GUC : debug_print_parse Value : off
>>> GUC : debug_print_plan Value : off
>>> GUC : debug_print_prelim_plan Value : off
>>> GUC : debug_print_rewritten Value : off
>>> GUC : debug_print_slice_table Value : off
>>> GUC : default_hash_table_bucket_number Value : 6
>>> GUC : default_statement_mem Value : 128000
>>> GUC : default_statistics_target Value : 25
>>> GUC : default_tablespace Value :
>>> GUC : default_transaction_isolation Value : read committed
>>> GUC : default_transaction_read_only Value : off
>>> GUC : dfs_url Value : localhost:8020/hawq
>>> GUC : dynamic_library_path Value : $libdir
>>> GUC : effective_cache_size Value : 16384
>>> GUC : enable_bitmapscan Value : on
>>> GUC : enable_groupagg Value : on
>>> GUC : enable_hashagg Value : on
>>> GUC : enable_hashjoin Value : on
>>> GUC : enable_indexscan Value : on
>>> GUC : enable_mergejoin Value : off
>>> GUC : enable_nestloop Value : off
>>> GUC : enable_secure_filesystem Value : off
>>> GUC : enable_seqscan Value : on
>>> GUC : enable_sort Value : on
>>> GUC : enable_tidscan Value : on
>>> GUC : escape_string_warning Value : on
>>> GUC : explain_memory_verbosity Value : suppress
>>> GUC : explain_pretty_print Value : on
>>> GUC : extra_float_digits Value : 0
>>> GUC : filesystem_support_truncate Value : on
>>> GUC : from_collapse_limit Value : 20
>>> GUC : gp_adjust_selectivity_for_outerjoins Value : on
>>> GUC : gp_analyze_relative_error Value : 0.25
>>> GUC : gp_autostats_mode Value : NONE
>>> GUC : gp_autostats_on_change_threshold Value : 2147483647
>>> GUC : gp_backup_directIO Value : off
>>> GUC : gp_backup_directIO_read_chunk_mb Value : 20
>>> GUC : gp_cached_segworkers_threshold Value : 5
>>> GUC : gp_command_count Value : 6
>>> GUC : gp_connections_per_thread Value : 64
>>> GUC : gp_debug_linger Value : 0
>>> GUC : gp_dynamic_partition_pruning Value : on
>>> GUC : gp_email_connect_avoid_duration Value : 7200
>>> GUC : gp_email_connect_failures Value : 5
>>> GUC : gp_email_connect_timeout Value : 15
>>> GUC : gp_email_from Value :
>>> GUC : gp_email_smtp_password Value :
>>> GUC : gp_email_smtp_server Value : localhost:25
>>> GUC : gp_email_smtp_userid Value :
>>> GUC : gp_email_to Value :
>>> GUC : gp_enable_agg_distinct Value : on
>>> GUC : gp_enable_agg_distinct_pruning Value : on
>>> GUC : gp_enable_direct_dispatch Value : on
>>> GUC : gp_enable_fallback_plan Value : on
>>> GUC : gp_enable_fast_sri Value : on
>>> GUC : gp_enable_gpperfmon Value : off
>>> GUC : gp_enable_groupext_distinct_gather Value : on
>>> GUC : gp_enable_groupext_distinct_pruning Value : on
>>> GUC : gp_enable_multiphase_agg Value : on
>>> GUC : gp_enable_predicate_propagation Value : on
>>> GUC : gp_enable_preunique Value : on
>>> GUC : gp_enable_sequential_window_plans Value : on
>>> GUC : gp_enable_sort_distinct Value : on
>>> GUC : gp_enable_sort_limit Value : on
>>> GUC : gp_external_enable_exec Value : on
>>> GUC : gp_external_grant_privileges Value : off
>>> GUC : gp_external_max_segs Value : 64
>>> GUC : gp_filerep_tcp_keepalives_count Value : 2
>>> GUC : gp_filerep_tcp_keepalives_idle Value : 60
>>> GUC : gp_filerep_tcp_keepalives_interval Value : 30
>>> GUC : gp_force_use_default_temporary_directory Value : off
>>> GUC : gp_gpperfmon_send_interval Value : 1
>>> GUC : gp_hashjoin_tuples_per_bucket Value : 5
>>> GUC : gp_idf_deduplicate Value : auto
>>> GUC : gp_interconnect_cache_future_packets Value : on
>>> GUC : gp_interconnect_default_rtt Value : 20
>>> GUC : gp_interconnect_fc_method Value : LOSS
>>> GUC : gp_interconnect_hash_multiplier Value : 2
>>> GUC : gp_interconnect_min_retries_before_timeout Value : 100
>>> GUC : gp_interconnect_min_rto Value : 20
>>> GUC : gp_interconnect_queue_depth Value : 4
>>> GUC : gp_interconnect_setup_timeout Value : 7200
>>> GUC : gp_interconnect_snd_queue_depth Value : 2
>>> GUC : gp_interconnect_timer_checking_period Value : 20
>>> GUC : gp_interconnect_timer_period Value : 5
>>> GUC : gp_interconnect_transmit_timeout Value : 3600
>>> GUC : gp_interconnect_type Value : UDP
>>> GUC : gp_log_format Value : csv
>>> GUC : gp_max_csv_line_length Value : 1048576
>>> GUC : gp_max_databases Value : 16
>>> GUC : gp_max_filespaces Value : 8
>>> GUC : gp_max_packet_size Value : 8192
>>> GUC : gp_max_partition_level Value : 0
>>> GUC : gp_max_plan_size Value : 0
>>> GUC : gp_max_relations Value : 65536
>>> GUC : gp_max_tablespaces Value : 16
>>> GUC : gp_motion_cost_per_row Value : 0
>>> GUC : gp_num_contents_in_cluster Value : -10000
>>> GUC : gp_query_context_mem_limit Value : 102400
>>> GUC : gp_reject_percent_threshold Value : 300
>>> GUC : gp_reraise_signal Value : on
>>> GUC : gp_role Value : utility
>>> GUC : gp_safefswritesize Value : 0
>>> GUC : gp_segment_connect_timeout Value : 600
>>> GUC : gp_segments_for_planner Value : 0
>>> GUC : gp_session_id Value : -1
>>> GUC : gp_set_proc_affinity Value : off
>>> GUC : gp_snmp_community Value : public
>>> GUC : gp_snmp_monitor_address Value :
>>> GUC : gp_snmp_use_inform_or_trap Value : trap
>>> GUC : gp_statistics_pullup_from_child_partition Value : on
>>> GUC : gp_statistics_use_fkeys Value : on
>>> GUC : gp_subtrans_warn_limit Value : 16777216
>>> GUC : gp_temporary_directory_mark_error Value : 0
>>> GUC : gp_udp_bufsize_k Value : 0
>>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
>>> GUC : gp_workfile_checksumming Value : on
>>> GUC : gp_workfile_compress_algorithm Value : none
>>> GUC : gp_workfile_limit_per_query Value : 0
>>> GUC : gp_workfile_limit_per_segment Value : 0
>>> GUC : gpperfmon_port Value : 8888
>>> GUC : hawq_global_rm_type Value : yarn
>>> GUC : hawq_master_address_host Value : master.bigdata
>>> GUC : hawq_master_address_port Value : 6432
>>> GUC : hawq_master_temp_directory Value : /tmp
>>> GUC : hawq_metadata_cache_block_capacity Value : 2097152
>>> GUC : hawq_metadata_cache_check_interval Value : 30
>>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
>>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
>>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
>>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
>>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
>>> GUC : hawq_metadata_cache_refresh_interval Value : 3600
>>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
>>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
>>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
>>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
>>> GUC : hawq_re_cpu_enable Value : off
>>> GUC : hawq_re_cpu_weight Value : 1024
>>> GUC : hawq_re_memory_overcommit_max Value : 8192
>>> GUC : hawq_re_vcore_pcore_ratio Value : 1
>>> GUC : hawq_rm_cluster_report_period Value : 60
>>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
>>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
>>> GUC : hawq_rm_container_batch_limit Value : 1000
>>> GUC : hawq_rm_enable_connpool Value : on
>>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
>>> GUC : hawq_rm_force_fifo_queuing Value : on
>>> GUC : hawq_rm_master_domain_port Value : 5436
>>> GUC : hawq_rm_master_port Value : 5437
>>> GUC : hawq_rm_memory_limit_perseg Value : 480GB
>>> GUC : hawq_rm_min_resource_perseg Value : 2
>>> GUC : hawq_rm_nocluster_timeout Value : 60
>>> GUC : hawq_rm_nresqueue_limit Value : 128
>>> GUC : hawq_rm_nslice_perseg_limit Value : 5000
>>> GUC : hawq_rm_nvcore_limit_perseg Value : 16
>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
>>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
>>> GUC : hawq_rm_nvseg_perquery_limit Value : 512
>>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
>>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
>>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
>>> GUC : hawq_rm_regularize_io_factor Value : 1
>>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
>>> GUC : hawq_rm_regularize_nvseg_factor Value : 1
>>> GUC : hawq_rm_regularize_nvseg_max Value : 300
>>> GUC : hawq_rm_regularize_usage_factor Value : 1
>>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
>>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
>>> GUC : hawq_rm_resource_allocation_timeout Value : 600
>>> GUC : hawq_rm_resource_idle_timeout Value : 300
>>> GUC : hawq_rm_respool_test_file Value :
>>> GUC : hawq_rm_return_percent_on_overcommit Value : 10
>>> GUC : hawq_rm_segment_config_refresh_interval Value : 30
>>> GUC : hawq_rm_segment_heartbeat_interval Value : 30
>>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
>>> GUC : hawq_rm_segment_port Value : 5438
>>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
>>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
>>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
>>> GUC : hawq_rm_session_lease_timeout Value : 180
>>> GUC : hawq_rm_stmt_nvseg Value : 0
>>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
>>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
>>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
>>> GUC : hawq_rm_yarn_app_name Value : hawq
>>> GUC : hawq_rm_yarn_queue_name Value : default
>>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
>>> GUC : hawq_segment_address_port Value : 40000
>>> GUC : hawq_segment_history_keep_period Value : 365
>>> GUC : hawq_segment_temp_directory Value : /tmp
>>> GUC : integer_datetimes Value : on
>>> GUC : IntervalStyle Value : postgres
>>> GUC : join_collapse_limit Value : 20
>>> GUC : krb5_ccname Value : /tmp/postgres.ccname
>>> GUC : krb_caseins_users Value : off
>>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/agents
>>> /agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab
>>> GUC : krb_srvname Value : postgres
>>> GUC : lc_collate Value : C
>>> GUC : lc_ctype Value : C
>>> GUC : lc_messages Value : en_US.utf8
>>> GUC : lc_monetary Value : en_US.utf8
>>> GUC : lc_numeric Value : en_US.utf8
>>> GUC : lc_time Value : en_US.utf8
>>> GUC : listen_addresses Value : *
>>> GUC : local_preload_libraries Value :
>>> GUC : log_autostats Value : off
>>> GUC : log_connections Value : off
>>> GUC : log_disconnections Value : off
>>> GUC : log_dispatch_stats Value : off
>>> GUC : log_duration Value : off
>>> GUC : log_error_verbosity Value : default
>>> GUC : log_executor_stats Value : off
>>> GUC : log_hostname Value : off
>>> GUC : log_min_duration_statement Value : -1
>>> GUC : log_min_error_statement Value : error
>>> GUC : log_min_messages Value : warning
>>> GUC : log_parser_stats Value : off
>>> GUC : log_planner_stats Value : off
>>> GUC : log_rotation_age Value : 1440
>>> GUC : log_rotation_size Value : 0
>>> GUC : log_statement Value : none
>>> GUC : log_statement_stats Value : off
>>> GUC : log_timezone Value : PRC
>>> GUC : log_truncate_on_rotation Value : off
>>> GUC : maintenance_work_mem Value : 65536
>>> GUC : master_directory Value :
>>> GUC : max_appendonly_segfiles Value : 262144
>>> GUC : max_appendonly_tables Value : 10000
>>> GUC : max_connections Value : 1280
>>> GUC : max_files_per_process Value : 150
>>> GUC : max_fsm_pages Value : 200000
>>> GUC : max_fsm_relations Value : 1000
>>> GUC : max_function_args Value : 100
>>> GUC : max_identifier_length Value : 63
>>> GUC : max_index_keys Value : 32
>>> GUC : max_locks_per_transaction Value : 128
>>> GUC : max_prepared_transactions Value : 250
>>> GUC : max_stack_depth Value : 2048
>>> GUC : max_work_mem Value : 1024000
>>> GUC : metadata_cache_testfile Value :
>>> GUC : optimizer Value : on
>>> GUC : optimizer_analyze_root_partition Value : on
>>> GUC : optimizer_minidump Value : onerror
>>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
>>> GUC : password_encryption Value : on
>>> GUC : password_hash_algorithm Value : MD5
>>> GUC : pljava_classpath Value :
>>> GUC : pljava_release_lingering_savepoints Value : off
>>> GUC : pljava_statement_cache_size Value : 0
>>> GUC : pljava_vmoptions Value :
>>> GUC : port Value : 6432
>>> GUC : pxf_enable_filter_pushdown Value : on
>>> GUC : pxf_enable_locality_optimizations Value : on
>>> GUC : pxf_enable_stat_collection Value : on
>>> GUC : pxf_remote_service_login Value :
>>> GUC : pxf_remote_service_secret Value :
>>> GUC : pxf_service_address Value : localhost:51200
>>> GUC : pxf_stat_max_fragments Value : 100
>>> GUC : random_page_cost Value : 100
>>> GUC : regex_flavor Value : advanced
>>> GUC : runaway_detector_activation_percent Value : 95
>>> GUC : search_path Value : "$user",public
>>> GUC : seg_max_connections Value : 3000
>>> GUC : segment_directory Value :
>>> GUC : seq_page_cost Value : 1
>>> GUC : server_encoding Value : UTF8
>>> GUC : server_ticket_renew_interval Value : 43200000
>>> GUC : server_version Value : 8.2.15
>>> GUC : server_version_num Value : 80215
>>> GUC : shared_buffers Value : 4000
>>> GUC : shared_preload_libraries Value :
>>> GUC : ssl Value : off
>>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
>>> GUC : standard_conforming_strings Value : off
>>> GUC : standby_address_host Value : localhost
>>> GUC : statement_timeout Value : 0
>>> GUC : superuser_reserved_connections Value : 3
>>> GUC : tcp_keepalives_count Value : 9
>>> GUC : tcp_keepalives_idle Value : 7200
>>> GUC : tcp_keepalives_interval Value : 75
>>> GUC : temp_buffers Value : 1024
>>> GUC : TimeZone Value : PRC
>>> GUC : timezone_abbreviations Value : Default
>>> GUC : track_activities Value : on
>>> GUC : track_counts Value : off
>>> GUC : transaction_isolation Value : read committed
>>> GUC : transaction_read_only Value : off
>>> GUC : transform_null_equals Value : off
>>> GUC : unix_socket_directory Value :
>>> GUC : unix_socket_group Value :
>>> GUC : unix_socket_permissions Value : 511
>>> GUC : update_process_title Value : on
>>> GUC : vacuum_cost_delay Value : 0
>>> GUC : vacuum_cost_limit Value : 200
>>> GUC : vacuum_cost_page_dirty Value : 20
>>> GUC : vacuum_cost_page_miss Value : 10
>>> GUC : vacuum_freeze_min_age Value : 100000000
>>> GUC : work_mem Value : 51200
>>>
>>>
>>>
>>>
>>>
>>> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>>>
>>> Could you please post your SQL DDL statement? How many URLs do you have
>>> in your external table? Also, your HASH dist table - how many buckets are
>>> defined, if any? Are the # of URLs more than the # of buckets or
>>> default_hash_table_bucket_number value? Perhaps you can attach your
>>> hawq-site.xml file as well.
>>>
>>> Also see:
>>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>>>
>>> Thanks
>>> Vineet
>>>
>>>
>>> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>>>
>>>> Hi,all:
>>>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>>>  I load data from an external table (created using "like target_table"
>>>> statement) ,
>>>> if the target table was distributed by some column(s), it raise this
>>>> error:
>>>>  External scan error: There are more external files (URLs) than primary
>>>> segments that can read them (COptTasks.cpp:1756)
>>>> if the target table was distributed randomly, it works well,
>>>> I don't set any parameter special,does anybody know how to resolve this
>>>> problem?
>>>> thanks a lot.
>>>>
>>>


-- 
Thanks

Hubert Zhang

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Luis Macedo <lm...@pivotal.io>.
Also on your location clause you should not reference the same file more
than one time.

If you want to scale gpfdist process you need to use different range on
each port for a same server. (Not sure if I explain myself :))

If you use one gpfdist per server performance should be fine. One gpfdist
process usually can do 250MB/s if underlying infra allows.

Rgds

--- Sent from my Nexus 5x

Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vv...@apache.org> escreveu:

> Your default_hash_table_bucket_number value is set to 6. Typically, this
> should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
> should set this value to 18. Any time you change this parameter, you should
> redistribute your HASH distributed tables, if you have any (unless the
> table DDL has # of buckets defined, I think).
>
> Increase default_hash_table_bucket_number to 18 and retry the insert.
> Since you have 18 ext table URLs, it should work with the change.
>
> Thanks
> Vineet
>
>
> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:
>
>>
>> My environment is >>>> : 1 master 3 segments
>> SQL >>>>>>:
>> CREATE TABLE call_center (
>>     cc_call_center_sk integer,
>>     cc_call_center_id character varying(16),
>>     cc_rec_start_date date,
>>     cc_rec_end_date date,
>>     cc_closed_date_sk integer,
>>     cc_open_date_sk integer,
>>     cc_name character varying(50),
>>     cc_class character varying(50),
>>     cc_employees integer,
>>     cc_sq_ft integer,
>>     cc_hours character varying(20),
>>     cc_manager character varying(40),
>>     cc_mkt_id integer,
>>     cc_mkt_class character varying(50),
>>     cc_mkt_desc character varying(100),
>>     cc_market_manager character varying(40),
>>     cc_division text,
>>     cc_division_name character varying(50),
>>     cc_company text,
>>     cc_company_name character varying(50),
>>     cc_street_number character varying(10),
>>     cc_street_name character varying(60),
>>     cc_street_type character varying(15),
>>     cc_suite_number character varying(10),
>>     cc_city character varying(60),
>>     cc_county character varying(30),
>>     cc_state text,
>>     cc_zip character varying(10),
>>     cc_country character varying(20),
>>     cc_gmt_offset numeric(5,2),
>>     cc_tax_percentage numeric(5,2)
>> )
>> WITH (appendonly=true, orientation=parquet)
>> DISTRIBUTED BY (cc_call_center_sk);
>>
>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>> LOCATION (
>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>
>> insert into call_center select * from ext_call_center;
>>
>> ERROR:  External scan error: There are more external files (URLs) than
>> primary segments that can read them (COptTasks.cpp:1756)
>>
>> hawq config >>>>:
>>
>> GUC : add_missing_from Value : off
>> GUC : application_name Value :
>> GUC : array_nulls Value : on
>> GUC : authentication_timeout Value : 60
>> GUC : backslash_quote Value : safe_encoding
>> GUC : block_size Value : 32768
>> GUC : bonjour_name Value :
>> GUC : check_function_bodies Value : on
>> GUC : client_encoding Value : UTF8
>> GUC : client_min_messages Value : ERROR
>> GUC : cpu_index_tuple_cost Value : 0.005
>> GUC : cpu_operator_cost Value : 0.0025
>> GUC : cpu_tuple_cost Value : 0.01
>> GUC : cursor_tuple_fraction Value : 1
>> GUC : custom_variable_classes Value :
>> GUC : DateStyle Value : ISO, MDY
>> GUC : db_user_namespace Value : off
>> GUC : deadlock_timeout Value : 1000
>> GUC : debug_assertions Value : off
>> GUC : debug_pretty_print Value : off
>> GUC : debug_print_parse Value : off
>> GUC : debug_print_plan Value : off
>> GUC : debug_print_prelim_plan Value : off
>> GUC : debug_print_rewritten Value : off
>> GUC : debug_print_slice_table Value : off
>> GUC : default_hash_table_bucket_number Value : 6
>> GUC : default_statement_mem Value : 128000
>> GUC : default_statistics_target Value : 25
>> GUC : default_tablespace Value :
>> GUC : default_transaction_isolation Value : read committed
>> GUC : default_transaction_read_only Value : off
>> GUC : dfs_url Value : localhost:8020/hawq
>> GUC : dynamic_library_path Value : $libdir
>> GUC : effective_cache_size Value : 16384
>> GUC : enable_bitmapscan Value : on
>> GUC : enable_groupagg Value : on
>> GUC : enable_hashagg Value : on
>> GUC : enable_hashjoin Value : on
>> GUC : enable_indexscan Value : on
>> GUC : enable_mergejoin Value : off
>> GUC : enable_nestloop Value : off
>> GUC : enable_secure_filesystem Value : off
>> GUC : enable_seqscan Value : on
>> GUC : enable_sort Value : on
>> GUC : enable_tidscan Value : on
>> GUC : escape_string_warning Value : on
>> GUC : explain_memory_verbosity Value : suppress
>> GUC : explain_pretty_print Value : on
>> GUC : extra_float_digits Value : 0
>> GUC : filesystem_support_truncate Value : on
>> GUC : from_collapse_limit Value : 20
>> GUC : gp_adjust_selectivity_for_outerjoins Value : on
>> GUC : gp_analyze_relative_error Value : 0.25
>> GUC : gp_autostats_mode Value : NONE
>> GUC : gp_autostats_on_change_threshold Value : 2147483647
>> GUC : gp_backup_directIO Value : off
>> GUC : gp_backup_directIO_read_chunk_mb Value : 20
>> GUC : gp_cached_segworkers_threshold Value : 5
>> GUC : gp_command_count Value : 6
>> GUC : gp_connections_per_thread Value : 64
>> GUC : gp_debug_linger Value : 0
>> GUC : gp_dynamic_partition_pruning Value : on
>> GUC : gp_email_connect_avoid_duration Value : 7200
>> GUC : gp_email_connect_failures Value : 5
>> GUC : gp_email_connect_timeout Value : 15
>> GUC : gp_email_from Value :
>> GUC : gp_email_smtp_password Value :
>> GUC : gp_email_smtp_server Value : localhost:25
>> GUC : gp_email_smtp_userid Value :
>> GUC : gp_email_to Value :
>> GUC : gp_enable_agg_distinct Value : on
>> GUC : gp_enable_agg_distinct_pruning Value : on
>> GUC : gp_enable_direct_dispatch Value : on
>> GUC : gp_enable_fallback_plan Value : on
>> GUC : gp_enable_fast_sri Value : on
>> GUC : gp_enable_gpperfmon Value : off
>> GUC : gp_enable_groupext_distinct_gather Value : on
>> GUC : gp_enable_groupext_distinct_pruning Value : on
>> GUC : gp_enable_multiphase_agg Value : on
>> GUC : gp_enable_predicate_propagation Value : on
>> GUC : gp_enable_preunique Value : on
>> GUC : gp_enable_sequential_window_plans Value : on
>> GUC : gp_enable_sort_distinct Value : on
>> GUC : gp_enable_sort_limit Value : on
>> GUC : gp_external_enable_exec Value : on
>> GUC : gp_external_grant_privileges Value : off
>> GUC : gp_external_max_segs Value : 64
>> GUC : gp_filerep_tcp_keepalives_count Value : 2
>> GUC : gp_filerep_tcp_keepalives_idle Value : 60
>> GUC : gp_filerep_tcp_keepalives_interval Value : 30
>> GUC : gp_force_use_default_temporary_directory Value : off
>> GUC : gp_gpperfmon_send_interval Value : 1
>> GUC : gp_hashjoin_tuples_per_bucket Value : 5
>> GUC : gp_idf_deduplicate Value : auto
>> GUC : gp_interconnect_cache_future_packets Value : on
>> GUC : gp_interconnect_default_rtt Value : 20
>> GUC : gp_interconnect_fc_method Value : LOSS
>> GUC : gp_interconnect_hash_multiplier Value : 2
>> GUC : gp_interconnect_min_retries_before_timeout Value : 100
>> GUC : gp_interconnect_min_rto Value : 20
>> GUC : gp_interconnect_queue_depth Value : 4
>> GUC : gp_interconnect_setup_timeout Value : 7200
>> GUC : gp_interconnect_snd_queue_depth Value : 2
>> GUC : gp_interconnect_timer_checking_period Value : 20
>> GUC : gp_interconnect_timer_period Value : 5
>> GUC : gp_interconnect_transmit_timeout Value : 3600
>> GUC : gp_interconnect_type Value : UDP
>> GUC : gp_log_format Value : csv
>> GUC : gp_max_csv_line_length Value : 1048576
>> GUC : gp_max_databases Value : 16
>> GUC : gp_max_filespaces Value : 8
>> GUC : gp_max_packet_size Value : 8192
>> GUC : gp_max_partition_level Value : 0
>> GUC : gp_max_plan_size Value : 0
>> GUC : gp_max_relations Value : 65536
>> GUC : gp_max_tablespaces Value : 16
>> GUC : gp_motion_cost_per_row Value : 0
>> GUC : gp_num_contents_in_cluster Value : -10000
>> GUC : gp_query_context_mem_limit Value : 102400
>> GUC : gp_reject_percent_threshold Value : 300
>> GUC : gp_reraise_signal Value : on
>> GUC : gp_role Value : utility
>> GUC : gp_safefswritesize Value : 0
>> GUC : gp_segment_connect_timeout Value : 600
>> GUC : gp_segments_for_planner Value : 0
>> GUC : gp_session_id Value : -1
>> GUC : gp_set_proc_affinity Value : off
>> GUC : gp_snmp_community Value : public
>> GUC : gp_snmp_monitor_address Value :
>> GUC : gp_snmp_use_inform_or_trap Value : trap
>> GUC : gp_statistics_pullup_from_child_partition Value : on
>> GUC : gp_statistics_use_fkeys Value : on
>> GUC : gp_subtrans_warn_limit Value : 16777216
>> GUC : gp_temporary_directory_mark_error Value : 0
>> GUC : gp_udp_bufsize_k Value : 0
>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
>> GUC : gp_workfile_checksumming Value : on
>> GUC : gp_workfile_compress_algorithm Value : none
>> GUC : gp_workfile_limit_per_query Value : 0
>> GUC : gp_workfile_limit_per_segment Value : 0
>> GUC : gpperfmon_port Value : 8888
>> GUC : hawq_global_rm_type Value : yarn
>> GUC : hawq_master_address_host Value : master.bigdata
>> GUC : hawq_master_address_port Value : 6432
>> GUC : hawq_master_temp_directory Value : /tmp
>> GUC : hawq_metadata_cache_block_capacity Value : 2097152
>> GUC : hawq_metadata_cache_check_interval Value : 30
>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
>> GUC : hawq_metadata_cache_refresh_interval Value : 3600
>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
>> GUC : hawq_re_cpu_enable Value : off
>> GUC : hawq_re_cpu_weight Value : 1024
>> GUC : hawq_re_memory_overcommit_max Value : 8192
>> GUC : hawq_re_vcore_pcore_ratio Value : 1
>> GUC : hawq_rm_cluster_report_period Value : 60
>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
>> GUC : hawq_rm_container_batch_limit Value : 1000
>> GUC : hawq_rm_enable_connpool Value : on
>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
>> GUC : hawq_rm_force_fifo_queuing Value : on
>> GUC : hawq_rm_master_domain_port Value : 5436
>> GUC : hawq_rm_master_port Value : 5437
>> GUC : hawq_rm_memory_limit_perseg Value : 480GB
>> GUC : hawq_rm_min_resource_perseg Value : 2
>> GUC : hawq_rm_nocluster_timeout Value : 60
>> GUC : hawq_rm_nresqueue_limit Value : 128
>> GUC : hawq_rm_nslice_perseg_limit Value : 5000
>> GUC : hawq_rm_nvcore_limit_perseg Value : 16
>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
>> GUC : hawq_rm_nvseg_perquery_limit Value : 512
>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
>> GUC : hawq_rm_regularize_io_factor Value : 1
>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
>> GUC : hawq_rm_regularize_nvseg_factor Value : 1
>> GUC : hawq_rm_regularize_nvseg_max Value : 300
>> GUC : hawq_rm_regularize_usage_factor Value : 1
>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
>> GUC : hawq_rm_resource_allocation_timeout Value : 600
>> GUC : hawq_rm_resource_idle_timeout Value : 300
>> GUC : hawq_rm_respool_test_file Value :
>> GUC : hawq_rm_return_percent_on_overcommit Value : 10
>> GUC : hawq_rm_segment_config_refresh_interval Value : 30
>> GUC : hawq_rm_segment_heartbeat_interval Value : 30
>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
>> GUC : hawq_rm_segment_port Value : 5438
>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
>> GUC : hawq_rm_session_lease_timeout Value : 180
>> GUC : hawq_rm_stmt_nvseg Value : 0
>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
>> GUC : hawq_rm_yarn_app_name Value : hawq
>> GUC : hawq_rm_yarn_queue_name Value : default
>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
>> GUC : hawq_segment_address_port Value : 40000
>> GUC : hawq_segment_history_keep_period Value : 365
>> GUC : hawq_segment_temp_directory Value : /tmp
>> GUC : integer_datetimes Value : on
>> GUC : IntervalStyle Value : postgres
>> GUC : join_collapse_limit Value : 20
>> GUC : krb5_ccname Value : /tmp/postgres.ccname
>> GUC : krb_caseins_users Value : off
>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/
>> agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-
>> dist/etc/krb5.keytab
>> GUC : krb_srvname Value : postgres
>> GUC : lc_collate Value : C
>> GUC : lc_ctype Value : C
>> GUC : lc_messages Value : en_US.utf8
>> GUC : lc_monetary Value : en_US.utf8
>> GUC : lc_numeric Value : en_US.utf8
>> GUC : lc_time Value : en_US.utf8
>> GUC : listen_addresses Value : *
>> GUC : local_preload_libraries Value :
>> GUC : log_autostats Value : off
>> GUC : log_connections Value : off
>> GUC : log_disconnections Value : off
>> GUC : log_dispatch_stats Value : off
>> GUC : log_duration Value : off
>> GUC : log_error_verbosity Value : default
>> GUC : log_executor_stats Value : off
>> GUC : log_hostname Value : off
>> GUC : log_min_duration_statement Value : -1
>> GUC : log_min_error_statement Value : error
>> GUC : log_min_messages Value : warning
>> GUC : log_parser_stats Value : off
>> GUC : log_planner_stats Value : off
>> GUC : log_rotation_age Value : 1440
>> GUC : log_rotation_size Value : 0
>> GUC : log_statement Value : none
>> GUC : log_statement_stats Value : off
>> GUC : log_timezone Value : PRC
>> GUC : log_truncate_on_rotation Value : off
>> GUC : maintenance_work_mem Value : 65536
>> GUC : master_directory Value :
>> GUC : max_appendonly_segfiles Value : 262144
>> GUC : max_appendonly_tables Value : 10000
>> GUC : max_connections Value : 1280
>> GUC : max_files_per_process Value : 150
>> GUC : max_fsm_pages Value : 200000
>> GUC : max_fsm_relations Value : 1000
>> GUC : max_function_args Value : 100
>> GUC : max_identifier_length Value : 63
>> GUC : max_index_keys Value : 32
>> GUC : max_locks_per_transaction Value : 128
>> GUC : max_prepared_transactions Value : 250
>> GUC : max_stack_depth Value : 2048
>> GUC : max_work_mem Value : 1024000
>> GUC : metadata_cache_testfile Value :
>> GUC : optimizer Value : on
>> GUC : optimizer_analyze_root_partition Value : on
>> GUC : optimizer_minidump Value : onerror
>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
>> GUC : password_encryption Value : on
>> GUC : password_hash_algorithm Value : MD5
>> GUC : pljava_classpath Value :
>> GUC : pljava_release_lingering_savepoints Value : off
>> GUC : pljava_statement_cache_size Value : 0
>> GUC : pljava_vmoptions Value :
>> GUC : port Value : 6432
>> GUC : pxf_enable_filter_pushdown Value : on
>> GUC : pxf_enable_locality_optimizations Value : on
>> GUC : pxf_enable_stat_collection Value : on
>> GUC : pxf_remote_service_login Value :
>> GUC : pxf_remote_service_secret Value :
>> GUC : pxf_service_address Value : localhost:51200
>> GUC : pxf_stat_max_fragments Value : 100
>> GUC : random_page_cost Value : 100
>> GUC : regex_flavor Value : advanced
>> GUC : runaway_detector_activation_percent Value : 95
>> GUC : search_path Value : "$user",public
>> GUC : seg_max_connections Value : 3000
>> GUC : segment_directory Value :
>> GUC : seq_page_cost Value : 1
>> GUC : server_encoding Value : UTF8
>> GUC : server_ticket_renew_interval Value : 43200000
>> GUC : server_version Value : 8.2.15
>> GUC : server_version_num Value : 80215
>> GUC : shared_buffers Value : 4000
>> GUC : shared_preload_libraries Value :
>> GUC : ssl Value : off
>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
>> GUC : standard_conforming_strings Value : off
>> GUC : standby_address_host Value : localhost
>> GUC : statement_timeout Value : 0
>> GUC : superuser_reserved_connections Value : 3
>> GUC : tcp_keepalives_count Value : 9
>> GUC : tcp_keepalives_idle Value : 7200
>> GUC : tcp_keepalives_interval Value : 75
>> GUC : temp_buffers Value : 1024
>> GUC : TimeZone Value : PRC
>> GUC : timezone_abbreviations Value : Default
>> GUC : track_activities Value : on
>> GUC : track_counts Value : off
>> GUC : transaction_isolation Value : read committed
>> GUC : transaction_read_only Value : off
>> GUC : transform_null_equals Value : off
>> GUC : unix_socket_directory Value :
>> GUC : unix_socket_group Value :
>> GUC : unix_socket_permissions Value : 511
>> GUC : update_process_title Value : on
>> GUC : vacuum_cost_delay Value : 0
>> GUC : vacuum_cost_limit Value : 200
>> GUC : vacuum_cost_page_dirty Value : 20
>> GUC : vacuum_cost_page_miss Value : 10
>> GUC : vacuum_freeze_min_age Value : 100000000
>> GUC : work_mem Value : 51200
>>
>>
>>
>>
>>
>> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>>
>> Could you please post your SQL DDL statement? How many URLs do you have
>> in your external table? Also, your HASH dist table - how many buckets are
>> defined, if any? Are the # of URLs more than the # of buckets or
>> default_hash_table_bucket_number value? Perhaps you can attach your
>> hawq-site.xml file as well.
>>
>> Also see:
>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>>
>> Thanks
>> Vineet
>>
>>
>> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>>
>>> Hi,all:
>>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>>  I load data from an external table (created using "like target_table"
>>> statement) ,
>>> if the target table was distributed by some column(s), it raise this
>>> error:
>>>  External scan error: There are more external files (URLs) than primary
>>> segments that can read them (COptTasks.cpp:1756)
>>> if the target table was distributed randomly, it works well,
>>> I don't set any parameter special,does anybody know how to resolve this
>>> problem?
>>> thanks a lot.
>>>
>>

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Luis Macedo <lm...@pivotal.io>.
Also on your location clause you should not reference the same file more
than one time.

If you want to scale gpfdist process you need to use different range on
each port for a same server. (Not sure if I explain myself :))

If you use one gpfdist per server performance should be fine. One gpfdist
process usually can do 250MB/s if underlying infra allows.

Rgds

--- Sent from my Nexus 5x

Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vv...@apache.org> escreveu:

> Your default_hash_table_bucket_number value is set to 6. Typically, this
> should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
> should set this value to 18. Any time you change this parameter, you should
> redistribute your HASH distributed tables, if you have any (unless the
> table DDL has # of buckets defined, I think).
>
> Increase default_hash_table_bucket_number to 18 and retry the insert.
> Since you have 18 ext table URLs, it should work with the change.
>
> Thanks
> Vineet
>
>
> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:
>
>>
>> My environment is >>>> : 1 master 3 segments
>> SQL >>>>>>:
>> CREATE TABLE call_center (
>>     cc_call_center_sk integer,
>>     cc_call_center_id character varying(16),
>>     cc_rec_start_date date,
>>     cc_rec_end_date date,
>>     cc_closed_date_sk integer,
>>     cc_open_date_sk integer,
>>     cc_name character varying(50),
>>     cc_class character varying(50),
>>     cc_employees integer,
>>     cc_sq_ft integer,
>>     cc_hours character varying(20),
>>     cc_manager character varying(40),
>>     cc_mkt_id integer,
>>     cc_mkt_class character varying(50),
>>     cc_mkt_desc character varying(100),
>>     cc_market_manager character varying(40),
>>     cc_division text,
>>     cc_division_name character varying(50),
>>     cc_company text,
>>     cc_company_name character varying(50),
>>     cc_street_number character varying(10),
>>     cc_street_name character varying(60),
>>     cc_street_type character varying(15),
>>     cc_suite_number character varying(10),
>>     cc_city character varying(60),
>>     cc_county character varying(30),
>>     cc_state text,
>>     cc_zip character varying(10),
>>     cc_country character varying(20),
>>     cc_gmt_offset numeric(5,2),
>>     cc_tax_percentage numeric(5,2)
>> )
>> WITH (appendonly=true, orientation=parquet)
>> DISTRIBUTED BY (cc_call_center_sk);
>>
>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>> LOCATION (
>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>
>> insert into call_center select * from ext_call_center;
>>
>> ERROR:  External scan error: There are more external files (URLs) than
>> primary segments that can read them (COptTasks.cpp:1756)
>>
>> hawq config >>>>:
>>
>> GUC : add_missing_from Value : off
>> GUC : application_name Value :
>> GUC : array_nulls Value : on
>> GUC : authentication_timeout Value : 60
>> GUC : backslash_quote Value : safe_encoding
>> GUC : block_size Value : 32768
>> GUC : bonjour_name Value :
>> GUC : check_function_bodies Value : on
>> GUC : client_encoding Value : UTF8
>> GUC : client_min_messages Value : ERROR
>> GUC : cpu_index_tuple_cost Value : 0.005
>> GUC : cpu_operator_cost Value : 0.0025
>> GUC : cpu_tuple_cost Value : 0.01
>> GUC : cursor_tuple_fraction Value : 1
>> GUC : custom_variable_classes Value :
>> GUC : DateStyle Value : ISO, MDY
>> GUC : db_user_namespace Value : off
>> GUC : deadlock_timeout Value : 1000
>> GUC : debug_assertions Value : off
>> GUC : debug_pretty_print Value : off
>> GUC : debug_print_parse Value : off
>> GUC : debug_print_plan Value : off
>> GUC : debug_print_prelim_plan Value : off
>> GUC : debug_print_rewritten Value : off
>> GUC : debug_print_slice_table Value : off
>> GUC : default_hash_table_bucket_number Value : 6
>> GUC : default_statement_mem Value : 128000
>> GUC : default_statistics_target Value : 25
>> GUC : default_tablespace Value :
>> GUC : default_transaction_isolation Value : read committed
>> GUC : default_transaction_read_only Value : off
>> GUC : dfs_url Value : localhost:8020/hawq
>> GUC : dynamic_library_path Value : $libdir
>> GUC : effective_cache_size Value : 16384
>> GUC : enable_bitmapscan Value : on
>> GUC : enable_groupagg Value : on
>> GUC : enable_hashagg Value : on
>> GUC : enable_hashjoin Value : on
>> GUC : enable_indexscan Value : on
>> GUC : enable_mergejoin Value : off
>> GUC : enable_nestloop Value : off
>> GUC : enable_secure_filesystem Value : off
>> GUC : enable_seqscan Value : on
>> GUC : enable_sort Value : on
>> GUC : enable_tidscan Value : on
>> GUC : escape_string_warning Value : on
>> GUC : explain_memory_verbosity Value : suppress
>> GUC : explain_pretty_print Value : on
>> GUC : extra_float_digits Value : 0
>> GUC : filesystem_support_truncate Value : on
>> GUC : from_collapse_limit Value : 20
>> GUC : gp_adjust_selectivity_for_outerjoins Value : on
>> GUC : gp_analyze_relative_error Value : 0.25
>> GUC : gp_autostats_mode Value : NONE
>> GUC : gp_autostats_on_change_threshold Value : 2147483647
>> GUC : gp_backup_directIO Value : off
>> GUC : gp_backup_directIO_read_chunk_mb Value : 20
>> GUC : gp_cached_segworkers_threshold Value : 5
>> GUC : gp_command_count Value : 6
>> GUC : gp_connections_per_thread Value : 64
>> GUC : gp_debug_linger Value : 0
>> GUC : gp_dynamic_partition_pruning Value : on
>> GUC : gp_email_connect_avoid_duration Value : 7200
>> GUC : gp_email_connect_failures Value : 5
>> GUC : gp_email_connect_timeout Value : 15
>> GUC : gp_email_from Value :
>> GUC : gp_email_smtp_password Value :
>> GUC : gp_email_smtp_server Value : localhost:25
>> GUC : gp_email_smtp_userid Value :
>> GUC : gp_email_to Value :
>> GUC : gp_enable_agg_distinct Value : on
>> GUC : gp_enable_agg_distinct_pruning Value : on
>> GUC : gp_enable_direct_dispatch Value : on
>> GUC : gp_enable_fallback_plan Value : on
>> GUC : gp_enable_fast_sri Value : on
>> GUC : gp_enable_gpperfmon Value : off
>> GUC : gp_enable_groupext_distinct_gather Value : on
>> GUC : gp_enable_groupext_distinct_pruning Value : on
>> GUC : gp_enable_multiphase_agg Value : on
>> GUC : gp_enable_predicate_propagation Value : on
>> GUC : gp_enable_preunique Value : on
>> GUC : gp_enable_sequential_window_plans Value : on
>> GUC : gp_enable_sort_distinct Value : on
>> GUC : gp_enable_sort_limit Value : on
>> GUC : gp_external_enable_exec Value : on
>> GUC : gp_external_grant_privileges Value : off
>> GUC : gp_external_max_segs Value : 64
>> GUC : gp_filerep_tcp_keepalives_count Value : 2
>> GUC : gp_filerep_tcp_keepalives_idle Value : 60
>> GUC : gp_filerep_tcp_keepalives_interval Value : 30
>> GUC : gp_force_use_default_temporary_directory Value : off
>> GUC : gp_gpperfmon_send_interval Value : 1
>> GUC : gp_hashjoin_tuples_per_bucket Value : 5
>> GUC : gp_idf_deduplicate Value : auto
>> GUC : gp_interconnect_cache_future_packets Value : on
>> GUC : gp_interconnect_default_rtt Value : 20
>> GUC : gp_interconnect_fc_method Value : LOSS
>> GUC : gp_interconnect_hash_multiplier Value : 2
>> GUC : gp_interconnect_min_retries_before_timeout Value : 100
>> GUC : gp_interconnect_min_rto Value : 20
>> GUC : gp_interconnect_queue_depth Value : 4
>> GUC : gp_interconnect_setup_timeout Value : 7200
>> GUC : gp_interconnect_snd_queue_depth Value : 2
>> GUC : gp_interconnect_timer_checking_period Value : 20
>> GUC : gp_interconnect_timer_period Value : 5
>> GUC : gp_interconnect_transmit_timeout Value : 3600
>> GUC : gp_interconnect_type Value : UDP
>> GUC : gp_log_format Value : csv
>> GUC : gp_max_csv_line_length Value : 1048576
>> GUC : gp_max_databases Value : 16
>> GUC : gp_max_filespaces Value : 8
>> GUC : gp_max_packet_size Value : 8192
>> GUC : gp_max_partition_level Value : 0
>> GUC : gp_max_plan_size Value : 0
>> GUC : gp_max_relations Value : 65536
>> GUC : gp_max_tablespaces Value : 16
>> GUC : gp_motion_cost_per_row Value : 0
>> GUC : gp_num_contents_in_cluster Value : -10000
>> GUC : gp_query_context_mem_limit Value : 102400
>> GUC : gp_reject_percent_threshold Value : 300
>> GUC : gp_reraise_signal Value : on
>> GUC : gp_role Value : utility
>> GUC : gp_safefswritesize Value : 0
>> GUC : gp_segment_connect_timeout Value : 600
>> GUC : gp_segments_for_planner Value : 0
>> GUC : gp_session_id Value : -1
>> GUC : gp_set_proc_affinity Value : off
>> GUC : gp_snmp_community Value : public
>> GUC : gp_snmp_monitor_address Value :
>> GUC : gp_snmp_use_inform_or_trap Value : trap
>> GUC : gp_statistics_pullup_from_child_partition Value : on
>> GUC : gp_statistics_use_fkeys Value : on
>> GUC : gp_subtrans_warn_limit Value : 16777216
>> GUC : gp_temporary_directory_mark_error Value : 0
>> GUC : gp_udp_bufsize_k Value : 0
>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
>> GUC : gp_workfile_checksumming Value : on
>> GUC : gp_workfile_compress_algorithm Value : none
>> GUC : gp_workfile_limit_per_query Value : 0
>> GUC : gp_workfile_limit_per_segment Value : 0
>> GUC : gpperfmon_port Value : 8888
>> GUC : hawq_global_rm_type Value : yarn
>> GUC : hawq_master_address_host Value : master.bigdata
>> GUC : hawq_master_address_port Value : 6432
>> GUC : hawq_master_temp_directory Value : /tmp
>> GUC : hawq_metadata_cache_block_capacity Value : 2097152
>> GUC : hawq_metadata_cache_check_interval Value : 30
>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
>> GUC : hawq_metadata_cache_refresh_interval Value : 3600
>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
>> GUC : hawq_re_cpu_enable Value : off
>> GUC : hawq_re_cpu_weight Value : 1024
>> GUC : hawq_re_memory_overcommit_max Value : 8192
>> GUC : hawq_re_vcore_pcore_ratio Value : 1
>> GUC : hawq_rm_cluster_report_period Value : 60
>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
>> GUC : hawq_rm_container_batch_limit Value : 1000
>> GUC : hawq_rm_enable_connpool Value : on
>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
>> GUC : hawq_rm_force_fifo_queuing Value : on
>> GUC : hawq_rm_master_domain_port Value : 5436
>> GUC : hawq_rm_master_port Value : 5437
>> GUC : hawq_rm_memory_limit_perseg Value : 480GB
>> GUC : hawq_rm_min_resource_perseg Value : 2
>> GUC : hawq_rm_nocluster_timeout Value : 60
>> GUC : hawq_rm_nresqueue_limit Value : 128
>> GUC : hawq_rm_nslice_perseg_limit Value : 5000
>> GUC : hawq_rm_nvcore_limit_perseg Value : 16
>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
>> GUC : hawq_rm_nvseg_perquery_limit Value : 512
>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
>> GUC : hawq_rm_regularize_io_factor Value : 1
>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
>> GUC : hawq_rm_regularize_nvseg_factor Value : 1
>> GUC : hawq_rm_regularize_nvseg_max Value : 300
>> GUC : hawq_rm_regularize_usage_factor Value : 1
>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
>> GUC : hawq_rm_resource_allocation_timeout Value : 600
>> GUC : hawq_rm_resource_idle_timeout Value : 300
>> GUC : hawq_rm_respool_test_file Value :
>> GUC : hawq_rm_return_percent_on_overcommit Value : 10
>> GUC : hawq_rm_segment_config_refresh_interval Value : 30
>> GUC : hawq_rm_segment_heartbeat_interval Value : 30
>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
>> GUC : hawq_rm_segment_port Value : 5438
>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
>> GUC : hawq_rm_session_lease_timeout Value : 180
>> GUC : hawq_rm_stmt_nvseg Value : 0
>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
>> GUC : hawq_rm_yarn_app_name Value : hawq
>> GUC : hawq_rm_yarn_queue_name Value : default
>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
>> GUC : hawq_segment_address_port Value : 40000
>> GUC : hawq_segment_history_keep_period Value : 365
>> GUC : hawq_segment_temp_directory Value : /tmp
>> GUC : integer_datetimes Value : on
>> GUC : IntervalStyle Value : postgres
>> GUC : join_collapse_limit Value : 20
>> GUC : krb5_ccname Value : /tmp/postgres.ccname
>> GUC : krb_caseins_users Value : off
>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/
>> agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-
>> dist/etc/krb5.keytab
>> GUC : krb_srvname Value : postgres
>> GUC : lc_collate Value : C
>> GUC : lc_ctype Value : C
>> GUC : lc_messages Value : en_US.utf8
>> GUC : lc_monetary Value : en_US.utf8
>> GUC : lc_numeric Value : en_US.utf8
>> GUC : lc_time Value : en_US.utf8
>> GUC : listen_addresses Value : *
>> GUC : local_preload_libraries Value :
>> GUC : log_autostats Value : off
>> GUC : log_connections Value : off
>> GUC : log_disconnections Value : off
>> GUC : log_dispatch_stats Value : off
>> GUC : log_duration Value : off
>> GUC : log_error_verbosity Value : default
>> GUC : log_executor_stats Value : off
>> GUC : log_hostname Value : off
>> GUC : log_min_duration_statement Value : -1
>> GUC : log_min_error_statement Value : error
>> GUC : log_min_messages Value : warning
>> GUC : log_parser_stats Value : off
>> GUC : log_planner_stats Value : off
>> GUC : log_rotation_age Value : 1440
>> GUC : log_rotation_size Value : 0
>> GUC : log_statement Value : none
>> GUC : log_statement_stats Value : off
>> GUC : log_timezone Value : PRC
>> GUC : log_truncate_on_rotation Value : off
>> GUC : maintenance_work_mem Value : 65536
>> GUC : master_directory Value :
>> GUC : max_appendonly_segfiles Value : 262144
>> GUC : max_appendonly_tables Value : 10000
>> GUC : max_connections Value : 1280
>> GUC : max_files_per_process Value : 150
>> GUC : max_fsm_pages Value : 200000
>> GUC : max_fsm_relations Value : 1000
>> GUC : max_function_args Value : 100
>> GUC : max_identifier_length Value : 63
>> GUC : max_index_keys Value : 32
>> GUC : max_locks_per_transaction Value : 128
>> GUC : max_prepared_transactions Value : 250
>> GUC : max_stack_depth Value : 2048
>> GUC : max_work_mem Value : 1024000
>> GUC : metadata_cache_testfile Value :
>> GUC : optimizer Value : on
>> GUC : optimizer_analyze_root_partition Value : on
>> GUC : optimizer_minidump Value : onerror
>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
>> GUC : password_encryption Value : on
>> GUC : password_hash_algorithm Value : MD5
>> GUC : pljava_classpath Value :
>> GUC : pljava_release_lingering_savepoints Value : off
>> GUC : pljava_statement_cache_size Value : 0
>> GUC : pljava_vmoptions Value :
>> GUC : port Value : 6432
>> GUC : pxf_enable_filter_pushdown Value : on
>> GUC : pxf_enable_locality_optimizations Value : on
>> GUC : pxf_enable_stat_collection Value : on
>> GUC : pxf_remote_service_login Value :
>> GUC : pxf_remote_service_secret Value :
>> GUC : pxf_service_address Value : localhost:51200
>> GUC : pxf_stat_max_fragments Value : 100
>> GUC : random_page_cost Value : 100
>> GUC : regex_flavor Value : advanced
>> GUC : runaway_detector_activation_percent Value : 95
>> GUC : search_path Value : "$user",public
>> GUC : seg_max_connections Value : 3000
>> GUC : segment_directory Value :
>> GUC : seq_page_cost Value : 1
>> GUC : server_encoding Value : UTF8
>> GUC : server_ticket_renew_interval Value : 43200000
>> GUC : server_version Value : 8.2.15
>> GUC : server_version_num Value : 80215
>> GUC : shared_buffers Value : 4000
>> GUC : shared_preload_libraries Value :
>> GUC : ssl Value : off
>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
>> GUC : standard_conforming_strings Value : off
>> GUC : standby_address_host Value : localhost
>> GUC : statement_timeout Value : 0
>> GUC : superuser_reserved_connections Value : 3
>> GUC : tcp_keepalives_count Value : 9
>> GUC : tcp_keepalives_idle Value : 7200
>> GUC : tcp_keepalives_interval Value : 75
>> GUC : temp_buffers Value : 1024
>> GUC : TimeZone Value : PRC
>> GUC : timezone_abbreviations Value : Default
>> GUC : track_activities Value : on
>> GUC : track_counts Value : off
>> GUC : transaction_isolation Value : read committed
>> GUC : transaction_read_only Value : off
>> GUC : transform_null_equals Value : off
>> GUC : unix_socket_directory Value :
>> GUC : unix_socket_group Value :
>> GUC : unix_socket_permissions Value : 511
>> GUC : update_process_title Value : on
>> GUC : vacuum_cost_delay Value : 0
>> GUC : vacuum_cost_limit Value : 200
>> GUC : vacuum_cost_page_dirty Value : 20
>> GUC : vacuum_cost_page_miss Value : 10
>> GUC : vacuum_freeze_min_age Value : 100000000
>> GUC : work_mem Value : 51200
>>
>>
>>
>>
>>
>> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>>
>> Could you please post your SQL DDL statement? How many URLs do you have
>> in your external table? Also, your HASH dist table - how many buckets are
>> defined, if any? Are the # of URLs more than the # of buckets or
>> default_hash_table_bucket_number value? Perhaps you can attach your
>> hawq-site.xml file as well.
>>
>> Also see:
>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>>
>> Thanks
>> Vineet
>>
>>
>> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>>
>>> Hi,all:
>>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>>  I load data from an external table (created using "like target_table"
>>> statement) ,
>>> if the target table was distributed by some column(s), it raise this
>>> error:
>>>  External scan error: There are more external files (URLs) than primary
>>> segments that can read them (COptTasks.cpp:1756)
>>> if the target table was distributed randomly, it works well,
>>> I don't set any parameter special,does anybody know how to resolve this
>>> problem?
>>> thanks a lot.
>>>
>>

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Vineet Goel <vv...@apache.org>.
Your default_hash_table_bucket_number value is set to 6. Typically, this
should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
should set this value to 18. Any time you change this parameter, you should
redistribute your HASH distributed tables, if you have any (unless the
table DDL has # of buckets defined, I think).

Increase default_hash_table_bucket_number to 18 and retry the insert. Since
you have 18 ext table URLs, it should work with the change.

Thanks
Vineet


On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:

>
> My environment is >>>> : 1 master 3 segments
> SQL >>>>>>:
> CREATE TABLE call_center (
>     cc_call_center_sk integer,
>     cc_call_center_id character varying(16),
>     cc_rec_start_date date,
>     cc_rec_end_date date,
>     cc_closed_date_sk integer,
>     cc_open_date_sk integer,
>     cc_name character varying(50),
>     cc_class character varying(50),
>     cc_employees integer,
>     cc_sq_ft integer,
>     cc_hours character varying(20),
>     cc_manager character varying(40),
>     cc_mkt_id integer,
>     cc_mkt_class character varying(50),
>     cc_mkt_desc character varying(100),
>     cc_market_manager character varying(40),
>     cc_division text,
>     cc_division_name character varying(50),
>     cc_company text,
>     cc_company_name character varying(50),
>     cc_street_number character varying(10),
>     cc_street_name character varying(60),
>     cc_street_type character varying(15),
>     cc_suite_number character varying(10),
>     cc_city character varying(60),
>     cc_county character varying(30),
>     cc_state text,
>     cc_zip character varying(10),
>     cc_country character varying(20),
>     cc_gmt_offset numeric(5,2),
>     cc_tax_percentage numeric(5,2)
> )
> WITH (appendonly=true, orientation=parquet)
> DISTRIBUTED BY (cc_call_center_sk);
>
> CREATE EXTERNAL TABLE ext_call_center (like call_center)
> LOCATION (
> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>
> insert into call_center select * from ext_call_center;
>
> ERROR:  External scan error: There are more external files (URLs) than
> primary segments that can read them (COptTasks.cpp:1756)
>
> hawq config >>>>:
>
> GUC : add_missing_from Value : off
> GUC : application_name Value :
> GUC : array_nulls Value : on
> GUC : authentication_timeout Value : 60
> GUC : backslash_quote Value : safe_encoding
> GUC : block_size Value : 32768
> GUC : bonjour_name Value :
> GUC : check_function_bodies Value : on
> GUC : client_encoding Value : UTF8
> GUC : client_min_messages Value : ERROR
> GUC : cpu_index_tuple_cost Value : 0.005
> GUC : cpu_operator_cost Value : 0.0025
> GUC : cpu_tuple_cost Value : 0.01
> GUC : cursor_tuple_fraction Value : 1
> GUC : custom_variable_classes Value :
> GUC : DateStyle Value : ISO, MDY
> GUC : db_user_namespace Value : off
> GUC : deadlock_timeout Value : 1000
> GUC : debug_assertions Value : off
> GUC : debug_pretty_print Value : off
> GUC : debug_print_parse Value : off
> GUC : debug_print_plan Value : off
> GUC : debug_print_prelim_plan Value : off
> GUC : debug_print_rewritten Value : off
> GUC : debug_print_slice_table Value : off
> GUC : default_hash_table_bucket_number Value : 6
> GUC : default_statement_mem Value : 128000
> GUC : default_statistics_target Value : 25
> GUC : default_tablespace Value :
> GUC : default_transaction_isolation Value : read committed
> GUC : default_transaction_read_only Value : off
> GUC : dfs_url Value : localhost:8020/hawq
> GUC : dynamic_library_path Value : $libdir
> GUC : effective_cache_size Value : 16384
> GUC : enable_bitmapscan Value : on
> GUC : enable_groupagg Value : on
> GUC : enable_hashagg Value : on
> GUC : enable_hashjoin Value : on
> GUC : enable_indexscan Value : on
> GUC : enable_mergejoin Value : off
> GUC : enable_nestloop Value : off
> GUC : enable_secure_filesystem Value : off
> GUC : enable_seqscan Value : on
> GUC : enable_sort Value : on
> GUC : enable_tidscan Value : on
> GUC : escape_string_warning Value : on
> GUC : explain_memory_verbosity Value : suppress
> GUC : explain_pretty_print Value : on
> GUC : extra_float_digits Value : 0
> GUC : filesystem_support_truncate Value : on
> GUC : from_collapse_limit Value : 20
> GUC : gp_adjust_selectivity_for_outerjoins Value : on
> GUC : gp_analyze_relative_error Value : 0.25
> GUC : gp_autostats_mode Value : NONE
> GUC : gp_autostats_on_change_threshold Value : 2147483647
> GUC : gp_backup_directIO Value : off
> GUC : gp_backup_directIO_read_chunk_mb Value : 20
> GUC : gp_cached_segworkers_threshold Value : 5
> GUC : gp_command_count Value : 6
> GUC : gp_connections_per_thread Value : 64
> GUC : gp_debug_linger Value : 0
> GUC : gp_dynamic_partition_pruning Value : on
> GUC : gp_email_connect_avoid_duration Value : 7200
> GUC : gp_email_connect_failures Value : 5
> GUC : gp_email_connect_timeout Value : 15
> GUC : gp_email_from Value :
> GUC : gp_email_smtp_password Value :
> GUC : gp_email_smtp_server Value : localhost:25
> GUC : gp_email_smtp_userid Value :
> GUC : gp_email_to Value :
> GUC : gp_enable_agg_distinct Value : on
> GUC : gp_enable_agg_distinct_pruning Value : on
> GUC : gp_enable_direct_dispatch Value : on
> GUC : gp_enable_fallback_plan Value : on
> GUC : gp_enable_fast_sri Value : on
> GUC : gp_enable_gpperfmon Value : off
> GUC : gp_enable_groupext_distinct_gather Value : on
> GUC : gp_enable_groupext_distinct_pruning Value : on
> GUC : gp_enable_multiphase_agg Value : on
> GUC : gp_enable_predicate_propagation Value : on
> GUC : gp_enable_preunique Value : on
> GUC : gp_enable_sequential_window_plans Value : on
> GUC : gp_enable_sort_distinct Value : on
> GUC : gp_enable_sort_limit Value : on
> GUC : gp_external_enable_exec Value : on
> GUC : gp_external_grant_privileges Value : off
> GUC : gp_external_max_segs Value : 64
> GUC : gp_filerep_tcp_keepalives_count Value : 2
> GUC : gp_filerep_tcp_keepalives_idle Value : 60
> GUC : gp_filerep_tcp_keepalives_interval Value : 30
> GUC : gp_force_use_default_temporary_directory Value : off
> GUC : gp_gpperfmon_send_interval Value : 1
> GUC : gp_hashjoin_tuples_per_bucket Value : 5
> GUC : gp_idf_deduplicate Value : auto
> GUC : gp_interconnect_cache_future_packets Value : on
> GUC : gp_interconnect_default_rtt Value : 20
> GUC : gp_interconnect_fc_method Value : LOSS
> GUC : gp_interconnect_hash_multiplier Value : 2
> GUC : gp_interconnect_min_retries_before_timeout Value : 100
> GUC : gp_interconnect_min_rto Value : 20
> GUC : gp_interconnect_queue_depth Value : 4
> GUC : gp_interconnect_setup_timeout Value : 7200
> GUC : gp_interconnect_snd_queue_depth Value : 2
> GUC : gp_interconnect_timer_checking_period Value : 20
> GUC : gp_interconnect_timer_period Value : 5
> GUC : gp_interconnect_transmit_timeout Value : 3600
> GUC : gp_interconnect_type Value : UDP
> GUC : gp_log_format Value : csv
> GUC : gp_max_csv_line_length Value : 1048576
> GUC : gp_max_databases Value : 16
> GUC : gp_max_filespaces Value : 8
> GUC : gp_max_packet_size Value : 8192
> GUC : gp_max_partition_level Value : 0
> GUC : gp_max_plan_size Value : 0
> GUC : gp_max_relations Value : 65536
> GUC : gp_max_tablespaces Value : 16
> GUC : gp_motion_cost_per_row Value : 0
> GUC : gp_num_contents_in_cluster Value : -10000
> GUC : gp_query_context_mem_limit Value : 102400
> GUC : gp_reject_percent_threshold Value : 300
> GUC : gp_reraise_signal Value : on
> GUC : gp_role Value : utility
> GUC : gp_safefswritesize Value : 0
> GUC : gp_segment_connect_timeout Value : 600
> GUC : gp_segments_for_planner Value : 0
> GUC : gp_session_id Value : -1
> GUC : gp_set_proc_affinity Value : off
> GUC : gp_snmp_community Value : public
> GUC : gp_snmp_monitor_address Value :
> GUC : gp_snmp_use_inform_or_trap Value : trap
> GUC : gp_statistics_pullup_from_child_partition Value : on
> GUC : gp_statistics_use_fkeys Value : on
> GUC : gp_subtrans_warn_limit Value : 16777216
> GUC : gp_temporary_directory_mark_error Value : 0
> GUC : gp_udp_bufsize_k Value : 0
> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
> GUC : gp_workfile_checksumming Value : on
> GUC : gp_workfile_compress_algorithm Value : none
> GUC : gp_workfile_limit_per_query Value : 0
> GUC : gp_workfile_limit_per_segment Value : 0
> GUC : gpperfmon_port Value : 8888
> GUC : hawq_global_rm_type Value : yarn
> GUC : hawq_master_address_host Value : master.bigdata
> GUC : hawq_master_address_port Value : 6432
> GUC : hawq_master_temp_directory Value : /tmp
> GUC : hawq_metadata_cache_block_capacity Value : 2097152
> GUC : hawq_metadata_cache_check_interval Value : 30
> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
> GUC : hawq_metadata_cache_refresh_interval Value : 3600
> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
> GUC : hawq_re_cpu_enable Value : off
> GUC : hawq_re_cpu_weight Value : 1024
> GUC : hawq_re_memory_overcommit_max Value : 8192
> GUC : hawq_re_vcore_pcore_ratio Value : 1
> GUC : hawq_rm_cluster_report_period Value : 60
> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
> GUC : hawq_rm_container_batch_limit Value : 1000
> GUC : hawq_rm_enable_connpool Value : on
> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
> GUC : hawq_rm_force_fifo_queuing Value : on
> GUC : hawq_rm_master_domain_port Value : 5436
> GUC : hawq_rm_master_port Value : 5437
> GUC : hawq_rm_memory_limit_perseg Value : 480GB
> GUC : hawq_rm_min_resource_perseg Value : 2
> GUC : hawq_rm_nocluster_timeout Value : 60
> GUC : hawq_rm_nresqueue_limit Value : 128
> GUC : hawq_rm_nslice_perseg_limit Value : 5000
> GUC : hawq_rm_nvcore_limit_perseg Value : 16
> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
> GUC : hawq_rm_nvseg_perquery_limit Value : 512
> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
> GUC : hawq_rm_regularize_io_factor Value : 1
> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
> GUC : hawq_rm_regularize_nvseg_factor Value : 1
> GUC : hawq_rm_regularize_nvseg_max Value : 300
> GUC : hawq_rm_regularize_usage_factor Value : 1
> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
> GUC : hawq_rm_resource_allocation_timeout Value : 600
> GUC : hawq_rm_resource_idle_timeout Value : 300
> GUC : hawq_rm_respool_test_file Value :
> GUC : hawq_rm_return_percent_on_overcommit Value : 10
> GUC : hawq_rm_segment_config_refresh_interval Value : 30
> GUC : hawq_rm_segment_heartbeat_interval Value : 30
> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
> GUC : hawq_rm_segment_port Value : 5438
> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
> GUC : hawq_rm_session_lease_timeout Value : 180
> GUC : hawq_rm_stmt_nvseg Value : 0
> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
> GUC : hawq_rm_yarn_app_name Value : hawq
> GUC : hawq_rm_yarn_queue_name Value : default
> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
> GUC : hawq_segment_address_port Value : 40000
> GUC : hawq_segment_history_keep_period Value : 365
> GUC : hawq_segment_temp_directory Value : /tmp
> GUC : integer_datetimes Value : on
> GUC : IntervalStyle Value : postgres
> GUC : join_collapse_limit Value : 20
> GUC : krb5_ccname Value : /tmp/postgres.ccname
> GUC : krb_caseins_users Value : off
> GUC : krb_server_keyfile Value :
> FILE:/data/pulse2-agent/agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab
> GUC : krb_srvname Value : postgres
> GUC : lc_collate Value : C
> GUC : lc_ctype Value : C
> GUC : lc_messages Value : en_US.utf8
> GUC : lc_monetary Value : en_US.utf8
> GUC : lc_numeric Value : en_US.utf8
> GUC : lc_time Value : en_US.utf8
> GUC : listen_addresses Value : *
> GUC : local_preload_libraries Value :
> GUC : log_autostats Value : off
> GUC : log_connections Value : off
> GUC : log_disconnections Value : off
> GUC : log_dispatch_stats Value : off
> GUC : log_duration Value : off
> GUC : log_error_verbosity Value : default
> GUC : log_executor_stats Value : off
> GUC : log_hostname Value : off
> GUC : log_min_duration_statement Value : -1
> GUC : log_min_error_statement Value : error
> GUC : log_min_messages Value : warning
> GUC : log_parser_stats Value : off
> GUC : log_planner_stats Value : off
> GUC : log_rotation_age Value : 1440
> GUC : log_rotation_size Value : 0
> GUC : log_statement Value : none
> GUC : log_statement_stats Value : off
> GUC : log_timezone Value : PRC
> GUC : log_truncate_on_rotation Value : off
> GUC : maintenance_work_mem Value : 65536
> GUC : master_directory Value :
> GUC : max_appendonly_segfiles Value : 262144
> GUC : max_appendonly_tables Value : 10000
> GUC : max_connections Value : 1280
> GUC : max_files_per_process Value : 150
> GUC : max_fsm_pages Value : 200000
> GUC : max_fsm_relations Value : 1000
> GUC : max_function_args Value : 100
> GUC : max_identifier_length Value : 63
> GUC : max_index_keys Value : 32
> GUC : max_locks_per_transaction Value : 128
> GUC : max_prepared_transactions Value : 250
> GUC : max_stack_depth Value : 2048
> GUC : max_work_mem Value : 1024000
> GUC : metadata_cache_testfile Value :
> GUC : optimizer Value : on
> GUC : optimizer_analyze_root_partition Value : on
> GUC : optimizer_minidump Value : onerror
> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
> GUC : password_encryption Value : on
> GUC : password_hash_algorithm Value : MD5
> GUC : pljava_classpath Value :
> GUC : pljava_release_lingering_savepoints Value : off
> GUC : pljava_statement_cache_size Value : 0
> GUC : pljava_vmoptions Value :
> GUC : port Value : 6432
> GUC : pxf_enable_filter_pushdown Value : on
> GUC : pxf_enable_locality_optimizations Value : on
> GUC : pxf_enable_stat_collection Value : on
> GUC : pxf_remote_service_login Value :
> GUC : pxf_remote_service_secret Value :
> GUC : pxf_service_address Value : localhost:51200
> GUC : pxf_stat_max_fragments Value : 100
> GUC : random_page_cost Value : 100
> GUC : regex_flavor Value : advanced
> GUC : runaway_detector_activation_percent Value : 95
> GUC : search_path Value : "$user",public
> GUC : seg_max_connections Value : 3000
> GUC : segment_directory Value :
> GUC : seq_page_cost Value : 1
> GUC : server_encoding Value : UTF8
> GUC : server_ticket_renew_interval Value : 43200000
> GUC : server_version Value : 8.2.15
> GUC : server_version_num Value : 80215
> GUC : shared_buffers Value : 4000
> GUC : shared_preload_libraries Value :
> GUC : ssl Value : off
> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
> GUC : standard_conforming_strings Value : off
> GUC : standby_address_host Value : localhost
> GUC : statement_timeout Value : 0
> GUC : superuser_reserved_connections Value : 3
> GUC : tcp_keepalives_count Value : 9
> GUC : tcp_keepalives_idle Value : 7200
> GUC : tcp_keepalives_interval Value : 75
> GUC : temp_buffers Value : 1024
> GUC : TimeZone Value : PRC
> GUC : timezone_abbreviations Value : Default
> GUC : track_activities Value : on
> GUC : track_counts Value : off
> GUC : transaction_isolation Value : read committed
> GUC : transaction_read_only Value : off
> GUC : transform_null_equals Value : off
> GUC : unix_socket_directory Value :
> GUC : unix_socket_group Value :
> GUC : unix_socket_permissions Value : 511
> GUC : update_process_title Value : on
> GUC : vacuum_cost_delay Value : 0
> GUC : vacuum_cost_limit Value : 200
> GUC : vacuum_cost_page_dirty Value : 20
> GUC : vacuum_cost_page_miss Value : 10
> GUC : vacuum_freeze_min_age Value : 100000000
> GUC : work_mem Value : 51200
>
>
>
>
>
> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>
> Could you please post your SQL DDL statement? How many URLs do you have in
> your external table? Also, your HASH dist table - how many buckets are
> defined, if any? Are the # of URLs more than the # of buckets or
> default_hash_table_bucket_number value? Perhaps you can attach your
> hawq-site.xml file as well.
>
> Also see:
> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>
> Thanks
> Vineet
>
>
> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>
>> Hi,all:
>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>  I load data from an external table (created using "like target_table"
>> statement) ,
>> if the target table was distributed by some column(s), it raise this
>> error:
>>  External scan error: There are more external files (URLs) than primary
>> segments that can read them (COptTasks.cpp:1756)
>> if the target table was distributed randomly, it works well,
>> I don't set any parameter special,does anybody know how to resolve this
>> problem?
>> thanks a lot.
>>
>

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

Posted by Vineet Goel <vv...@apache.org>.
Your default_hash_table_bucket_number value is set to 6. Typically, this
should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
should set this value to 18. Any time you change this parameter, you should
redistribute your HASH distributed tables, if you have any (unless the
table DDL has # of buckets defined, I think).

Increase default_hash_table_bucket_number to 18 and retry the insert. Since
you have 18 ext table URLs, it should work with the change.

Thanks
Vineet


On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yi...@163.com> wrote:

>
> My environment is >>>> : 1 master 3 segments
> SQL >>>>>>:
> CREATE TABLE call_center (
>     cc_call_center_sk integer,
>     cc_call_center_id character varying(16),
>     cc_rec_start_date date,
>     cc_rec_end_date date,
>     cc_closed_date_sk integer,
>     cc_open_date_sk integer,
>     cc_name character varying(50),
>     cc_class character varying(50),
>     cc_employees integer,
>     cc_sq_ft integer,
>     cc_hours character varying(20),
>     cc_manager character varying(40),
>     cc_mkt_id integer,
>     cc_mkt_class character varying(50),
>     cc_mkt_desc character varying(100),
>     cc_market_manager character varying(40),
>     cc_division text,
>     cc_division_name character varying(50),
>     cc_company text,
>     cc_company_name character varying(50),
>     cc_street_number character varying(10),
>     cc_street_name character varying(60),
>     cc_street_type character varying(15),
>     cc_suite_number character varying(10),
>     cc_city character varying(60),
>     cc_county character varying(30),
>     cc_state text,
>     cc_zip character varying(10),
>     cc_country character varying(20),
>     cc_gmt_offset numeric(5,2),
>     cc_tax_percentage numeric(5,2)
> )
> WITH (appendonly=true, orientation=parquet)
> DISTRIBUTED BY (cc_call_center_sk);
>
> CREATE EXTERNAL TABLE ext_call_center (like call_center)
> LOCATION (
> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>
> insert into call_center select * from ext_call_center;
>
> ERROR:  External scan error: There are more external files (URLs) than
> primary segments that can read them (COptTasks.cpp:1756)
>
> hawq config >>>>:
>
> GUC : add_missing_from Value : off
> GUC : application_name Value :
> GUC : array_nulls Value : on
> GUC : authentication_timeout Value : 60
> GUC : backslash_quote Value : safe_encoding
> GUC : block_size Value : 32768
> GUC : bonjour_name Value :
> GUC : check_function_bodies Value : on
> GUC : client_encoding Value : UTF8
> GUC : client_min_messages Value : ERROR
> GUC : cpu_index_tuple_cost Value : 0.005
> GUC : cpu_operator_cost Value : 0.0025
> GUC : cpu_tuple_cost Value : 0.01
> GUC : cursor_tuple_fraction Value : 1
> GUC : custom_variable_classes Value :
> GUC : DateStyle Value : ISO, MDY
> GUC : db_user_namespace Value : off
> GUC : deadlock_timeout Value : 1000
> GUC : debug_assertions Value : off
> GUC : debug_pretty_print Value : off
> GUC : debug_print_parse Value : off
> GUC : debug_print_plan Value : off
> GUC : debug_print_prelim_plan Value : off
> GUC : debug_print_rewritten Value : off
> GUC : debug_print_slice_table Value : off
> GUC : default_hash_table_bucket_number Value : 6
> GUC : default_statement_mem Value : 128000
> GUC : default_statistics_target Value : 25
> GUC : default_tablespace Value :
> GUC : default_transaction_isolation Value : read committed
> GUC : default_transaction_read_only Value : off
> GUC : dfs_url Value : localhost:8020/hawq
> GUC : dynamic_library_path Value : $libdir
> GUC : effective_cache_size Value : 16384
> GUC : enable_bitmapscan Value : on
> GUC : enable_groupagg Value : on
> GUC : enable_hashagg Value : on
> GUC : enable_hashjoin Value : on
> GUC : enable_indexscan Value : on
> GUC : enable_mergejoin Value : off
> GUC : enable_nestloop Value : off
> GUC : enable_secure_filesystem Value : off
> GUC : enable_seqscan Value : on
> GUC : enable_sort Value : on
> GUC : enable_tidscan Value : on
> GUC : escape_string_warning Value : on
> GUC : explain_memory_verbosity Value : suppress
> GUC : explain_pretty_print Value : on
> GUC : extra_float_digits Value : 0
> GUC : filesystem_support_truncate Value : on
> GUC : from_collapse_limit Value : 20
> GUC : gp_adjust_selectivity_for_outerjoins Value : on
> GUC : gp_analyze_relative_error Value : 0.25
> GUC : gp_autostats_mode Value : NONE
> GUC : gp_autostats_on_change_threshold Value : 2147483647
> GUC : gp_backup_directIO Value : off
> GUC : gp_backup_directIO_read_chunk_mb Value : 20
> GUC : gp_cached_segworkers_threshold Value : 5
> GUC : gp_command_count Value : 6
> GUC : gp_connections_per_thread Value : 64
> GUC : gp_debug_linger Value : 0
> GUC : gp_dynamic_partition_pruning Value : on
> GUC : gp_email_connect_avoid_duration Value : 7200
> GUC : gp_email_connect_failures Value : 5
> GUC : gp_email_connect_timeout Value : 15
> GUC : gp_email_from Value :
> GUC : gp_email_smtp_password Value :
> GUC : gp_email_smtp_server Value : localhost:25
> GUC : gp_email_smtp_userid Value :
> GUC : gp_email_to Value :
> GUC : gp_enable_agg_distinct Value : on
> GUC : gp_enable_agg_distinct_pruning Value : on
> GUC : gp_enable_direct_dispatch Value : on
> GUC : gp_enable_fallback_plan Value : on
> GUC : gp_enable_fast_sri Value : on
> GUC : gp_enable_gpperfmon Value : off
> GUC : gp_enable_groupext_distinct_gather Value : on
> GUC : gp_enable_groupext_distinct_pruning Value : on
> GUC : gp_enable_multiphase_agg Value : on
> GUC : gp_enable_predicate_propagation Value : on
> GUC : gp_enable_preunique Value : on
> GUC : gp_enable_sequential_window_plans Value : on
> GUC : gp_enable_sort_distinct Value : on
> GUC : gp_enable_sort_limit Value : on
> GUC : gp_external_enable_exec Value : on
> GUC : gp_external_grant_privileges Value : off
> GUC : gp_external_max_segs Value : 64
> GUC : gp_filerep_tcp_keepalives_count Value : 2
> GUC : gp_filerep_tcp_keepalives_idle Value : 60
> GUC : gp_filerep_tcp_keepalives_interval Value : 30
> GUC : gp_force_use_default_temporary_directory Value : off
> GUC : gp_gpperfmon_send_interval Value : 1
> GUC : gp_hashjoin_tuples_per_bucket Value : 5
> GUC : gp_idf_deduplicate Value : auto
> GUC : gp_interconnect_cache_future_packets Value : on
> GUC : gp_interconnect_default_rtt Value : 20
> GUC : gp_interconnect_fc_method Value : LOSS
> GUC : gp_interconnect_hash_multiplier Value : 2
> GUC : gp_interconnect_min_retries_before_timeout Value : 100
> GUC : gp_interconnect_min_rto Value : 20
> GUC : gp_interconnect_queue_depth Value : 4
> GUC : gp_interconnect_setup_timeout Value : 7200
> GUC : gp_interconnect_snd_queue_depth Value : 2
> GUC : gp_interconnect_timer_checking_period Value : 20
> GUC : gp_interconnect_timer_period Value : 5
> GUC : gp_interconnect_transmit_timeout Value : 3600
> GUC : gp_interconnect_type Value : UDP
> GUC : gp_log_format Value : csv
> GUC : gp_max_csv_line_length Value : 1048576
> GUC : gp_max_databases Value : 16
> GUC : gp_max_filespaces Value : 8
> GUC : gp_max_packet_size Value : 8192
> GUC : gp_max_partition_level Value : 0
> GUC : gp_max_plan_size Value : 0
> GUC : gp_max_relations Value : 65536
> GUC : gp_max_tablespaces Value : 16
> GUC : gp_motion_cost_per_row Value : 0
> GUC : gp_num_contents_in_cluster Value : -10000
> GUC : gp_query_context_mem_limit Value : 102400
> GUC : gp_reject_percent_threshold Value : 300
> GUC : gp_reraise_signal Value : on
> GUC : gp_role Value : utility
> GUC : gp_safefswritesize Value : 0
> GUC : gp_segment_connect_timeout Value : 600
> GUC : gp_segments_for_planner Value : 0
> GUC : gp_session_id Value : -1
> GUC : gp_set_proc_affinity Value : off
> GUC : gp_snmp_community Value : public
> GUC : gp_snmp_monitor_address Value :
> GUC : gp_snmp_use_inform_or_trap Value : trap
> GUC : gp_statistics_pullup_from_child_partition Value : on
> GUC : gp_statistics_use_fkeys Value : on
> GUC : gp_subtrans_warn_limit Value : 16777216
> GUC : gp_temporary_directory_mark_error Value : 0
> GUC : gp_udp_bufsize_k Value : 0
> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
> GUC : gp_workfile_checksumming Value : on
> GUC : gp_workfile_compress_algorithm Value : none
> GUC : gp_workfile_limit_per_query Value : 0
> GUC : gp_workfile_limit_per_segment Value : 0
> GUC : gpperfmon_port Value : 8888
> GUC : hawq_global_rm_type Value : yarn
> GUC : hawq_master_address_host Value : master.bigdata
> GUC : hawq_master_address_port Value : 6432
> GUC : hawq_master_temp_directory Value : /tmp
> GUC : hawq_metadata_cache_block_capacity Value : 2097152
> GUC : hawq_metadata_cache_check_interval Value : 30
> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
> GUC : hawq_metadata_cache_refresh_interval Value : 3600
> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
> GUC : hawq_re_cpu_enable Value : off
> GUC : hawq_re_cpu_weight Value : 1024
> GUC : hawq_re_memory_overcommit_max Value : 8192
> GUC : hawq_re_vcore_pcore_ratio Value : 1
> GUC : hawq_rm_cluster_report_period Value : 60
> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
> GUC : hawq_rm_container_batch_limit Value : 1000
> GUC : hawq_rm_enable_connpool Value : on
> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
> GUC : hawq_rm_force_fifo_queuing Value : on
> GUC : hawq_rm_master_domain_port Value : 5436
> GUC : hawq_rm_master_port Value : 5437
> GUC : hawq_rm_memory_limit_perseg Value : 480GB
> GUC : hawq_rm_min_resource_perseg Value : 2
> GUC : hawq_rm_nocluster_timeout Value : 60
> GUC : hawq_rm_nresqueue_limit Value : 128
> GUC : hawq_rm_nslice_perseg_limit Value : 5000
> GUC : hawq_rm_nvcore_limit_perseg Value : 16
> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
> GUC : hawq_rm_nvseg_perquery_limit Value : 512
> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
> GUC : hawq_rm_regularize_io_factor Value : 1
> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
> GUC : hawq_rm_regularize_nvseg_factor Value : 1
> GUC : hawq_rm_regularize_nvseg_max Value : 300
> GUC : hawq_rm_regularize_usage_factor Value : 1
> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
> GUC : hawq_rm_resource_allocation_timeout Value : 600
> GUC : hawq_rm_resource_idle_timeout Value : 300
> GUC : hawq_rm_respool_test_file Value :
> GUC : hawq_rm_return_percent_on_overcommit Value : 10
> GUC : hawq_rm_segment_config_refresh_interval Value : 30
> GUC : hawq_rm_segment_heartbeat_interval Value : 30
> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
> GUC : hawq_rm_segment_port Value : 5438
> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
> GUC : hawq_rm_session_lease_timeout Value : 180
> GUC : hawq_rm_stmt_nvseg Value : 0
> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
> GUC : hawq_rm_yarn_app_name Value : hawq
> GUC : hawq_rm_yarn_queue_name Value : default
> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
> GUC : hawq_segment_address_port Value : 40000
> GUC : hawq_segment_history_keep_period Value : 365
> GUC : hawq_segment_temp_directory Value : /tmp
> GUC : integer_datetimes Value : on
> GUC : IntervalStyle Value : postgres
> GUC : join_collapse_limit Value : 20
> GUC : krb5_ccname Value : /tmp/postgres.ccname
> GUC : krb_caseins_users Value : off
> GUC : krb_server_keyfile Value :
> FILE:/data/pulse2-agent/agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab
> GUC : krb_srvname Value : postgres
> GUC : lc_collate Value : C
> GUC : lc_ctype Value : C
> GUC : lc_messages Value : en_US.utf8
> GUC : lc_monetary Value : en_US.utf8
> GUC : lc_numeric Value : en_US.utf8
> GUC : lc_time Value : en_US.utf8
> GUC : listen_addresses Value : *
> GUC : local_preload_libraries Value :
> GUC : log_autostats Value : off
> GUC : log_connections Value : off
> GUC : log_disconnections Value : off
> GUC : log_dispatch_stats Value : off
> GUC : log_duration Value : off
> GUC : log_error_verbosity Value : default
> GUC : log_executor_stats Value : off
> GUC : log_hostname Value : off
> GUC : log_min_duration_statement Value : -1
> GUC : log_min_error_statement Value : error
> GUC : log_min_messages Value : warning
> GUC : log_parser_stats Value : off
> GUC : log_planner_stats Value : off
> GUC : log_rotation_age Value : 1440
> GUC : log_rotation_size Value : 0
> GUC : log_statement Value : none
> GUC : log_statement_stats Value : off
> GUC : log_timezone Value : PRC
> GUC : log_truncate_on_rotation Value : off
> GUC : maintenance_work_mem Value : 65536
> GUC : master_directory Value :
> GUC : max_appendonly_segfiles Value : 262144
> GUC : max_appendonly_tables Value : 10000
> GUC : max_connections Value : 1280
> GUC : max_files_per_process Value : 150
> GUC : max_fsm_pages Value : 200000
> GUC : max_fsm_relations Value : 1000
> GUC : max_function_args Value : 100
> GUC : max_identifier_length Value : 63
> GUC : max_index_keys Value : 32
> GUC : max_locks_per_transaction Value : 128
> GUC : max_prepared_transactions Value : 250
> GUC : max_stack_depth Value : 2048
> GUC : max_work_mem Value : 1024000
> GUC : metadata_cache_testfile Value :
> GUC : optimizer Value : on
> GUC : optimizer_analyze_root_partition Value : on
> GUC : optimizer_minidump Value : onerror
> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
> GUC : password_encryption Value : on
> GUC : password_hash_algorithm Value : MD5
> GUC : pljava_classpath Value :
> GUC : pljava_release_lingering_savepoints Value : off
> GUC : pljava_statement_cache_size Value : 0
> GUC : pljava_vmoptions Value :
> GUC : port Value : 6432
> GUC : pxf_enable_filter_pushdown Value : on
> GUC : pxf_enable_locality_optimizations Value : on
> GUC : pxf_enable_stat_collection Value : on
> GUC : pxf_remote_service_login Value :
> GUC : pxf_remote_service_secret Value :
> GUC : pxf_service_address Value : localhost:51200
> GUC : pxf_stat_max_fragments Value : 100
> GUC : random_page_cost Value : 100
> GUC : regex_flavor Value : advanced
> GUC : runaway_detector_activation_percent Value : 95
> GUC : search_path Value : "$user",public
> GUC : seg_max_connections Value : 3000
> GUC : segment_directory Value :
> GUC : seq_page_cost Value : 1
> GUC : server_encoding Value : UTF8
> GUC : server_ticket_renew_interval Value : 43200000
> GUC : server_version Value : 8.2.15
> GUC : server_version_num Value : 80215
> GUC : shared_buffers Value : 4000
> GUC : shared_preload_libraries Value :
> GUC : ssl Value : off
> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
> GUC : standard_conforming_strings Value : off
> GUC : standby_address_host Value : localhost
> GUC : statement_timeout Value : 0
> GUC : superuser_reserved_connections Value : 3
> GUC : tcp_keepalives_count Value : 9
> GUC : tcp_keepalives_idle Value : 7200
> GUC : tcp_keepalives_interval Value : 75
> GUC : temp_buffers Value : 1024
> GUC : TimeZone Value : PRC
> GUC : timezone_abbreviations Value : Default
> GUC : track_activities Value : on
> GUC : track_counts Value : off
> GUC : transaction_isolation Value : read committed
> GUC : transaction_read_only Value : off
> GUC : transform_null_equals Value : off
> GUC : unix_socket_directory Value :
> GUC : unix_socket_group Value :
> GUC : unix_socket_permissions Value : 511
> GUC : update_process_title Value : on
> GUC : vacuum_cost_delay Value : 0
> GUC : vacuum_cost_limit Value : 200
> GUC : vacuum_cost_page_dirty Value : 20
> GUC : vacuum_cost_page_miss Value : 10
> GUC : vacuum_freeze_min_age Value : 100000000
> GUC : work_mem Value : 51200
>
>
>
>
>
> At 2016-09-21 13:00:41, "Vineet Goel" <vv...@apache.org> wrote:
>
> Could you please post your SQL DDL statement? How many URLs do you have in
> your external table? Also, your HASH dist table - how many buckets are
> defined, if any? Are the # of URLs more than the # of buckets or
> default_hash_table_bucket_number value? Perhaps you can attach your
> hawq-site.xml file as well.
>
> Also see:
> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>
> Thanks
> Vineet
>
>
> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yi...@163.com> wrote:
>
>> Hi,all:
>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>  I load data from an external table (created using "like target_table"
>> statement) ,
>> if the target table was distributed by some column(s), it raise this
>> error:
>>  External scan error: There are more external files (URLs) than primary
>> segments that can read them (COptTasks.cpp:1756)
>> if the target table was distributed randomly, it works well,
>> I don't set any parameter special,does anybody know how to resolve this
>> problem?
>> thanks a lot.
>>
>