Posted to dev@hawq.apache.org by "yin.zhb@163.com" <yi...@163.com> on 2016/04/07 17:56:54 UTC

May I mention some gpload issues?

While using gpload in my work, I ran into some problems with it.
May I mention some gpload issues here?



yin.zhb@163.com

Re: May I mention some gpload issues?

Posted by Wales Wang <wo...@yahoo.com.INVALID>.
To yin:
I can help you.
Please contact me.
To Lei:
Greenplum development is not very active.

Wales Wang

On 2016-4-9, at 5:47 PM, Lei Chang <le...@apache.org> wrote:

> Looks like you are using GPDB. This mailing list is for HAWQ.
> 
> So you can reach Pivotal support or ask GPDB questions on the mailing list
> shown here: http://greenplum.org/
> 
> Cheers
> Lei

Re: Re: May I mention some gpload issues?

Posted by Lei Chang <le...@apache.org>.
Looks like you are using GPDB. This mailing list is for HAWQ.

So you can reach Pivotal support or ask GPDB questions on the mailing list
shown here: http://greenplum.org/

Cheers
Lei



On Sat, Apr 9, 2016 at 10:36 AM, yin.zhb@163.com <yi...@163.com> wrote:

>
> Our DB team has been using Greenplum for one year. We need to load 680 GB
> of data (more than one billion lines) into Greenplum every day.
> We wrote a program that calls gpload every 10 minutes; each run loads
> from 100,000 to more than 10,000,000 lines.
> We can accept an error rate of about 1 in 100,000 lines.
>
> Our environment:
> OS version: RHEL 6.3
> Greenplum : 4.3.5.2
> gpload    : 4.3.5.2
>
> These are the problems we ran into when using gpload:
>
> 1. "line too long"
> This error makes gpload fail even if only one line in all the files to be
> loaded is too long.
> We set "error_limit" and "segment reject limit", but they had no effect.
> Finding the offending line is very hard, so we set max_line_length
> to "1048576".
>
> 2. "no partition key"
> This one also gives us a headache.
> It may be caused by a single bad line (an unexpected delimiter inside a
> column) or by an unrecognized encoding.
> Like problem 1, it makes the whole gpload run fail.
>
> 3. Column too long
> This makes gpload fail, too.
> We changed all column data types to text to work around it.
>
> 4. In our production environment, when the Greenplum cluster hit an error,
> it logged this:
> fatal","57m01","the database system is in mirror or uninitialized
> mode",,,,,,,0,,"postmaster.c",2994,
>
> but the gpload and gpfdist processes hung (sleeping) instead of exiting.
>
> We looked through the gpload.py script and found a case that it does not
> handle.
>
> gpload.py loads data in these steps:
> step1: read_config()
> step2: setup_connection() -- connects to the DB for the first time
> step3: read_table_metadata()
> step4: read_columns()
> step5: read_mapping()
> step6: start_gpfdists()
> step7: do_method()
>
> Finally, it does:
> step8: remove temporary data -- connects to the DB a second time
> step9: kill gpfdist
>
> We found that when step8 fails (because the DB cannot be connected), the
> process hangs.
>
> Thanks for reading.
>
> yin.zhb@163.com

Re: Re: May I mention some gpload issues?

Posted by "yin.zhb@163.com" <yi...@163.com>.
Our DB team has been using Greenplum for one year. We need to load 680 GB of data (more than one billion lines) into Greenplum every day.
We wrote a program that calls gpload every 10 minutes; each run loads from 100,000 to more than 10,000,000 lines.
We can accept an error rate of about 1 in 100,000 lines.

Our environment:
OS version: RHEL 6.3
Greenplum : 4.3.5.2
gpload    : 4.3.5.2

These are the problems we ran into when using gpload:

1. "line too long"
This error makes gpload fail even if only one line in all the files to be loaded is too long.
We set "error_limit" and "segment reject limit", but they had no effect.
Finding the offending line is very hard, so we set max_line_length to "1048576".
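
One way to locate the offending lines before calling gpload is to scan the input files for lines longer than the configured limit. The following is only a sketch: find_long_lines is a hypothetical helper, not part of gpload, and it assumes that the limit is counted in bytes and that rows end with a newline.

```python
def find_long_lines(path, max_bytes):
    """Return (line_number, byte_length) for every line longer than max_bytes."""
    hits = []
    # Read in binary mode so that lengths are byte counts and so that
    # undecodable bytes cannot abort the scan.
    with open(path, "rb") as fh:
        for line_no, raw in enumerate(fh, start=1):
            length = len(raw.rstrip(b"\r\n"))  # ignore the line terminator
            if length > max_bytes:
                hits.append((line_no, length))
    return hits
```

For example, find_long_lines("data.txt", 1048576) would list the lines that exceed the max_line_length value mentioned above ("data.txt" is a placeholder file name).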

2. "no partition key"
This one also gives us a headache.
It may be caused by a single bad line (an unexpected delimiter inside a column) or by an unrecognized encoding.
Like problem 1, it makes the whole gpload run fail.
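
Rows with a stray delimiter or an undecodable byte could be screened out before the load. The sketch below is an illustration under stated assumptions, not gpload behavior: find_bad_rows is a hypothetical helper, the data is assumed to be UTF-8 with a plain delimiter, and quoting or escaping of delimiters is deliberately not handled.

```python
def find_bad_rows(lines, delimiter, expected_fields, encoding="utf-8"):
    """Return (line_number, reason) for rows that fail a basic sanity check.

    lines is an iterable of bytes objects, for example a file opened in "rb" mode.
    """
    bad = []
    for line_no, raw in enumerate(lines, start=1):
        try:
            text = raw.rstrip(b"\r\n").decode(encoding)
        except UnicodeDecodeError:
            bad.append((line_no, "encoding"))
            continue
        # Naive split: an extra or missing delimiter changes the field count.
        if len(text.split(delimiter)) != expected_fields:
            bad.append((line_no, "field count"))
    return bad
```

Rows reported here could be written to a side file and inspected, so that the remaining rows load cleanly.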


3. Column too long
This makes gpload fail, too.
We changed all column data types to text to work around it.
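
As an alternative to converting every column to text, oversized values could be detected before loading. This is a sketch only: find_oversized_fields and the max_widths mapping are hypothetical, and it assumes that column limits are counted in characters rather than bytes.

```python
def find_oversized_fields(rows, max_widths):
    """Return (row_number, column_index, length) for values over their limit.

    rows is an iterable of lists of strings (already split into fields);
    max_widths maps a zero-based column index to its maximum length.
    """
    hits = []
    for row_no, fields in enumerate(rows, start=1):
        for col, limit in max_widths.items():
            # Skip columns that are missing from a short row.
            if col < len(fields) and len(fields[col]) > limit:
                hits.append((row_no, col, len(fields[col])))
    return hits
```

Only the columns listed in max_widths are checked, so columns that are genuinely unbounded can simply be left out of the mapping.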

4. In our production environment, when the Greenplum cluster hit an error, it logged this:
fatal","57m01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994,

but the gpload and gpfdist processes hung (sleeping) instead of exiting.
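
Because the loads are started by an external program, one defensive option on the caller's side is to give each run a deadline. The sketch below is not taken from gpload: run_with_deadline is a hypothetical wrapper, it assumes a POSIX system, and it only reaches child processes that stay in the same process group as the command it started.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was killed at the deadline."""
    # start_new_session=True puts the child in its own process group (POSIX),
    # so helpers it spawned can be signalled together with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # kill the whole process group
        proc.wait()                          # reap the child
        return None
```

The scheduling program could call this with the gpload command line and a timeout somewhat shorter than its 10-minute interval, and treat a None result as a failed load.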

We looked through the gpload.py script and found a case that it does not handle.

gpload.py loads data in these steps:
step1: read_config()
step2: setup_connection() -- connects to the DB for the first time
step3: read_table_metadata()
step4: read_columns()
step5: read_mapping()
step6: start_gpfdists()
step7: do_method()

Finally, it does:
step8: remove temporary data -- connects to the DB a second time
step9: kill gpfdist

We found that when step8 fails (because the DB cannot be connected), the process hangs.
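
The ordering problem in the steps above can be illustrated with a small sketch in which a failure in step8 cannot prevent step9. This does not reproduce gpload.py's actual code: run_load and its three callables are hypothetical stand-ins for step6, step7 and step8.

```python
import subprocess


def run_load(start_helper, do_load, remove_temp_data, kill_timeout=5.0):
    """Run a load and always stop the helper process, even if cleanup fails.

    start_helper must return a subprocess.Popen object.
    Returns (helper_exit_code, list_of_errors).
    """
    helper = start_helper()              # stand-in for step6
    errors = []
    try:
        do_load()                        # stand-in for step7
    except Exception as exc:
        errors.append(exc)
    finally:
        try:
            remove_temp_data()           # stand-in for step8, may need the DB
        except Exception as exc:
            errors.append(exc)
        finally:
            helper.terminate()           # stand-in for step9, always runs
            try:
                helper.wait(timeout=kill_timeout)
            except subprocess.TimeoutExpired:
                helper.kill()
                helper.wait()
    return helper.returncode, errors
```

This only covers a cleanup step that raises an error. If the second DB connection blocks instead of failing, a connection timeout would also be needed so that the cleanup step eventually returns.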

Thanks for reading.

yin.zhb@163.com
 
From: Lei Chang
Date: 2016-04-08 07:52
To: user
CC: dev
Subject: Re: May I mention some gpload issues?

please. thanks!

Cheers
Lei


On Thu, Apr 7, 2016 at 11:56 PM, yin.zhb@163.com <yi...@163.com> wrote:

While using gpload in my work, I ran into some problems with it.
May I mention some gpload issues here?



yin.zhb@163.com



Re: May I mention some gpload issues?

Posted by Lei Chang <le...@apache.org>.
please. thanks!

Cheers
Lei


On Thu, Apr 7, 2016 at 11:56 PM, yin.zhb@163.com <yi...@163.com> wrote:

>
> While using gpload in my work, I ran into some problems with it.
> May I mention some gpload issues here?
>
> ------------------------------
> yin.zhb@163.com
>
