Posted to dev@kylin.apache.org by 杨海乐 <ya...@letv.com> on 2016/11/18 06:59:32 UTC

Kylin Job Server is single-point

Hello all,
Must Kylin's job server be a single point? And if the job server crashes,
must all jobs restart? Is there any way to solve this problem?

--
View this message in context: http://apache-kylin.74782.x6.nabble.com/Kylin-Job-Server-is-single-point-tp6336.html
Sent from the Apache Kylin mailing list archive at Nabble.com.

Re: Kylin Job Server is single-point

Posted by lujia17 <lu...@126.com>.
Excellent, thanks Shaofeng!
We are trying to find the root cause.

Lu Jia

> On Nov 21, 2016, at 3:37 PM, ShaoFeng Shi <sh...@apache.org> wrote:
> 
> Sharing a small "watchdog" script that we used before; you can run it
> periodically with Linux cron to check whether Kylin is running. But this is
> not the approach we recommend; we strongly suggest that you investigate the
> root cause of the crashes. Usually they are caused by a badly designed cube,
> for example one that uses dictionary encoding for a UHC (ultra-high
> cardinality) column.



Re: Kylin Job Server is single-point

Posted by ShaoFeng Shi <sh...@apache.org>.
Sharing a small "watchdog" script that we used before; you can run it
periodically with Linux cron to check whether Kylin is running. But this is
not the approach we recommend; we strongly suggest that you investigate the
root cause of the crashes. Usually they are caused by a badly designed cube,
for example one that uses dictionary encoding for a UHC (ultra-high
cardinality) column.

#!/bin/bash

export KYLIN_HOME="/kylin/kylin-1.5.4.1-bin"

# No pid file: treat Kylin as stopped and do nothing.
if [ ! -f "${KYLIN_HOME}/pid" ]
then
    echo "$(date): kylin is stopped, do nothing"
    exit 0
fi

PID=$(cat "${KYLIN_HOME}/pid")

# The pid file exists; restart Kylin only if that process is gone.
if ps -p "$PID" > /dev/null
then
    echo "$(date): Process is running, do nothing"
else
    echo "$(date): Pid ${PID} does not exist, start kylin"
    # cron runs with a minimal environment, so set PATH explicitly.
    export PATH=/usr/lib64/qt-3.3/bin:/usr/bin:/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/sbin:/apache/hadoop/bin:/apache/hbase/bin:/apache/pig/bin:/apache/hive/bin
    "${KYLIN_HOME}/bin/kylin.sh" start
fi
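
A stricter variant of the PID check in the script above might look like the sketch below. It is only an illustration: the function name `kylin_pid_is_live`, the watchdog path in the sample crontab comment, and the five-minute interval are assumptions, not anything shipped with Kylin. It treats an empty or non-numeric pid file as "not running" and probes the process with `kill -0`, which is only reliable when the watchdog runs as the same user as Kylin.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the pid file names a live process.
kylin_pid_is_live() {
  local pid_file="$1" pid
  [ -f "$pid_file" ] || return 1          # no pid file at all
  pid="$(cat "$pid_file")"
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;              # empty or non-numeric content
  esac
  kill -0 "$pid" 2>/dev/null              # signal 0 only probes for existence
}

# Sample crontab line (assumed path and interval), added with `crontab -e`:
# */5 * * * * /kylin/watchdog.sh >> /kylin/watchdog.log 2>&1
```

In the script above, `kylin_pid_is_live "${KYLIN_HOME}/pid"` could stand in for the `ps -p` test, while the earlier check for a missing pid file stays as it is so that a deliberately stopped Kylin is not restarted.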



2016-11-21 14:06 GMT+08:00 Li Yang <li...@apache.org>:

> Just to be clear, you still need to restart the Kylin process manually. Once
> the Kylin process is up, it will resume all running jobs automatically.
>
> To auto-restart a dead Kylin process, you need some tooling. It could be as
> simple as a cron job that checks the Kylin PID periodically and restarts
> Kylin when the process is dead.
>
> Yang



-- 
Best regards,

Shaofeng Shi 史少锋

Re: Kylin Job Server is single-point

Posted by Li Yang <li...@apache.org>.
Just to be clear, you still need to restart the Kylin process manually. Once
the Kylin process is up, it will resume all running jobs automatically.

To auto-restart a dead Kylin process, you need some tooling. It could be as
simple as a cron job that checks the Kylin PID periodically and restarts
Kylin when the process is dead.

Yang

On Sat, Nov 19, 2016 at 6:00 PM, 路加126 <lu...@126.com> wrote:

> Thanks Shaofeng!
>
>
> Best Regards,
> Lu Jia(Luke)

Re: Kylin Job Server is single-point

Posted by 路加126 <lu...@126.com>.
Thanks Shaofeng!


Best Regards,
Lu Jia(Luke)



> On Nov 19, 2016, at 4:01 PM, ShaoFeng Shi <sh...@apache.org> wrote:
> 
> Auto-resume has been there for some time; 1.5.4 should have it. I suggest
> upgrading to the latest version.



Re: Kylin Job Server is single-point

Posted by ShaoFeng Shi <sh...@apache.org>.
Auto-resume has been there for some time; 1.5.4 should have it. I suggest
upgrading to the latest version.

2016-11-19 14:55 GMT+08:00 路加126 <lu...@126.com>:

> Hi Yang,
>
> Could you tell me how to configure automatic resuming?
> My job server process has crashed several times, and each time I resumed
> manually or via a shell script. My version is 1.5.2.
>
>
> Best Regards,
> Lu Jia(Luke)


-- 
Best regards,

Shaofeng Shi 史少锋

Re: Kylin Job Server is single-point

Posted by 路加126 <lu...@126.com>.
Hi Yang,

Could you tell me how to configure automatic resuming?
My job server process has crashed several times, and each time I resumed manually or via a shell script. My version is 1.5.2.


Best Regards,
Lu Jia(Luke)



> On Nov 18, 2016, at 5:43 PM, Li Yang <li...@apache.org> wrote:
> 
> Just to be clear, even now, a job does not have to restart from the beginning
> after a job server crash. After the job server bounces, every job will resume
> automatically from its last running step. Even better, if the last running
> step is an MR (MapReduce) job, the MR job will continue to run without any
> loss. That is because the job server is just a coordinator; it does not do
> any actual work by itself.
> 
> Yang



Re: Kylin Job Server is single-point

Posted by Li Yang <li...@apache.org>.
Just to be clear, even now, a job does not have to restart from the beginning
after a job server crash. After the job server bounces, every job will resume
automatically from its last running step. Even better, if the last running
step is an MR (MapReduce) job, the MR job will continue to run without any
loss. That is because the job server is just a coordinator; it does not do
any actual work by itself.

Yang
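
The resume behaviour described above can be illustrated with a toy coordinator. This is a hypothetical sketch, not Kylin's implementation: `run_steps`, the state file, and the step names are invented for illustration. It records the index of each completed step, so a rerun after a crash skips the finished steps and picks up at the one that was interrupted.

```shell
#!/bin/bash
# Toy coordinator (hypothetical): run steps in order, checkpointing progress.
run_steps() {
  local state_file="$1"; shift
  local completed=0 index=0 step
  if [ -f "$state_file" ]; then
    completed="$(cat "$state_file")"      # number of steps already finished
  fi
  for step in "$@"; do
    index=$((index + 1))
    if [ "$index" -le "$completed" ]; then
      echo "skip step $index ($step): already done"
      continue
    fi
    echo "run step $index ($step)"
    "$step" || return 1                   # stop here; checkpoint stays put
    echo "$index" > "$state_file"         # persist progress after each step
  done
}
```

Each step is passed as the name of a shell function or command. The sketch reruns the interrupted step from scratch; the real job server goes further, since a step such as an MR job keeps running on the cluster while the coordinator is down.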

On Fri, Nov 18, 2016 at 3:48 PM, 杨海乐 <ya...@letv.com> wrote:

> Thanks very much, @康凯森.
> Waiting for the upgrade.

Re: Kylin Job Server is single-point

Posted by 杨海乐 <ya...@letv.com>.
Thanks very much, @康凯森.
Waiting for the upgrade.


--
View this message in context: http://apache-kylin.74782.x6.nabble.com/Kylin-Job-Server-is-single-point-tp6336p6342.html
Sent from the Apache Kylin mailing list archive at Nabble.com.

Re: Kylin Job Server is single-point

Posted by 康凯森 <ka...@qq.com>.
Hi, Haile.
Starting with 1.6.1, Kylin will support a distributed job build server, making the job server more extensible, available, and reliable.
The related JIRA issue is KYLIN-2006.


------------------ Original ------------------
From:  "杨海乐";<ya...@letv.com>;
Date:  Fri, Nov 18, 2016 02:59 PM
To:  "dev"<de...@kylin.apache.org>; 

Subject:  Kylin Job Server is single-point



Hello all,
Must Kylin's job server be a single point? And if the job server crashes,
must all jobs restart? Is there any way to solve this problem?
