
Posted to dev@twill.apache.org by Ivan Balashov <ib...@gmail.com> on 2016/01/27 09:22:20 UTC

Using Twill without ZK or Kafka

Hi,

Is it possible to use Twill with a bare YARN cluster, without ZK or Kafka?
Also, does the YARN cluster need to have HDFS in order to benefit from Twill,
or are buckets like AWS or GCS enough?

Thanks,

Re: Using Twill without ZK or Kafka

Posted by Terence Yim <ch...@gmail.com>.

Hi Ivan,

Twill uses ZK and Kafka internally, but this is mostly hidden from the user.
It simplifies YARN by providing a simple and intuitive API, along with
features that are commonly needed in distributed applications, such as
coordination, discovery, log collection and recovery, which are built on ZK
and Kafka underneath.

It is possible for Twill to not depend on ZK; however, that would require
some code changes to make it happen. Would you mind filing a JIRA for that?

Thanks,
Terence

On Thu, Jan 28, 2016 at 7:59 AM, Ivan Balashov <ib...@gmail.com> wrote:

> Terence,
>
> Thanks for the detailed answer.
>
> What if the client app does not need state recovery and the other features
> implemented with ZK?
>
> It was a little surprising to hear about ZK and Kafka, since they are
> complex systems in themselves, and Twill's primary goal is to reduce the
> complexity (of YARN), not to acquire two new ones.
>
> Any chance Twill could make the ZK dependency optional?

Re: Using Twill without ZK or Kafka

Posted by Ivan Balashov <ib...@gmail.com>.

Terence,

Thanks for the detailed answer.

What if the client app does not need state recovery and the other features
implemented with ZK?

It was a little surprising to hear about ZK and Kafka, since they are
complex systems in themselves, and Twill's primary goal is to reduce the
complexity (of YARN), not to acquire two new ones.

Any chance Twill could make the ZK dependency optional?


2016-01-27 20:41 GMT+02:00 Terence Yim <ch...@gmail.com>:

> Hi Ivan,
>
> Twill relies on ZK for a couple of core functionalities, such as application
> state recovery, service discovery and messaging, so it would be difficult
> for the current Twill version to run without ZK.
>
> Twill doesn't require Kafka to run; however, each ApplicationMaster (AM)
> starts an embedded Kafka (running in the same JVM as the AM) to collect logs
> from all the TwillRunnables controlled by that AM. We have a JIRA, TWILL-147
> <https://issues.apache.org/jira/browse/TWILL-147>, to allow using an
> external Kafka or turning off log collection support.
>
> For the filesystem, Twill only needs a distributed file system that is
> accessible through the HDFS API, not necessarily the HDFS implementation.
> We've tested that Twill works on MapR FS and Azure FS as well. I believe the
> same should hold for AWS or GCS.
>
> Terence

Re: Using Twill without ZK or Kafka

Posted by Terence Yim <ch...@gmail.com>.

Hi Ivan,

Twill relies on ZK for a couple of core functionalities, such as application
state recovery, service discovery and messaging, so it would be difficult
for the current Twill version to run without ZK.
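To illustrate why discovery leans on ZK, here is a toy, in-memory sketch of the ZooKeeper property it relies on: a registration belongs to the session that created it and disappears when that session ends, the way an ephemeral znode does. All names (`ToyDiscovery`, `announce`, `expireSession`, `discover`) are hypothetical; this is not Twill's code, and it does not model recovery or messaging.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Twill or ZooKeeper API) of session-scoped
 * registrations: entries vanish when the owning session expires, similar to
 * ephemeral znodes.
 */
public class ToyDiscovery {
    // service name -> (session id -> announced address)
    private final Map<String, Map<Long, String>> services = new HashMap<>();

    /** A runnable's session announces that it serves `service` at `address`. */
    public void announce(long sessionId, String service, String address) {
        services.computeIfAbsent(service, k -> new HashMap<>()).put(sessionId, address);
    }

    /** When a session ends (e.g. its container died), all its registrations disappear. */
    public void expireSession(long sessionId) {
        for (Map<Long, String> bySession : services.values()) {
            bySession.remove(sessionId);
        }
    }

    /** Returns the addresses currently registered for `service`. */
    public List<String> discover(String service) {
        return new ArrayList<>(services.getOrDefault(service, Collections.emptyMap()).values());
    }

    public static void main(String[] args) {
        ToyDiscovery registry = new ToyDiscovery();
        registry.announce(1L, "echo", "host-a:8080");
        registry.announce(2L, "echo", "host-b:8080");
        registry.expireSession(1L); // the container on host-a goes away
        System.out.println(registry.discover("echo")); // prints [host-b:8080]
    }
}
```

Without something that ties liveness to registrations like this, a client could keep discovering runnables that are already gone.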

Twill doesn't require Kafka to run; however, each ApplicationMaster (AM)
starts an embedded Kafka (running in the same JVM as the AM) to collect logs
from all the TwillRunnables controlled by that AM. We have a JIRA, TWILL-147
<https://issues.apache.org/jira/browse/TWILL-147>, to allow using an
external Kafka or turning off log collection support.
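A toy model of the log-collection flow described above may help: one broker per AM, shared by every runnable that AM controls, with the client pulling entries by offset. The in-memory `EmbeddedBroker` merely stands in for the embedded Kafka; every name here is hypothetical, and none of it is Twill or Kafka API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy model (hypothetical names) of per-AM log collection: all runnables
 * under one AM append to the same broker, and a client reads by offset.
 */
public class ToyLogCollection {

    /** Stands in for the Kafka embedded in the AM's JVM: an append-only log. */
    static final class EmbeddedBroker {
        private final List<String> log = new ArrayList<>();

        synchronized void publish(String runnable, String message) {
            log.add(runnable + ": " + message);
        }

        /** Returns entries from `offset` onward, like a consumer reading a topic. */
        synchronized List<String> fetch(int offset) {
            if (offset >= log.size()) {
                return Collections.emptyList();
            }
            return new ArrayList<>(log.subList(offset, log.size()));
        }
    }

    public static List<String> demo() {
        EmbeddedBroker amBroker = new EmbeddedBroker(); // one broker per AM
        // Every runnable controlled by this AM writes to the same broker.
        amBroker.publish("runnable-1", "started");
        amBroker.publish("runnable-2", "started");
        amBroker.publish("runnable-1", "done");

        // The client tracks its own offset and only pulls new entries.
        int offset = 0;
        List<String> firstBatch = amBroker.fetch(offset);
        offset += firstBatch.size();
        amBroker.publish("runnable-2", "done");
        List<String> secondBatch = amBroker.fetch(offset);

        List<String> all = new ArrayList<>(firstBatch);
        all.addAll(secondBatch);
        return all;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this picture, TWILL-147 would amount to pointing the runnables at an external broker instead of the per-AM one, or not creating a broker at all.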

For the filesystem, Twill only needs a distributed file system that is
accessible through the HDFS API, not necessarily the HDFS implementation.
We've tested that Twill works on MapR FS and Azure FS as well. I believe the
same should hold for AWS or GCS.
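One way to see why any HDFS-API file system can work: Hadoop clients choose the `FileSystem` implementation from the URI scheme (configured through the `fs.<scheme>.impl` property, or discovered on the classpath). The stand-alone sketch below mimics that lookup with a plain map. The class names are the ones commonly registered for each connector but should be checked against your Hadoop distribution; `SchemeDispatch`, `implementationFor` and the example bucket are hypothetical, and whether Twill runs on S3 or GCS was not tested in this thread.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of scheme-based dispatch, similar in spirit to how Hadoop resolves
 * fs.<scheme>.impl. Hypothetical helper; it only returns a class name.
 */
public class SchemeDispatch {
    private static final Map<String, String> IMPLS = new HashMap<>();

    static {
        // Commonly used implementation classes; verify for your distribution.
        IMPLS.put("hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");
        IMPLS.put("maprfs", "com.mapr.fs.MapRFileSystem");
        IMPLS.put("wasb", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        IMPLS.put("s3a", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        IMPLS.put("gs", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    }

    /** Returns the implementation class name that would serve `location`. */
    public static String implementationFor(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null || !IMPLS.containsKey(scheme)) {
            throw new IllegalArgumentException("No HDFS-API file system registered for: " + location);
        }
        return IMPLS.get(scheme);
    }

    public static void main(String[] args) {
        // Prints com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
        System.out.println(implementationFor("gs://my-bucket/twill"));
    }
}
```

Code written against the generic API never names the concrete class, which is why swapping HDFS for another compatible store is mostly a matter of configuration and putting the connector on the classpath.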

Terence

On Wed, Jan 27, 2016 at 12:22 AM, Ivan Balashov <ib...@gmail.com> wrote:

> Hi,
>
> Is it possible to use Twill with a bare YARN cluster, without ZK or Kafka?
> Also, does the YARN cluster need to have HDFS in order to benefit from Twill,
> or are buckets like AWS or GCS enough?
>
> Thanks,
>