Posted to dev@vxquery.apache.org by Preston Carman <pr...@apache.org> on 2015/08/03 22:59:25 UTC
Re: Slider, Twill, and Flink

I looped in the dev list for this conversation.

A few thoughts:

On Sun, Aug 2, 2015 at 9:58 AM, Efi <ef...@gmail.com> wrote:

> Thank you Steven,
>
> There are 4 problems I encountered with Slider that made me reconsider
> using it.
>
> 1. Slider requires you to provide one or more jar files to start
> and run your application. VXQuery does not start execution from the cli jar;
> instead it is initiated by a bash script that performs a lot of setup and
> configuration prior to running the query. So one problem is changing VXQuery
> to run from the cli jar too.
>

First, the VXQuery cli jar is not the one to be run on the cluster. As I
understand the cluster process, the cluster controller (cc) and X number of
node controllers (nc) will be started by YARN. Then the VXQuery cli is run
locally (or on a remote server) with the IP address of the cc. After
running the VXQuery cli for each of the user's queries, the cluster could be
shut down.
Second, two types of parameters are passed to the bash scripts that start
the jar files: Java configuration and VXQuery cluster configuration
details. These will need to be accounted for during setup. It may be
better to store these settings in a configuration file instead of
passing them as parameters to the jar file.
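
As a rough illustration of the configuration-file idea above, here is a
minimal sketch in Python. It assumes an INI-style file with one section for
Java settings and one for cluster settings. The section names, key names,
values, and the jar name are all hypothetical and are not taken from VXQuery
or its scripts; only the JVM flag -Xmx is real.

```python
import configparser

# Hypothetical settings; none of these key names or values come from VXQuery.
EXAMPLE_CONFIG = """
[java]
max_heap = 2g

[cluster]
cc_host = 10.0.0.1
cc_port = 1099
node_controllers = 4
"""


def load_settings(text):
    """Parse the two groups of settings into plain dictionaries."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def java_command(settings, jar_path):
    """Build an argument list that launches a jar with the Java settings."""
    java = settings["java"]
    return ["java", "-Xmx" + java["max_heap"], "-jar", jar_path]


settings = load_settings(EXAMPLE_CONFIG)
print(java_command(settings, "example.jar"))
print(settings["cluster"])
```

The point of the sketch is only that both kinds of settings live in one file
that a launcher can read, so nothing has to be passed on the command line.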


> 2. When our users download VXQuery they need to build it in order to be
> able to run queries, which means that we don't provide them with the
> necessary jars and other executable files that Slider needs to run the
> application. So if that continues, the user, after the build, will have to
> add the files Slider needs to a zip along with the configuration that
> we will provide, configure Slider for their YARN setup, and create the cluster
> with that zip file. Otherwise we should provide users with pre-built
> VXQuery packages that contain the zip file, and they will just do the rest
> for Slider and YARN.
>

The Maven dependencies will need to be updated to support YARN and whatever
other libraries you will need. The users will always (at this point)
have to download the source and build VXQuery to run our system. Apache
does not supply binary files. Please plan on this use case as being the
default setting. It's OK to depend on these other libraries.


> 3. Slider requires a lot of configuration that we cannot do because it
> depends on each user's YARN setup. So the user will have to figure
> out how Slider works and set it up for YARN. I believe that this is not
> very easy because the documentation is not good and I got lost many
> times before finally figuring it out. The same goes for the documentation of
> Twill.
>

My thoughts are that the user will already have a Hadoop cluster running
with YARN. I don't think we want to make them change their configuration if
possible. I guess we could have certain requirements for the YARN cluster.
They should be reasonable.


> 4. ZooKeeper is required for both Twill and Slider, along with YARN.

I believe Hyracks (or it may be AsterixDB) is already dependent on
ZooKeeper, so this is not new. Even if only AsterixDB is dependent on
ZooKeeper, I think this is OK for our system.


> I would prefer implementing the YARN cluster configuration the way Flink
> does, because after working with Flink, Slider, and Twill I found Flink the
> easiest to set up, run, and use.
>

All that being said, I am looking for your recommendation and what you
think is the best solution. The suggestion to look at Twill and Slider was
to help make implementation and management easier. I am open to using
Flink's solution if we can have a good implementation, user-friendly
setup, and low maintenance (for code base and cluster management). You could
even suggest that Flink's solution be the first implementation that is
later upgraded to one of these tools (or not upgraded at all).

Based on the above questions, I am wondering if we have the same vision for
the YARN cluster setup. Could you post a short overview of the cluster
management process? The overview could later be expanded for our website
documentation.



> Please tell me any questions/objections you have regarding these issues.
>
> Best regards,
> Efi
>
>
> On 02/08/2015 07:24 PM, Steven Jacobs wrote:
>
>> Hi,
>> Efi has been looking at both Twill and Slider as possible ways to
>> integrate YARN with VXQuery, and has been hitting several roadblocks. She
>> only has two weeks left of actual coding, and was thinking of switching to
>> an Apache Flink solution. I wanted to see what your thoughts are on the
>> issues that she is having (she will elaborate more here) and whether it
>> would be better to get something working with Flink, which should be fine
>> within two weeks, or continue exploring with Slider, which might not reach
>> a conclusion within two weeks.
>>
>> Efi, please elaborate more on your issues here.
>>
>> Steven
>>
>
>