Posted to dev@guacamole.apache.org by Nick Couchman <vn...@apache.org> on 2019/11/01 02:14:03 UTC

Re: Apache Guacamole Senior Project for HA

On Mon, Oct 28, 2019 at 4:41 PM Barkdoll, Michael A <mb...@cs.siu.edu>
wrote:

> Hello Apache Guacamole Developers,
>
> We have several computer science students who have selected to work on
> Apache Guacamole to help provide high availability.  This is related to:
>
> https://issues.apache.org/jira/browse/GUACAMOLE-283
>
> Our situation is pretty straightforward: we'd love to use Apache Guacamole
> for offering some online labs from a pool of virtual machines.  This is
> known inside Apache Guacamole as a balancing connection group.  We
> previously tried running multiple guacamole-client containers behind a
> reverse-proxy load balancer (nginx) for the web browser's WebSocket
> connections.  We noticed that, under highly concurrent use, the user's
> session data inside the balancing connection group wasn't shared between
> the different guacamole-client instances.  This senior
> project will primarily focus on making the balancing connection group
> feature work across multiple guacamole-client instances behind a load
> balancer and fixing related bugs (e.g.,
> https://issues.apache.org/jira/browse/GUACAMOLE-791).
>
>
Welcome!  This sounds great, and I personally think this is a really cool
thing to do with CS students!


> We'd like this work merged into Apache Guacamole so that other end-users
> can scale up their Apache Guacamole deployment inside environments like
> Kubernetes.  I'm helping the students construct the project requirements as
> a mentor and we have a Computer Science professor who will also be advising
> the students.  Since Apache Guacamole is a community project we're looking
> for feedback and collaboration with other Apache Guacamole developers to
> make all of this possible.  We essentially need to craft the project
> requirements in the next few weeks so we can determine what portion of the
> work the students will be required to complete.  The students have roughly
> a semester to get familiar with the code base and submit the feature
> request so we're hoping to overdeliver but leave the initial requirements
> simplistic.  Our end goal is to submit a pull request to Apache Guacamole
> that will meet the expected requirements for their portion of the work.
> The overall community expectations for high availability are likely to be
> far-reaching and will likely fall outside the scope of the students'
> availability.  However, if time allows, the students will make their best
> effort to accommodate and collaborate on these matters.
>
>
Sounds like a good plan.


> Here are the current main requirements that I've started to draft:
>
>   1.  Store connection tracking for active connections in a database.
>   2.  Implement a shared/distributed caching mechanism (Hazelcast,
> memcached) where multiple systems can access the active connection data.
>
>
Those are certainly a couple of possible routes that jump out to me, but
there may be others that are worth exploring.  I'm not familiar with how
other projects/products handle synchronization of data among multiple
nodes.  Maybe another option could be some sort of cluster communication
service/port?  Or maybe some of the Java application servers (Tomcat,
WebSphere, WebLogic, JBoss) have some existing mechanisms for this sort of
thing that we should be looking at leveraging?  I don't know...I'm no
expert on this stuff!! :-)


> Currently, it is my understanding that guacamole-client's Tomcat .war has
> a Java servlet that communicates asynchronously with the AngularJS client.
> We plan to look into the code and documentation surrounding that process.
> My current best understanding of that process is listed at:
>
> http://apache-guacamole-general-user-mailing-list.2363388.n4.nabble.com/Is-there-a-way-to-get-the-list-of-guacamole-users-thru-API-td4394.html#a4397
>
>
This is reasonably accurate - there are really two major components of the
servlet (the WAR file and related extensions):
- The RESTful API, allowing the servlet and the AngularJS client to
communicate with one another.
- The tunnel, which allows the AngularJS client to communicate with guacd
(Guacamole Server) to handle the actual data associated with the connection
(the Guacamole protocol).

I think most of what needs to be implemented would be in the RESTful API
side of things - keeping track of connections and synchronizing that data
and the history among multiple nodes.  Beyond just tracking the number of
active connections, you'd also probably want to be able to scale out to
multiple guacd instances, and keep track of which guacd instance is being
used by a certain connection, so that if someone else is joining an
existing connection the client knows where to send them to connect.
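
As a rough illustration of the tracking described above, here is a minimal
Java sketch of a registry that records which guacd instance serves each
active connection.  Every name in it (ActiveConnection,
ActiveConnectionRegistry, InMemoryRegistry, the session identifier) is
hypothetical and not part of Guacamole's API, and the in-memory map is only
a stand-in for whatever shared store (database, distributed cache) the
nodes would actually use.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical types for illustration; none of these exist in Guacamole.

/** One active use of a connection and the guacd instance serving it. */
final class ActiveConnection {
    final String sessionId;     // unique per active use of a connection
    final String connectionId;  // the configured connection being used
    final String guacdHost;
    final int guacdPort;

    ActiveConnection(String sessionId, String connectionId,
                     String guacdHost, int guacdPort) {
        this.sessionId = sessionId;
        this.connectionId = connectionId;
        this.guacdHost = guacdHost;
        this.guacdPort = guacdPort;
    }
}

/** What every node would need from the shared store, whatever backs it. */
interface ActiveConnectionRegistry {
    void register(ActiveConnection active);
    void unregister(String sessionId);
    Optional<ActiveConnection> findBySession(String sessionId);
    long countOnGuacd(String guacdHost);
}

/** Single-JVM stand-in for a store shared by all nodes. */
class InMemoryRegistry implements ActiveConnectionRegistry {
    private final Map<String, ActiveConnection> bySession = new ConcurrentHashMap<>();

    @Override
    public void register(ActiveConnection active) {
        bySession.put(active.sessionId, active);
    }

    @Override
    public void unregister(String sessionId) {
        bySession.remove(sessionId);
    }

    @Override
    public Optional<ActiveConnection> findBySession(String sessionId) {
        return Optional.ofNullable(bySession.get(sessionId));
    }

    @Override
    public long countOnGuacd(String guacdHost) {
        // Number of active sessions currently served by the given guacd host.
        return bySession.values().stream()
                .filter(a -> a.guacdHost.equals(guacdHost))
                .count();
    }
}
```

With something along these lines, a node handling a request to join an
existing session could call findBySession() to learn which guacd host and
port to connect to, and countOnGuacd() could feed a balancing decision.
Keeping the interface separate from the implementation is one way to leave
the choice of backing store open.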


> There are also some Python tools for communicating with the RESTful API:
> https://github.com/necouchman/guacamole-python
> https://github.com/pschmitt/guacapy
>
>
Yeah, I threw together the first one just as an experiment and to prove
it could be done - it's very incomplete and hasn't been developed much in
the past couple of years, but it does give a basic demonstration of how to
use the API outside of the AngularJS application.


> Concerning the project requirements, we need some clarification about
> the community's needs and which areas the project will need to address:
>
>   1.  Should the connection tracking for active connections work
> standalone for multiple guacamole-clients with a database connection
> (without a separate Docker container)?
>      *   This would seem to be the most compatible method.
>   2.  Should the shared/distributed caching mechanism be inside a separate
> docker container?
>      *   This would allow us to offer high availability of different
> microservices.
>

These questions seem to be very focused on containers, which I realize is
the goal of the project; however, I think things should be architected such
that they can work both inside and outside of Docker.  For these particular
changes, this means that, if some sort of external mechanism is used to
synchronize connection data (memcached, for example), it should be
something that operates in an environment where I can manually deploy
multiple Guacamole Client WAR files into separate Tomcat instances and
point them at the data sharing mechanism myself.

How that is implemented in a Docker environment is a related, but separate
question.  For that, Docker best-practices should be followed, which I
believe would involve running the data-sharing mechanism in a separate
container.

Unless, of course, we go with something internal to the WAR file (a
port/service/etc.) ;-).


>   3.  Is there a preferred shared/distributed caching mechanism (Hazelcast,
> memcached)?
>

At this point, no.  I would say whatever is freely-available, in keeping
with the open source nature of the project.  But, I would also say options
are better.  So, if there's a way to implement it in a modular fashion,
that would be good - but, one thing at a time.


>   4.  Do we require that Hazelcast/memcached fall back to a database
> for recovery?
>

Not exactly sure what you mean here, but I would say that some of the
information - like history - needs to be persistently stored within a
database.  I would not think that having history disappear with the sharing
mechanism would be desirable behavior.
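
To make that split concrete, here is a minimal Java sketch in which history
is written to a durable store while the "who is connected right now" state
lives in a volatile one.  All names (HistoryEntry, DurableHistory,
SessionTracker) are hypothetical and are not Guacamole's actual history
API; the map inside DurableHistory merely stands in for a database table,
and the in-memory set stands in for the shared cache.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// All names here are hypothetical; this is not Guacamole's history API.

/** One row of connection history; endTime stays null while the session is open. */
final class HistoryEntry {
    final String connectionId;
    final String username;
    final Instant startTime;
    Instant endTime;

    HistoryEntry(String connectionId, String username, Instant startTime) {
        this.connectionId = connectionId;
        this.username = username;
        this.startTime = startTime;
    }
}

/** Durable store, standing in for a database table keyed by session. */
class DurableHistory {
    private final Map<String, HistoryEntry> rows = new LinkedHashMap<>();

    synchronized void recordStart(String sessionId, HistoryEntry entry) {
        rows.put(sessionId, entry);
    }

    synchronized void recordEnd(String sessionId, Instant end) {
        HistoryEntry entry = rows.get(sessionId);
        if (entry != null) {
            entry.endTime = end;
        }
    }

    synchronized HistoryEntry get(String sessionId) {
        return rows.get(sessionId);
    }

    synchronized int size() {
        return rows.size();
    }
}

/** Volatile active-session state plus durable history. */
class SessionTracker {
    // Stand-in for the shared cache; its contents may be lost at any time.
    private final Set<String> activeSessions = ConcurrentHashMap.newKeySet();
    private final DurableHistory history;

    SessionTracker(DurableHistory history) {
        this.history = history;
    }

    void open(String sessionId, String connectionId, String username, Instant now) {
        // Write the durable row first so it survives loss of the cache.
        history.recordStart(sessionId, new HistoryEntry(connectionId, username, now));
        activeSessions.add(sessionId);
    }

    void close(String sessionId, Instant now) {
        activeSessions.remove(sessionId);
        history.recordEnd(sessionId, now);
    }

    /** Simulates the sharing mechanism being restarted or wiped. */
    void simulateCacheLoss() {
        activeSessions.clear();
    }

    int activeCount() {
        return activeSessions.size();
    }
}
```

In this sketch, calling simulateCacheLoss() empties the active-session set
but leaves every history row intact, which is the property asked for above.
Writing the start of each session to the durable store at open time, rather
than only at close, is a design choice made here so that sessions still
open when the cache disappears are not lost from history.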


>   5.  Will there be changes required inside the AngularJS application for
> this feature request?  Just trying to get a feel for what code requirements
> to put in the official project requirements.
>
>
That may be a little hard to say at this point.  In my mind, I think it
could all be done within the servlet without any modifications to the
AngularJS code, but that could just be because I'm not thinking things all
the way through and haven't fully realized the implications of it.  I would
think that, at the very least, keeping API changes to a minimum would be
desirable to maintain compatibility.


> I'm curious if any of the Apache Guacamole Developers would be interested
> in being mentors to the students.  Your involvement could be as simple as
> having helped define the project requirements or as far-reaching as
> agreeing to attempt to answer questions the students face with their
> project.  After the project requirements have been completed you could
> judge the students' contributions and determine which students could
> reference you as their mentor.
>
>
I'm happy to do what I can, but I'm no professional coder, so my answers
may not be the best ones coming from the developers.  My day job is also
keeping me very busy these days, so my responses may be severely
delayed...like this e-mail.


> Lastly, I'm curious if I've missed any major requirements or if certain
> areas need to have modifications in terms of the initial project
> requirements.  We'd also love to hear the community's desire for additional
> high availability requirements that they'd like to see implemented.  I can
> think of quite a few off the top of my head.  This will likely stem into
> another senior project or work for the community to focus on in the future.
>
>
Who knows?!  I'm sure that I've missed something, but that's what community
is about, right - our different vantage points allow us to see each other's
blind spots and build something better!


> Thank you for your time and feedback on this matter.  We really appreciate
> the developers of the Apache Guacamole Project for making an open-source
> project of this nature available to our students!!!
>
>
Thanks for the interest - I look forward to seeing where this goes!  HA is
certainly a great place for this, at least I think so, but there are plenty
of other tasks to be done, too!

-Nick