Posted to dev@asterixdb.apache.org by abdullah alamoudi <ba...@gmail.com> on 2015/08/23 19:08:28 UTC

The solution to the sporadic connection refused exceptions

About 3-4 days ago, I was working on the addition of the filesystem-based
feed adapter and it didn't take any time to complete. However, when I wanted
to build and make sure all tests pass, I kept getting ConnectionRefused
errors which caused the installer tests to fail every now and then.

I knew the new change had nothing to do with this failure, yet I couldn't
direct my attention away from this bug (it just bothered me so much and I
knew it needed to be resolved ASAP). After wasting countless hours, I was
finally able to figure out what was happening :-)

In the startup routine, we start three Jetty web servers (Web interface
server, JSON API server, and Feed server). Some time ago, we used to end the
startup call before making sure the server.isStarted() method returned true
on all servers. At that time, I introduced the waitUntilServerStarts method
to make sure we don't return before the servers are ready. It turned out that
this was an incorrect way to handle it (we can blame Stack Overflow for this
one!): it is not enough that the server's isStarted() returns true. The
correct way to do this is to call the server.join() method after
server.start().

See:
http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
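
Independently of what a server reports about its own state, a client can check whether a port actually accepts connections. The following is only a minimal sketch using plain java.net: PortProbe and canConnect are hypothetical names that exist neither in AsterixDB nor in Jetty, and a bare ServerSocket stands in for the Jetty servers.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical helper for illustration; not part of AsterixDB or Jetty. */
final class PortProbe {
    private PortProbe() {}

    /** Returns true if a TCP connection to host:port is accepted within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers java.net.ConnectException ("Connection refused") and connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: an ephemeral-port listener plays the role of a started web server.
        try (ServerSocket listener = new ServerSocket(0)) {
            int port = listener.getLocalPort();
            System.out.println("listener open, canConnect = " + canConnect("127.0.0.1", port, 1000));
        }
    }
}
```

A test harness could call canConnect in a loop before issuing its first request instead of trusting a process check alone.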

This was as satisfying as it was frustrating, and you are welcome for
the future time I saved each of you :)
-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Mike Carey <dt...@gmail.com>.

😃

Re: The solution to the sporadic connection refused exceptions

Posted by Till Westmann <ti...@apache.org>.

Continuing this discussion on https://asterix-gerrit.ics.uci.edu/#/c/365 (which gets mirrored on this list anyway).

Cheers,
Till

Re: The solution to the sporadic connection refused exceptions

Posted by Ian Maxon <im...@uci.edu>.

> And Managix uses Zookeeper to manage its information, but YARN doesn’t.

To put some background into this, I only chose to eschew use of ZK
because it isn't a requirement in a YARN 2.2.0 cluster, and I could do
what I needed via HDFS and some polling on the CC. I'm not opposed to
integrating it further though (and making the YARN client make use of
that).

- Ian

On Thu, Aug 27, 2015 at 7:58 PM, Till Westmann <ti...@apache.org> wrote:
> I’m not really deep into this topic, but I’d like to understand a little better.
>
> As I understand it, we currently have 2 ways to deploy/manage AsterixDB: a) using Managix and b) using YARN.
> And Managix uses Zookeeper to manage its information, but YARN doesn’t.
> Also, neither the Asterix CC nor the NC depends on the existence of Zookeeper.
>
> Is this correct so far?
>
> Now we are trying to find a way to ensure that an AsterixDB client can reliably know if the cluster is up or down.
>
> My first assumption for the properties that the solution to this problem would have is:
> 1) The knowledge if the cluster is up or down is available in the CC (as it controls the cluster).
> 2) The mechanism used to expose that information works for both ways to deploy/manage a cluster.
>
> A simple way to do that seems to be to send a request “waitUntilStarted” to the CC that returns to the client once the CC has determined that everything has started. The response to that request would either be “yes” (cluster is up), “no” (an error occurred and it won’t be up without intervention), or “not sure” (timeout - please ask again later). This would imply that the client is polling, but it wouldn’t be very busy if the timeout is reasonable.
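
The three-way answer proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: StartupGate, Answer, markStarted and markFailed are hypothetical names, not existing AsterixDB code, and the transport that would carry the request to the CC is left out entirely.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names exist in AsterixDB. */
final class StartupGate {
    /** The three answers from the proposal: yes, no, not sure. */
    enum Answer { UP, FAILED, NOT_SURE }

    // Completed with true once startup succeeds, false once it has failed for good.
    private final CompletableFuture<Boolean> outcome = new CompletableFuture<>();

    /** Would be called by the cluster controller once everything has started. */
    void markStarted() { outcome.complete(true); }

    /** Would be called when startup cannot succeed without intervention. */
    void markFailed() { outcome.complete(false); }

    /** Blocks for at most the given timeout and reports what is known by then. */
    Answer waitUntilStarted(long timeout, TimeUnit unit) {
        try {
            return outcome.get(timeout, unit) ? Answer.UP : Answer.FAILED;
        } catch (TimeoutException e) {
            return Answer.NOT_SURE; // timeout: the caller should ask again later
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Answer.NOT_SURE;
        } catch (ExecutionException e) {
            return Answer.FAILED;
        }
    }

    public static void main(String[] args) {
        StartupGate gate = new StartupGate();
        System.out.println(gate.waitUntilStarted(100, TimeUnit.MILLISECONDS));
        gate.markStarted();
        System.out.println(gate.waitUntilStarted(100, TimeUnit.MILLISECONDS));
    }
}
```

Because the call blocks until the outcome is known or the timeout expires, a client that re-asks after each "not sure" makes at most one request per timeout period.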
>
> Now this doesn’t seem to be where the discussion is going and I’d like to find out where it is going and why.
>
> Could you help me?
>
> Thanks,
> Till
>
>
>> On Aug 25, 2015, at 7:23 AM, Raman Grover <ra...@gmail.com> wrote:
>>
>> As I mentioned before...
>> "The information for an AsterixDB instance is "lazily" refreshed when a
>> management operation is invoked (using managix set of commands) or an
>> explicit describe command is invoked. "
>>
>> Above, the commands are the Managix set of commands (create, start,
>> describe etc.) that trigger a refresh, and so it's "lazy". Currently CC does
>> not notify Managix. What we are discussing is an elegant way to have CC
>> relay information to Managix.
>>
>> On Tue, Aug 25, 2015 at 4:10 AM, abdullah alamoudi <ba...@gmail.com>
>> wrote:
>>
>>> I don't think that is there yet but the intention is to have it at some
>>> point in the future.
>>>
>>> Cheers,
>>> Abdullah.
>>>
>>> On Tue, Aug 25, 2015 at 12:38 PM, Chris Hillery <ch...@hillery.land>
>>> wrote:
>>>
>>>> Very interesting, thank you. Can you point out a couple places in the code
>>>> where some of this logic is kept? Specifically where "CC can update this
>>>> information and notify Managix" sounds interesting...
>>>>
>>>> Ceej
>>>> aka Chris Hillery
>>>>
>>>> On Tue, Aug 25, 2015 at 12:49 AM, Raman Grover <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>>> , and what code is
>>>>>> responsible for keeping it up-to-date?
>>>>>>
>>>>> Apparently, no one is :-)
>>>>>
>>>>> The information for an AsterixDB instance is "lazily" refreshed when a
>>>>> management operation is invoked (using managix set of commands) or an
>>>>> explicit describe command is invoked.
>>>>> Between the time t1 (when state of an AsterixDB instance changes, say due
>>>>> to NC failure) and t2 (when a management operation is invoked), the
>>>>> information about the AsterixDB instance inside Zookeeper remains stale. CC
>>>>> can update this information and notify Managix; this way Managix realizes
>>>>> the changed state as soon as it has occurred. This can be particularly
>>>>> useful when showing on a management console the up-to-date state of an
>>>>> instance in real time or having Managix respond to an event.
>>>>>
>>>>> Regards,
>>>>> Raman
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: abdullah alamoudi <ba...@gmail.com>
>>>>> Date: Tue, Aug 25, 2015 at 12:27 AM
>>>>> Subject: Re: The solution to the sporadic connection refused exceptions
>>>>> To: dev@asterixdb.incubator.apache.org
>>>>>
>>>>>
>>>>> On Tue, Aug 25, 2015 at 3:40 AM, Chris Hillery <ch...@hillery.land>
>>>>> wrote:
>>>>>
>>>>>> Perhaps an aside, but: exactly what is kept in Zookeeper
>>>>>
>>>>>
>>>>> A serialized instance of edu.uci.ics.asterix.event.model.AsterixInstance
>>>>>
>>>>>
>>>>>> , and what code is
>>>>>> responsible for keeping it up-to-date?
>>>>>>
>>>>> Apparently, no one is :-)
>>>>>
>>>>>
>>>>>>
>>>>>> Ceej
>>>>>>
>>>>>> On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <ramangrover29@gmail.com> wrote:
>>>>>>
>>>>>>> Well, the state of an instance (and metadata including configuration) is
>>>>>>> kept in a Zookeeper instance that is accessible to Managix and CC. CC should
>>>>>>> be able to set the state of the cluster in Zookeeper under the right znode
>>>>>>> which can be viewed by Managix.
>>>>>>>
>>>>>>> There exists a communication channel for CC and Managix to share
>>>>>>> information on state etc. I am not sure if we need another channel such as
>>>>>>> RMI between Managix and CC.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Raman
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>>>
>>>>>>>> Well, it depends on your definition of the boundaries of managix. What I
>>>>>>>> did is that I added an RMI object in the InstallerDriver which basically
>>>>>>>> listens for state changes from the cluster controller. This means some
>>>>>>>> additional logic in the CCApplicationEntryPoint where, after the CC is
>>>>>>>> ready, it contacts the InstallerDriver using RMI, and at that point only,
>>>>>>>> the InstallerDriver can return to managix and tell it that the startup is
>>>>>>>> complete.
>>>>>>>>
>>>>>>>> Not sure if this is the right way to do it but it definitely is better than
>>>>>>>> what we currently have.
>>>>>>>> Abdullah.
>>>>>>>>
>>>>>>>> On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <chillery@hillery.land> wrote:
>>>>>>>>
>>>>>>>>> Hopefully the solution won't involve additional important logic inside
>>>>>>>>> Managix itself?
>>>>>>>>>
>>>>>>>>> Ceej
>>>>>>>>> aka Chris Hillery
>>>>>>>>>
>>>>>>>>> On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> That works but it doesn't feel right doing it this way. I am going to fix
>>>>>>>>>> this one for good.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Abdullah.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <im...@uci.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> The way I assured liveness for the YARN installer was to try running "for
>>>>>>>>>>> $x in dataset Metadata.Dataset return $x" via the API. I just polled for a
>>>>>>>>>>> reasonable amount of time (though honestly, thinking about it now, the
>>>>>>>>>>> correct parameter to use for the polling interval is the startup wait time
>>>>>>>>>>> in the parameters file :) ). It's not perfect, but it gives fewer false
>>>>>>>>>>> positives than just checking ps for processes that look like CCs/NCs.
>>>>>>>>>>>
>>>>>>>>>>> - Ian.
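
The polling approach described above can be sketched as a small deadline loop. This is a hedged illustration, not the YARN installer's code: Poller and pollUntil are hypothetical names, the probe argument stands in for "run a trivial query via the API", and maxWait plays the role of the overall time budget (for example a configured startup wait time).

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper; the probe is a placeholder for a real liveness query. */
final class Poller {
    private Poller() {}

    /**
     * Calls probe repeatedly until it returns true or maxWait has elapsed.
     * Returns true only if the probe succeeded in time.
     */
    static boolean pollUntil(BooleanSupplier probe, Duration maxWait, Duration interval) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // time budget used up without a successful probe
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Assumption: a counter that "succeeds" on the third call simulates a slow startup.
        boolean up = pollUntil(() -> ++calls[0] >= 3, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("up=" + up + ", probes=" + calls[0]);
    }
}
```

The probe is always tried at least once, so a cluster that is already up is reported immediately even with a zero time budget.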
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Now that I think about it, maybe we should provide multiple ways to do
>>>>>>>>>>>> this: a polling mechanism to be used at arbitrary times and a pushing
>>>>>>>>>>>> mechanism on startup.
>>>>>>>>>>>> I am going to start implementation of this and will probably use RMI for
>>>>>>>>>>>> this task both ways (CC to InstallerDriver and InstallerDriver to CC).
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Abdullah.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> So after further investigation, it turned out our startup process just
>>>>>>>>>>>>> starts the CC and NC processes and then makes sure the processes are
>>>>>>>>>>>>> running, and if the processes are found to be running, it returns the
>>>>>>>>>>>>> state of the cluster as active and the subsequent test commands can start
>>>>>>>>>>>>> immediately.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This means that the CC could've started but is not yet ready when we try
>>>>>>>>>>>>> to process the next command. To address this, we need a better way to tell
>>>>>>>>>>>>> when the startup procedure has completed. We can do this by pushing (CC
>>>>>>>>>>>>> informs installer driver when the startup is complete) or polling (the
>>>>>>>>>>>>> installer driver needs to actually query the CC for the state of the
>>>>>>>>>>>>> cluster).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can do either way so let's vote. My vote goes to the pushing mechanism.
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This solution turned out to be incorrect. Actually, the test cases when I
>>>>>>>>>>>>>> build after using the join method never fail, but running an actual
>>>>>>>>>>>>>> asterix instance never succeeds, which is quite confusing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also think that the startup script has a major bug where it might
>>>>>>>>>>>>>> return before the startup is complete. More on this later......
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is highly unlikely that it is related.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Abdullah.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @Abdullah: Is this issue related to
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
>>>>>>>>>>>>>>>> look into the details on Monday.
>>>>>>>>>>>>>>>>
>

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

Hi Till,
I am glad that you're interested. Let me first say that a change has been
submitted and reviewed by Murtadha, and is now being reviewed by Chris. Not
only that, but I first implemented it completely using RMI and then
re-implemented it completely using Zookeeper.

Everything you stated is correct. The solution that was implemented only
deals with knowing when the cluster is up during the startup process. This
seemed urgent to me since I am facing it with almost every change that I
try to verify before I push to Gerrit, and others have seen it too. Knowing
the state of the cluster (i.e. through the Managix describe command) still
relies on checking whether the processes are running (someone correct me if
this is wrong).

So what I did is the following:
When Managix starts the CC, it simply listens on Zookeeper until the CC
reports its state. This is currently only done during the startup process. As Ian
has said, he was/is using a polling mechanism to determine if the server is
up. I still think what we implemented is a more elegant solution that
doesn't involve polling at all.
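
To make the push idea concrete, here is a minimal, self-contained Java sketch.
It is not the submitted change and it does not use the Zookeeper client API: a
CountDownLatch stands in for the znode that Managix listens on, and the names
StartupSignal, reportActive, awaitActive and PushStartupDemo are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the shared state that the CC writes and the
// installer listens on. In the real change this role is played by Zookeeper.
class StartupSignal {
    private final CountDownLatch active = new CountDownLatch(1);

    // Called by the (simulated) CC once it is actually ready to serve requests.
    void reportActive() {
        active.countDown();
    }

    // Called by the (simulated) installer: blocks until the CC reports in or
    // the timeout expires. Returns true only if the CC reported in time.
    boolean awaitActive(long timeoutMillis) {
        try {
            return active.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class PushStartupDemo {
    // Starts a thread that plays the CC: it "boots" for bootMillis and then
    // reports. The caller blocks on the signal instead of polling.
    static boolean startAndWait(long bootMillis, long timeoutMillis) {
        StartupSignal signal = new StartupSignal();
        Thread cc = new Thread(() -> {
            try {
                Thread.sleep(bootMillis); // simulated startup work
                signal.reportActive();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        cc.setDaemon(true); // do not keep the JVM alive for the simulation
        cc.start();
        return signal.awaitActive(timeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println("fast boot, ready in time: " + startAndWait(50, 2000));
        System.out.println("slow boot, ready in time: " + startAndWait(2000, 50));
    }
}
```

The point of the design is that the waiting side never guesses: it returns
as soon as the CC says it is ready, and the timeout only bounds how long it
is willing to wait.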

Anyone is welcome to look at the change and suggest changes to it before we
merge it :-)
~Abdullah.

On Fri, Aug 28, 2015 at 8:58 AM, Till Westmann <ti...@apache.org> wrote:

> I’m not really deep into this topic, but I’d like to understand a little
> better.
>
> As I understand it, we currently have 2 ways to deploy/manage AsterixDB:
> a) using Managix and b) using YARN.
> And Managix uses Zookeeper to manage its information, but YARN doesn’t.
> Also, neither the Asterix CC nor the NC depends on the existence of Zookeeper.
>
> Is this correct so far?
>
> Now we are trying to find a way to ensure that an AsterixDB client can
> reliably know if the cluster is up or down.
>
> My first assumption for the properties that the solution to this problem
> would have is:
> 1) The knowledge if the cluster is up or down is available in the CC (as
> it controls the cluster).
> 2) The mechanism used to expose that information works for both ways to
> deploy/manage a cluster.
>
> A simple way to do that seems to be to send a request “waitUntilStarted”
> to the CC that returns to the client once the CC has determined that
> everything has started. The response to that request would either be “yes”
> (cluster is up), “no” (an error occurred and it won’t be up without
> intervention), or “not sure” (timeout - please ask again later). This would
> imply that the client is polling, but it wouldn’t be very busy if the
> timeout is reasonable.
>
> Now this doesn’t seem to be where the discussion is going and I’d like to
> find out where it is going and why.
>
> Could you help me?
>
> Thanks,
> Till
>


-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Till Westmann <ti...@apache.org>.

I’m not really deep into this topic, but I’d like to understand a little better. 

As I understand it, we currently have 2 ways to deploy/manage AsterixDB: a) using Managix and b) using YARN.
And Managix uses Zookeeper to manage its information, but YARN doesn’t.
Also, neither the Asterix CC nor the NC depends on the existence of Zookeeper.

Is this correct so far?

Now we are trying to find a way to ensure that an AsterixDB client can reliably know if the cluster is up or down.

My first assumption for the properties that the solution to this problem would have is:
1) The knowledge if the cluster is up or down is available in the CC (as it controls the cluster).
2) The mechanism used to expose that information works for both ways to deploy/manage a cluster.

A simple way to do that seems to be to send a request “waitUntilStarted” to the CC that returns to the client once the CC has determined that everything has started. The response to that request would either be “yes” (cluster is up), “no” (an error occurred and it won’t be up without intervention), or “not sure” (timeout - please ask again later). This would imply that the client is polling, but it wouldn’t be very busy if the timeout is reasonable.
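
The three-answer request described above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: no such request exists in the CC today, and the names StartupAnswer, waitUntilStarted and WaitUntilStartedDemo are hypothetical. The Supplier stands in for one blocking call to the CC that returns when the cluster has started, has failed, or the server-side timeout has expired.

```java
import java.util.function.Supplier;

class WaitUntilStartedDemo {
    // The three answers proposed for a "waitUntilStarted" request.
    enum StartupAnswer { YES, NO, NOT_SURE }

    // Client side: ask again only while the answer is NOT_SURE, at most
    // maxAttempts times. Each request.get() models one long-running call.
    static StartupAnswer waitUntilStarted(Supplier<StartupAnswer> request, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            StartupAnswer answer = request.get();
            if (answer != StartupAnswer.NOT_SURE) {
                return answer; // definite: the cluster is up, or it will not come up
            }
        }
        return StartupAnswer.NOT_SURE; // still unknown after all attempts
    }

    public static void main(String[] args) {
        // Simulated CC: times out twice, then reports that the cluster is up.
        int[] calls = {0};
        Supplier<StartupAnswer> cc =
                () -> ++calls[0] < 3 ? StartupAnswer.NOT_SURE : StartupAnswer.YES;
        StartupAnswer result = waitUntilStarted(cc, 5);
        System.out.println(result + " after " + calls[0] + " requests");
    }
}
```

Because each request blocks on the server until something is known or the timeout expires, the client loop runs only a handful of times, which is why this kind of polling is not very busy.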

Now this doesn’t seem to be where the discussion is going and I’d like to find out where it is going and why.

Could you help me?

Thanks,
Till


> On Aug 25, 2015, at 7:23 AM, Raman Grover <ra...@gmail.com> wrote:
> 
> As I mentioned before...
> "The information for an AsterixDB instance is "lazily" refreshed when a
> management operation is invoked (using managix set of commands) or an
> explicit describe command is invoked. "
> 
> Above, the commands are the Managix set of commands (create, start,
> describe etc.) that trigger a refresh and so it's "lazy". Currently CC does
> not notify Managix. What we are discussing is an elegant way to have CC
> relay information to Managix.
> 
> -- 
> Raman

Re: The solution to the sporadic connection refused exceptions

Posted by Raman Grover <ra...@gmail.com>.

As I mentioned before:
"The information for an AsterixDB instance is "lazily" refreshed when a
management operation is invoked (using managix set of commands) or an
explicit describe command is invoked."

In the quote above, "commands" means the Managix commands (create, start,
describe, etc.); running one of them triggers a refresh, which is why the
refresh is "lazy". Currently the CC does not notify Managix. What we are
discussing is an elegant way to have the CC relay information to Managix.
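
To make the difference concrete, here is a minimal Java sketch. It is not
AsterixDB or Managix code: ClusterStateSketch, its State enum, and every
method name are made up for illustration, and a plain in-memory field stands
in for the record kept in ZooKeeper. It only contrasts the lazy refresh we
have today with a push from the CC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model; none of these names exist in AsterixDB or Managix.
final class ClusterStateSketch {
    enum State { ACTIVE, UNUSABLE }

    private State actual = State.ACTIVE;   // what the cluster is really doing
    private State recorded = State.ACTIVE; // stand-in for the record kept in ZooKeeper
    private final List<Consumer<State>> watchers = new ArrayList<>();

    // The cluster changes (say, an NC fails) and nobody writes it down.
    void clusterChanges(State s) { actual = s; }

    // What a console sees if it only reads the record.
    State recorded() { return recorded; }

    // Lazy refresh: a management command re-checks the cluster and updates the record.
    State describe() {
        recorded = actual;
        return recorded;
    }

    // A Managix-side component registers interest in state changes.
    void watch(Consumer<State> watcher) { watchers.add(watcher); }

    // Push: the CC records the change itself and notifies every watcher.
    void ccReports(State s) {
        actual = s;
        recorded = s;
        for (Consumer<State> w : watchers) {
            w.accept(s);
        }
    }

    public static void main(String[] args) {
        ClusterStateSketch lazy = new ClusterStateSketch();
        lazy.clusterChanges(State.UNUSABLE);
        System.out.println("before describe: " + lazy.recorded()); // still the old record
        System.out.println("after describe:  " + lazy.describe());

        ClusterStateSketch pushed = new ClusterStateSketch();
        pushed.watch(s -> System.out.println("notified: " + s));
        pushed.ccReports(State.UNUSABLE);
    }
}
```

In the lazy case the record stays stale between t1 and t2 until someone runs
a command; in the push case the watcher hears about the change as soon as the
CC reports it.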

On Tue, Aug 25, 2015 at 4:10 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> I don't think that is there yet but the intention is to have it at some
> point in the future.
>
> Cheers,
> Abdullah.
>
> On Tue, Aug 25, 2015 at 12:38 PM, Chris Hillery <ch...@hillery.land>
> wrote:
>
> > Very interesting, thank you. Can you point out a couple places in the code
> > where some of this logic is kept? Specifically where "CC can update this
> > information and notify Managix" sounds interesting...
> >
> > Ceej
> > aka Chris Hillery



-- 
Raman

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

I don't think that is there yet but the intention is to have it at some
point in the future.

Cheers,
Abdullah.

On Tue, Aug 25, 2015 at 12:38 PM, Chris Hillery <ch...@hillery.land>
wrote:

> Very interesting, thank you. Can you point out a couple places in the code
> where some of this logic is kept? Specifically where "CC can update this
> information and notify Managix" sounds interesting...
>
> Ceej
> aka Chris Hillery



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Chris Hillery <ch...@hillery.land>.

Very interesting, thank you. Can you point out a couple places in the code
where some of this logic is kept? Specifically where "CC can update this
information and notify Managix" sounds interesting...

Ceej
aka Chris Hillery

On Tue, Aug 25, 2015 at 12:49 AM, Raman Grover <ra...@gmail.com>
wrote:

> > , and what code is
> > responsible for keeping it up-to-date?
> >
> Apparently, no one is :-)
>
> The information for an AsterixDB instance is "lazily" refreshed when a
> management operation is invoked (using managix set of commands) or an
> explicit describe command is invoked.
> Between the time t1 (when state of an AsterixDB instance changes, say due
> to NC failure) and t2 (when  a management operation is invoked), the
> information about the AsterixDB instance inside Zookeeper remains stale. CC
> can update this information and notify Managix; this way Managix realizes
> the changed state as soon as it has occurred. This can be particularly
> useful when showing on a management console the up-to-date state of an
> instance in real time or having Managix respond to an event.
>
> Regards,
> Raman
>
> ---------- Forwarded message ----------
> From: abdullah alamoudi <ba...@gmail.com>
> Date: Tue, Aug 25, 2015 at 12:27 AM
> Subject: Re: The solution to the sporadic connection refused exceptions
> To: dev@asterixdb.incubator.apache.org
>
>
> On Tue, Aug 25, 2015 at 3:40 AM, Chris Hillery <ch...@hillery.land>
> wrote:
>
> > Perhaps an aside, but: exactly what is kept in Zookeeper
>
>
> A serialized instance of edu.uci.ics.asterix.event.model.AsterixInstance
>
>
> > , and what code is
> > responsible for keeping it up-to-date?
> >
> Apparently, no one is :-)
>
>
> >
> > Ceej
> >
> > On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <ra...@gmail.com>
> > wrote:
> >
> > > Well, the state of an instance (and metadata including configuration) is
> > > kept in Zookeeper instance that is accessible to Managix and CC. CC should
> > > be able to set the state of the cluster in Zookeeper under the right znode
> > > which can viewed by Managix.
> > >
> > > There exists a communication channel for CC and Managix to share
> > > information on state etc. I am not sure if we need another channel such as
> > > RMI between Managix and CC.
> > >
> > > Regards,
> > > Raman
> > >
> > >
> > >
> > > On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <bamousaa@gmail.com>
> > > wrote:
> > >
> > > > Well, it depends on your definition of the boundaries of managix. What I
> > > > did is that I added an RMI object in the InstallerDriver which basically
> > > > listen for state changes from the cluster controller. This means some
> > > > additional logic in the CCApplicationEntryPoint where after the CC is
> > > > ready, it contacts the InstallerDriver using RMI and at that point only,
> > > > the InstallerDriver can return to managix and tells it that the startup is
> > > > complete.
> > > >
> > > > Not sure if this is the right way to do it but it definitely is better than
> > > > what we currently have.
> > > > Abdullah.
> > > >
> > > > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <chillery@hillery.land>
> > > > wrote:
> > > >
> > > > > Hopefully the solution won't involve additional important logic inside
> > > > > Managix itself?
> > > > >
> > > > > Ceej
> > > > > aka Chris Hillery
> > > > >
> > > > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > That works but it doesn't feel right doing it this way. I am going to fix
> > > > > > this one for good.
> > > > > >
> > > > > > Cheers,
> > > > > > Abdullah.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <im...@uci.edu> wrote:
> > > > > >
> > > > > > > The way I assured liveness for the YARN installer was to try running "for
> > > > > > > $x in dataset Metadata.Dataset return $x" via the API. I just polled for a
> > > > > > > reasonable amount of time  (though honestly, thinking about it now, the
> > > > > > > correct parameter to use for the polling interval is the startup wait time
> > > > > > > in the parameters file :) ). It's not perfect, but it gives less false
> > > > > > > positives than just checking ps for processes that look like CCs/NCs.
> > > > > > >
> > > > > > > - Ian.
> > > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Now that I think about it. Maybe we should provide multiple ways to do
> > > > > > > > this. A polling mechanism to be used for arbitrary time and a pushing
> > > > > > > > mechanism on startup.
> > > > > > > > I am going to start implementation of this and will probably use RMI for
> > > > > > > > this task both ways (CC to InstallerDriver and InstallerDriver to CC).
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Abdullah.
> > > > > > > >
> > > > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <
> > > > > bamousaa@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > So after further investigation, turned out our startup
> > process
> > > > just
> > > > > > > > starts
> > > > > > > > > the CC and NC processes and then make sure the processes
> are
> > > > > running
> > > > > > > and
> > > > > > > > if
> > > > > > > > > the processes were found to be running, it returns the
> state
> > of
> > > > the
> > > > > > > > cluster
> > > > > > > > > to be active and the subsequent test commands can start
> > > > > immediately.
> > > > > > > > >
> > > > > > > > > This means that the CC could've started but is not yet
> ready
> > > when
> > > > > we
> > > > > > > try
> > > > > > > > > to process the next command. To address this, we need a
> > better
> > > > way
> > > > > to
> > > > > > > > tell
> > > > > > > > > when the startup procedure has completed. we can do this by
> > > > pushing
> > > > > > (CC
> > > > > > > > > informs installer driver when the startup is complete) or
> > > polling
> > > > > > (The
> > > > > > > > > installer driver needs to actually query the CC for the
> state
> > > of
> > > > > the
> > > > > > > > > cluster).
> > > > > > > > >
> > > > > > > > > I can do either way so let's vote. My vote goes to the
> > pushing
> > > > > > > mechanism.
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <
> > > > > > > bamousaa@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> This solution turned out to be incorrect. Actually, the
> test
> > > > cases
> > > > > > > when
> > > > > > > > I
> > > > > > > > >> build after using the join method never fails but running
> an
> > > > > actual
> > > > > > > > asterix
> > > > > > > > >> instance never succeeds which is quite confusing.
> > > > > > > > >>
> > > > > > > > >> I also think that the startup script has a major bug where
> > it
> > > > > might
> > > > > > > > >> returns before the startup is complete. More on this
> > > later......
> > > > > > > > >>
> > > > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <
> > > > > > > bamousaa@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >>> It is highly unlikely that it is related.
> > > > > > > > >>>
> > > > > > > > >>> Cheers,
> > > > > > > > >>> Abdullah.
> > > > > > > > >>>
> > > > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <
> chenli@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074?
> Ian
> > > > and I
> > > > > > > plan
> > > > > > > > to
> > > > > > > > >>>> look into the details on Monday.
> > > > > > > > >>>>
> > > > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <
> > > > > > > > bamousaa@gmail.com
> > > > > > > > >>>> >
> > > > > > > > >>>> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>> > About 3-4 days ago, I was working on the addition of
> the
> > > > > > > filesystem
> > > > > > > > >>>> based
> > > > > > > > >>>> > feed adapter and it didn't take anytime to complete.
> > > > However,
> > > > > > > when I
> > > > > > > > >>>> wanted
> > > > > > > > >>>> > to build and make sure all tests pass, I kept getting
> > > > > > > > >>>> ConnectionRefused
> > > > > > > > >>>> > errors which caused the installer tests to fail every
> > now
> > > > and
> > > > > > > then.
> > > > > > > > >>>> >
> > > > > > > > >>>> > I knew the new change had nothing to do with this
> > failure,
> > > > > yet,
> > > > > > I
> > > > > > > > >>>> couldn't
> > > > > > > > >>>> > direct my attention away from this bug (It just
> bothered
> > > me
> > > > so
> > > > > > > much
> > > > > > > > >>>> and I
> > > > > > > > >>>> > knew it needs to be resolved ASAP). After wasting
> > > countless
> > > > > > > hours, I
> > > > > > > > >>>> was
> > > > > > > > >>>> > finally able to figure out what was happening :-)
> > > > > > > > >>>> >
> > > > > > > > >>>> > In the startup routine, we start three Jetty web
> servers
> > > > (Web
> > > > > > > > >>>> interface
> > > > > > > > >>>> > server, JSON API server, and Feed server). Sometime
> ago,
> > > we
> > > > > used
> > > > > > > to
> > > > > > > > >>>> end the
> > > > > > > > >>>> > startup call before making sure the server.isStarted()
> > > > method
> > > > > > > > returns
> > > > > > > > >>>> true
> > > > > > > > >>>> > on all servers. At that time, I introduced the
> > > > > > > waitUntilServerStarts
> > > > > > > > >>>> method
> > > > > > > > >>>> > to make sure we don't return before the servers are
> > ready.
> > > > > > Turned
> > > > > > > > >>>> out, that
> > > > > > > > >>>> > was an incorrect way to handle this (We can blame
> > > > > stackoverflow
> > > > > > > for
> > > > > > > > >>>> this
> > > > > > > > >>>> > one!) and it is not enough that the server isStarted()
> > > > returns
> > > > > > > true.
> > > > > > > > >>>> The
> > > > > > > > >>>> > correct way to do this is to call the server.join()
> > method
> > > > > after
> > > > > > > the
> > > > > > > > >>>> > server.start().
> > > > > > > > >>>> >
> > > > > > > > >>>> > See:
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > > > >>>> >
> > > > > > > > >>>> > This was equally satisfying as it was frustrating and
> > you
> > > > are
> > > > > > > > welcome
> > > > > > > > >>>> for
> > > > > > > > >>>> > the future time I saved each of you :)
> > > > > > > > >>>> > --
> > > > > > > > >>>> > Amoudi, Abdullah.
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> --
> > > > > > > > >>> Amoudi, Abdullah.
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Amoudi, Abdullah.
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Amoudi, Abdullah.
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Amoudi, Abdullah.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Amoudi, Abdullah.
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Amoudi, Abdullah.
> > > >
> > >
> > >
> > >
> > > --
> > > Raman
> > >
> >
>
>
>
> --
> Amoudi, Abdullah.
>
>
>
> --
> Raman
>

Fwd: The solution to the sporadic connection refused exceptions

Posted by Raman Grover <ra...@gmail.com>.

> , and what code is
> responsible for keeping it up-to-date?
>
Apparently, no one is :-)

The information for an AsterixDB instance is refreshed "lazily": only when a
management operation is invoked (using the Managix set of commands) or an
explicit describe command is issued.
Between time t1 (when the state of an AsterixDB instance changes, say due
to an NC failure) and time t2 (when a management operation is next invoked),
the information about the AsterixDB instance inside Zookeeper remains stale.
The CC can update this information and notify Managix; this way Managix
learns of the changed state as soon as it occurs. This can be particularly
useful when showing the up-to-date state of an instance on a management
console in real time, or when having Managix respond to an event.
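
To make the difference between the lazy refresh and a CC-driven notification
concrete, here is a minimal in-memory sketch. It is an illustration only: it
does not use the Zookeeper API, and the class InstanceStateStore, its State
values and its methods are hypothetical names, not AsterixDB or Managix code.
The store stands in for the shared znode, publish() for the CC writing a new
state, and the subscribed callback for Managix being notified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the shared state record (Zookeeper in the thread).
final class InstanceStateStore {
    enum State { ACTIVE, UNUSABLE }

    private State state = State.ACTIVE;
    private final List<Consumer<State>> listeners = new ArrayList<>();

    // A reader (Managix in this sketch) registers to be told about changes.
    synchronized void subscribe(Consumer<State> listener) {
        listeners.add(listener);
    }

    // Explicit read: what a lazy reader does at time t2.
    synchronized State read() {
        return state;
    }

    // The writer (the CC in this sketch) records a change and pushes it out.
    synchronized void publish(State newState) {
        state = newState;
        for (Consumer<State> listener : listeners) {
            listener.accept(newState);
        }
    }

    public static void main(String[] args) {
        InstanceStateStore store = new InstanceStateStore();

        // Lazy reader: keeps a snapshot until it decides to refresh.
        State lazyView = store.read();

        // Notified reader: its view is updated by the callback.
        State[] pushedView = { store.read() };
        store.subscribe(s -> pushedView[0] = s);

        // Time t1: the state changes, e.g. because an NC failed.
        store.publish(State.UNUSABLE);

        System.out.println("lazy view before refresh: " + lazyView);
        System.out.println("pushed view: " + pushedView[0]);

        // Time t2: a management operation triggers the lazy refresh.
        lazyView = store.read();
        System.out.println("lazy view after refresh: " + lazyView);
    }
}
```

Between t1 and t2 the lazy view still reports ACTIVE while the notified view
already reports UNUSABLE. In a real deployment the notification would more
likely come from a watch on the znode than from an in-process callback.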

Regards,
Raman

---------- Forwarded message ----------
From: abdullah alamoudi <ba...@gmail.com>
Date: Tue, Aug 25, 2015 at 12:27 AM
Subject: Re: The solution to the sporadic connection refused exceptions
To: dev@asterixdb.incubator.apache.org


On Tue, Aug 25, 2015 at 3:40 AM, Chris Hillery <ch...@hillery.land>
wrote:

> Perhaps an aside, but: exactly what is kept in Zookeeper


A serialized instance of edu.uci.ics.asterix.event.model.AsterixInstance


> , and what code is
> responsible for keeping it up-to-date?
>
Apparently, no one is :-)


>
> Ceej
>
> On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <ra...@gmail.com>
> wrote:
>
> > Well, the state of an instance (and metadata including configuration) is
> > kept in Zookeeper instance that is accessible to Managix and CC. CC
> should
> > be able to set the state of the cluster in Zookeeper under the right
> znode
> > which can viewed by Managix.
> >
> > There exists a communication channel for CC and Managix to share
> > information on state etc. I am not sure if we need another channel such
> as
> > RMI between Managix and CC.
> >
> > Regards,
> > Raman
> >
> >
> >
> > On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <ba...@gmail.com>
> > wrote:
> >
> > > Well, it depends on your definition of the boundaries of managix. What
> I
> > > did is that I added an RMI object in the InstallerDriver which
> basically
> > > listen for state changes from the cluster controller. This means some
> > > additional logic in the CCApplicationEntryPoint where after the CC is
> > > ready, it contacts the InstallerDriver using RMI and at that point
> only,
> > > the InstallerDriver can return to managix and tells it that the startup
> > > is complete.
> > >
> > > Not sure if this is the right way to do it but it definitely is better
> > than
> > > what we currently have.
> > > Abdullah.
> > >
> > > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <chillery@hillery.land
> >
> > > wrote:
> > >
> > > > Hopefully the solution won't involve additional important logic
> inside
> > > > Managix itself?
> > > >
> > > > Ceej
> > > > aka Chris Hillery
> > > >
> > > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <
> bamousaa@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > That works but it doesn't feel right doing it this way. I am going
> to
> > > fix
> > > > > this one for good.
> > > > >
> > > > > Cheers,
> > > > > Abdullah.
> > > > >
> > > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <im...@uci.edu> wrote:
> > > > >
> > > > > > The way I assured liveness for the YARN installer was to try
> > running
> > > > "for
> > > > > > $x in dataset Metadata.Dataset return $x" via the API. I just
> > polled
> > > > for
> > > > > a
> > > > > > reasonable amount of time  (though honestly, thinking about it
> now,
> > > the
> > > > > > correct parameter to use for the polling interval is the startup
> > wait
> > > > > time
> > > > > > in the parameters file :) ). It's not perfect, but it gives less
> > > false
> > > > > > positives than just checking ps for processes that look like
> > CCs/NCs.
> > > > > >
> > > > > > - Ian.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <
> > > bamousaa@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Now that I think about it. Maybe we should provide multiple
> ways
> > to
> > > > do
> > > > > > > this. A polling mechanism to be used for arbitrary time and a
> > > pushing
> > > > > > > mechanism on startup.
> > > > > > > I am going to start implementation of this and will probably
> use
> > > RMI
> > > > > for
> > > > > > > this task both ways (CC to InstallerDriver and InstallerDriver
> to
> > > > CC).
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Abdullah.
> > > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <
> > > > bamousaa@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > So after further investigation, turned out our startup
> process
> > > just
> > > > > > > starts
> > > > > > > > the CC and NC processes and then make sure the processes are
> > > > running
> > > > > > and
> > > > > > > if
> > > > > > > > the processes were found to be running, it returns the state
> of
> > > the
> > > > > > > cluster
> > > > > > > > to be active and the subsequent test commands can start
> > > > immediately.
> > > > > > > >
> > > > > > > > This means that the CC could've started but is not yet ready
> > when
> > > > we
> > > > > > try
> > > > > > > > to process the next command. To address this, we need a
> better
> > > way
> > > > to
> > > > > > > tell
> > > > > > > > when the startup procedure has completed. we can do this by
> > > pushing
> > > > > (CC
> > > > > > > > informs installer driver when the startup is complete) or
> > polling
> > > > > (The
> > > > > > > > installer driver needs to actually query the CC for the state of the
> > > > > > > > cluster).
> > > > > > > >
> > > > > > > > I can do either way so let's vote. My vote goes to the
> pushing
> > > > > > mechanism.
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <
> > > > > > bamousaa@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> This solution turned out to be incorrect. Actually, the test cases when I
> > > > > > > >> build after using the join method never fails but running an actual asterix
> > > > > > > >> instance never succeeds which is quite confusing.
> > > > > > > >>
> > > > > > > >> I also think that the startup script has a major bug where
> it
> > > > might
> > > > > > > >> returns before the startup is complete. More on this
> > later......
> > > > > > > >>
> > > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <
> > > > > > bamousaa@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> It is highly unlikely that it is related.
> > > > > > > >>>
> > > > > > > >>> Cheers,
> > > > > > > >>> Abdullah.
> > > > > > > >>>
> > > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com
> >
> > > > wrote:
> > > > > > > >>>
> > > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian
> > > and I
> > > > > > plan
> > > > > > > to
> > > > > > > >>>> look into the details on Monday.
> > > > > > > >>>>
> > > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <
> > > > > > > bamousaa@gmail.com
> > > > > > > >>>> >
> > > > > > > >>>> wrote:
> > > > > > > >>>>
> > > > > > > >>>> > About 3-4 days ago, I was working on the addition of the filesystem based
> > > > > > > >>>> > feed adapter and it didn't take anytime to complete.
> > > However,
> > > > > > when I
> > > > > > > >>>> wanted
> > > > > > > >>>> > to build and make sure all tests pass, I kept getting
> > > > > > > >>>> ConnectionRefused
> > > > > > > >>>> > errors which caused the installer tests to fail every
> now
> > > and
> > > > > > then.
> > > > > > > >>>> >
> > > > > > > >>>> > I knew the new change had nothing to do with this
> failure,
> > > > yet,
> > > > > I
> > > > > > > >>>> couldn't
> > > > > > > >>>> > direct my attention away from this bug (It just bothered me so much and I
> > > > > > > >>>> > knew it needs to be resolved ASAP). After wasting
> > countless
> > > > > > hours, I
> > > > > > > >>>> was
> > > > > > > >>>> > finally able to figure out what was happening :-)
> > > > > > > >>>> >
> > > > > > > >>>> > In the startup routine, we start three Jetty web servers (Web interface
> > > > > > > >>>> > server, JSON API server, and Feed server). Sometime ago, we used to end the
> > > > > > > >>>> > startup call before making sure the server.isStarted()
> > > method
> > > > > > > returns
> > > > > > > >>>> true
> > > > > > > >>>> > on all servers. At that time, I introduced the
> > > > > > waitUntilServerStarts
> > > > > > > >>>> method
> > > > > > > >>>> > to make sure we don't return before the servers are
> ready.
> > > > > Turned
> > > > > > > >>>> out, that
> > > > > > > >>>> > was an incorrect way to handle this (We can blame
> > > > stackoverflow
> > > > > > for
> > > > > > > >>>> this
> > > > > > > >>>> > one!) and it is not enough that the server isStarted()
> > > returns
> > > > > > true.
> > > > > > > >>>> The
> > > > > > > >>>> > correct way to do this is to call the server.join()
> method
> > > > after
> > > > > > the
> > > > > > > >>>> > server.start().
> > > > > > > >>>> >
> > > > > > > >>>> > See:
> > > > > > > >>>> >
> > > > > > > >>>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > > >>>> >
> > > > > > > >>>> > This was equally satisfying as it was frustrating and
> you
> > > are
> > > > > > > welcome
> > > > > > > >>>> for
> > > > > > > >>>> > the future time I saved each of you :)
> > > > > > > >>>> > --
> > > > > > > >>>> > Amoudi, Abdullah.
> > > > > > > >>>> >
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>> Amoudi, Abdullah.
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Amoudi, Abdullah.
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Amoudi, Abdullah.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Amoudi, Abdullah.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Amoudi, Abdullah.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Amoudi, Abdullah.
> > >
> >
> >
> >
> > --
> > Raman
> >
>



--
Amoudi, Abdullah.



-- 
Raman

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

On Tue, Aug 25, 2015 at 3:40 AM, Chris Hillery <ch...@hillery.land>
wrote:

> Perhaps an aside, but: exactly what is kept in Zookeeper


A serialized instance of edu.uci.ics.asterix.event.model.AsterixInstance


> , and what code is
> responsible for keeping it up-to-date?
>
Apparently, no one is :-)


>
> Ceej
>
> On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <ra...@gmail.com>
> wrote:
>
> > Well, the state of an instance (and metadata including configuration) is
> > kept in Zookeeper instance that is accessible to Managix and CC. CC
> should
> > be able to set the state of the cluster in Zookeeper under the right
> znode
> > which can viewed by Managix.
> >
> > There exists a communication channel for CC and Managix to share
> > information on state etc. I am not sure if we need another channel such
> as
> > RMI between Managix and CC.
> >
> > Regards,
> > Raman
> >
> >
> >
> > On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <ba...@gmail.com>
> > wrote:
> >
> > > Well, it depends on your definition of the boundaries of managix. What
> I
> > > did is that I added an RMI object in the InstallerDriver which
> basically
> > > listen for state changes from the cluster controller. This means some
> > > additional logic in the CCApplicationEntryPoint where after the CC is
> > > ready, it contacts the InstallerDriver using RMI and at that point
> only,
> > > the InstallerDriver can return to managix and tells it that the startup
> > is
> > > complete.
> > >
> > > Not sure if this is the right way to do it but it definitely is better
> > than
> > > what we currently have.
> > > Abdullah.
> > >
> > > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <chillery@hillery.land
> >
> > > wrote:
> > >
> > > > Hopefully the solution won't involve additional important logic
> inside
> > > > Managix itself?
> > > >
> > > > Ceej
> > > > aka Chris Hillery
> > > >
> > > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <
> bamousaa@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > That works but it doesn't feel right doing it this way. I am going
> to
> > > fix
> > > > > this one for good.
> > > > >
> > > > > Cheers,
> > > > > Abdullah.
> > > > >
> > > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <im...@uci.edu> wrote:
> > > > >
> > > > > > The way I assured liveness for the YARN installer was to try
> > running
> > > > "for
> > > > > > $x in dataset Metadata.Dataset return $x" via the API. I just
> > polled
> > > > for
> > > > > a
> > > > > > reasonable amount of time  (though honestly, thinking about it
> now,
> > > the
> > > > > > correct parameter to use for the polling interval is the startup
> > wait
> > > > > time
> > > > > > in the parameters file :) ). It's not perfect, but it gives less
> > > false
> > > > > > positives than just checking ps for processes that look like
> > CCs/NCs.
> > > > > >
> > > > > > - Ian.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <
> > > bamousaa@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Now that I think about it. Maybe we should provide multiple
> ways
> > to
> > > > do
> > > > > > > this. A polling mechanism to be used for arbitrary time and a
> > > pushing
> > > > > > > mechanism on startup.
> > > > > > > I am going to start implementation of this and will probably
> use
> > > RMI
> > > > > for
> > > > > > > this task both ways (CC to InstallerDriver and InstallerDriver
> to
> > > > CC).
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Abdullah.
> > > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <
> > > > bamousaa@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > So after further investigation, turned out our startup
> process
> > > just
> > > > > > > starts
> > > > > > > > the CC and NC processes and then make sure the processes are
> > > > running
> > > > > > and
> > > > > > > if
> > > > > > > > the processes were found to be running, it returns the state
> of
> > > the
> > > > > > > cluster
> > > > > > > > to be active and the subsequent test commands can start
> > > > immediately.
> > > > > > > >
> > > > > > > > This means that the CC could've started but is not yet ready
> > when
> > > > we
> > > > > > try
> > > > > > > > to process the next command. To address this, we need a
> better
> > > way
> > > > to
> > > > > > > tell
> > > > > > > > when the startup procedure has completed. we can do this by
> > > pushing
> > > > > (CC
> > > > > > > > informs installer driver when the startup is complete) or
> > polling
> > > > > (The
> > > > > > > > installer driver needs to actually query the CC for the state
> > of
> > > > the
> > > > > > > > cluster).
> > > > > > > >
> > > > > > > > I can do either way so let's vote. My vote goes to the
> pushing
> > > > > > mechanism.
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <
> > > > > > bamousaa@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> This solution turned out to be incorrect. Actually, the test
> > > cases
> > > > > > when
> > > > > > > I
> > > > > > > >> build after using the join method never fails but running an
> > > > actual
> > > > > > > asterix
> > > > > > > >> instance never succeeds which is quite confusing.
> > > > > > > >>
> > > > > > > >> I also think that the startup script has a major bug where
> it
> > > > might
> > > > > > > >> returns before the startup is complete. More on this
> > later......
> > > > > > > >>
> > > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <
> > > > > > bamousaa@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> It is highly unlikely that it is related.
> > > > > > > >>>
> > > > > > > >>> Cheers,
> > > > > > > >>> Abdullah.
> > > > > > > >>>
> > > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com
> >
> > > > wrote:
> > > > > > > >>>
> > > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian
> > > and I
> > > > > > plan
> > > > > > > to
> > > > > > > >>>> look into the details on Monday.
> > > > > > > >>>>
> > > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <
> > > > > > > bamousaa@gmail.com
> > > > > > > >>>> >
> > > > > > > >>>> wrote:
> > > > > > > >>>>
> > > > > > > >>>> > About 3-4 days ago, I was working on the addition of the
> > > > > > filesystem
> > > > > > > >>>> based
> > > > > > > >>>> > feed adapter and it didn't take anytime to complete.
> > > However,
> > > > > > when I
> > > > > > > >>>> wanted
> > > > > > > >>>> > to build and make sure all tests pass, I kept getting
> > > > > > > >>>> ConnectionRefused
> > > > > > > >>>> > errors which caused the installer tests to fail every
> now
> > > and
> > > > > > then.
> > > > > > > >>>> >
> > > > > > > >>>> > I knew the new change had nothing to do with this
> failure,
> > > > yet,
> > > > > I
> > > > > > > >>>> couldn't
> > > > > > > >>>> > direct my attention away from this bug (It just bothered
> > me
> > > so
> > > > > > much
> > > > > > > >>>> and I
> > > > > > > >>>> > knew it needs to be resolved ASAP). After wasting
> > countless
> > > > > > hours, I
> > > > > > > >>>> was
> > > > > > > >>>> > finally able to figure out what was happening :-)
> > > > > > > >>>> >
> > > > > > > >>>> > In the startup routine, we start three Jetty web servers
> > > (Web
> > > > > > > >>>> interface
> > > > > > > >>>> > server, JSON API server, and Feed server). Sometime ago,
> > we
> > > > used
> > > > > > to
> > > > > > > >>>> end the
> > > > > > > >>>> > startup call before making sure the server.isStarted()
> > > method
> > > > > > > returns
> > > > > > > >>>> true
> > > > > > > >>>> > on all servers. At that time, I introduced the
> > > > > > waitUntilServerStarts
> > > > > > > >>>> method
> > > > > > > >>>> > to make sure we don't return before the servers are
> ready.
> > > > > Turned
> > > > > > > >>>> out, that
> > > > > > > >>>> > was an incorrect way to handle this (We can blame
> > > > stackoverflow
> > > > > > for
> > > > > > > >>>> this
> > > > > > > >>>> > one!) and it is not enough that the server isStarted()
> > > returns
> > > > > > true.
> > > > > > > >>>> The
> > > > > > > >>>> > correct way to do this is to call the server.join()
> method
> > > > after
> > > > > > the
> > > > > > > >>>> > server.start().
> > > > > > > >>>> >
> > > > > > > >>>> > See:
> > > > > > > >>>> >
> > > > > > > >>>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > > >>>> >
> > > > > > > >>>> > This was equally satisfying as it was frustrating and
> you
> > > are
> > > > > > > welcome
> > > > > > > >>>> for
> > > > > > > >>>> > the future time I saved each of you :)
> > > > > > > >>>> > --
> > > > > > > >>>> > Amoudi, Abdullah.
> > > > > > > >>>> >
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>> Amoudi, Abdullah.
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Amoudi, Abdullah.
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Amoudi, Abdullah.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Amoudi, Abdullah.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Amoudi, Abdullah.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Amoudi, Abdullah.
> > >
> >
> >
> >
> > --
> > Raman
> >
>



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Chris Hillery <ch...@hillery.land>.

Perhaps an aside, but: exactly what is kept in Zookeeper, and what code is
responsible for keeping it up-to-date?

Ceej

On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <ra...@gmail.com>
wrote:

> Well, the state of an instance (and metadata including configuration) is
> kept in Zookeeper instance that is accessible to Managix and CC. CC should
> be able to set the state of the cluster in Zookeeper under the right znode
> which can viewed by Managix.
>
> There exists a communication channel for CC and Managix to share
> information on state etc. I am not sure if we need another channel such as
> RMI between Managix and CC.
>
> Regards,
> Raman
>
>
>
> On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
> > Well, it depends on your definition of the boundaries of managix. What I
> > did is that I added an RMI object in the InstallerDriver which basically
> > listen for state changes from the cluster controller. This means some
> > additional logic in the CCApplicationEntryPoint where after the CC is
> > ready, it contacts the InstallerDriver using RMI and at that point only,
> > the InstallerDriver can return to managix and tells it that the startup
> is
> > complete.
> >
> > Not sure if this is the right way to do it but it definitely is better
> than
> > what we currently have.
> > Abdullah.
> >
> > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <ch...@hillery.land>
> > wrote:
> >
> > > Hopefully the solution won't involve additional important logic inside
> > > Managix itself?
> > >
> > > Ceej
> > > aka Chris Hillery
> > >
> > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <bamousaa@gmail.com
> >
> > > wrote:
> > >
> > > > That works but it doesn't feel right doing it this way. I am going to
> > fix
> > > > this one for good.
> > > >
> > > > Cheers,
> > > > Abdullah.
> > > >
> > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <im...@uci.edu> wrote:
> > > >
> > > > > The way I assured liveness for the YARN installer was to try
> running
> > > "for
> > > > > $x in dataset Metadata.Dataset return $x" via the API. I just
> polled
> > > for
> > > > a
> > > > > reasonable amount of time  (though honestly, thinking about it now,
> > the
> > > > > correct parameter to use for the polling interval is the startup
> wait
> > > > time
> > > > > in the parameters file :) ). It's not perfect, but it gives less
> > false
> > > > > positives than just checking ps for processes that look like
> CCs/NCs.
> > > > >
> > > > > - Ian.
> > > > >
> > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <
> > bamousaa@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Now that I think about it. Maybe we should provide multiple ways
> to
> > > do
> > > > > > this. A polling mechanism to be used for arbitrary time and a
> > pushing
> > > > > > mechanism on startup.
> > > > > > I am going to start implementation of this and will probably use
> > RMI
> > > > for
> > > > > > this task both ways (CC to InstallerDriver and InstallerDriver to
> > > CC).
> > > > > >
> > > > > > Cheers,
> > > > > > Abdullah.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <
> > > bamousaa@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > So after further investigation, turned out our startup process
> > just
> > > > > > starts
> > > > > > > the CC and NC processes and then make sure the processes are
> > > running
> > > > > and
> > > > > > if
> > > > > > > the processes were found to be running, it returns the state of
> > the
> > > > > > cluster
> > > > > > > to be active and the subsequent test commands can start
> > > immediately.
> > > > > > >
> > > > > > > This means that the CC could've started but is not yet ready
> when
> > > we
> > > > > try
> > > > > > > to process the next command. To address this, we need a better
> > way
> > > to
> > > > > > tell
> > > > > > > when the startup procedure has completed. we can do this by
> > pushing
> > > > (CC
> > > > > > > informs installer driver when the startup is complete) or
> polling
> > > > (The
> > > > > > > installer driver needs to actually query the CC for the state
> of
> > > the
> > > > > > > cluster).
> > > > > > >
> > > > > > > I can do either way so let's vote. My vote goes to the pushing
> > > > > mechanism.
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <
> > > > > bamousaa@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> This solution turned out to be incorrect. Actually, the test
> > cases
> > > > > when
> > > > > > I
> > > > > > >> build after using the join method never fails but running an
> > > actual
> > > > > > asterix
> > > > > > >> instance never succeeds which is quite confusing.
> > > > > > >>
> > > > > > >> I also think that the startup script has a major bug where it
> > > might
> > > > > > >> returns before the startup is complete. More on this
> later......
> > > > > > >>
> > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <
> > > > > bamousaa@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> It is highly unlikely that it is related.
> > > > > > >>>
> > > > > > >>> Cheers,
> > > > > > >>> Abdullah.
> > > > > > >>>
> > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <ch...@gmail.com>
> > > wrote:
> > > > > > >>>
> > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian
> > and I
> > > > > plan
> > > > > > to
> > > > > > >>>> look into the details on Monday.
> > > > > > >>>>
> > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <
> > > > > > bamousaa@gmail.com
> > > > > > >>>> >
> > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > >>>> > About 3-4 days ago, I was working on the addition of the
> > > > > filesystem
> > > > > > >>>> based
> > > > > > >>>> > feed adapter and it didn't take anytime to complete.
> > However,
> > > > > when I
> > > > > > >>>> wanted
> > > > > > >>>> > to build and make sure all tests pass, I kept getting
> > > > > > >>>> ConnectionRefused
> > > > > > >>>> > errors which caused the installer tests to fail every now
> > and
> > > > > then.
> > > > > > >>>> >
> > > > > > >>>> > I knew the new change had nothing to do with this failure,
> > > yet,
> > > > I
> > > > > > >>>> couldn't
> > > > > > >>>> > direct my attention away from this bug (It just bothered
> me
> > so
> > > > > much
> > > > > > >>>> and I
> > > > > > >>>> > knew it needs to be resolved ASAP). After wasting
> countless
> > > > > hours, I
> > > > > > >>>> was
> > > > > > >>>> > finally able to figure out what was happening :-)
> > > > > > >>>> >
> > > > > > >>>> > In the startup routine, we start three Jetty web servers
> > (Web
> > > > > > >>>> interface
> > > > > > >>>> > server, JSON API server, and Feed server). Sometime ago,
> we
> > > used
> > > > > to
> > > > > > >>>> end the
> > > > > > >>>> > startup call before making sure the server.isStarted()
> > method
> > > > > > returns
> > > > > > >>>> true
> > > > > > >>>> > on all servers. At that time, I introduced the
> > > > > waitUntilServerStarts
> > > > > > >>>> method
> > > > > > >>>> > to make sure we don't return before the servers are ready.
> > > > Turned
> > > > > > >>>> out, that
> > > > > > >>>> > was an incorrect way to handle this (We can blame
> > > stackoverflow
> > > > > for
> > > > > > >>>> this
> > > > > > >>>> > one!) and it is not enough that the server isStarted()
> > returns
> > > > > true.
> > > > > > >>>> The
> > > > > > >>>> > correct way to do this is to call the server.join() method
> > > after
> > > > > the
> > > > > > >>>> > server.start().
> > > > > > >>>> >
> > > > > > >>>> > See:
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > >>>> >
> > > > > > >>>> > This was equally satisfying as it was frustrating and you
> > are
> > > > > > welcome
> > > > > > >>>> for
> > > > > > >>>> > the future time I saved each of you :)
> > > > > > >>>> > --
> > > > > > >>>> > Amoudi, Abdullah.
> > > > > > >>>> >
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Amoudi, Abdullah.
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Amoudi, Abdullah.
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Amoudi, Abdullah.
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Amoudi, Abdullah.
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Amoudi, Abdullah.
> > > >
> > >
> >
> >
> >
> > --
> > Amoudi, Abdullah.
> >
>
>
>
> --
> Raman
>

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

Thanks Raman,
I already figured that out. I am currently looking into synchronization
(even though I don't need it right now), since it can definitely come in
handy in future tasks.

Cheers,
Abdullah.

P.S.
Using ZooKeeper seems to be the cleaner and easier approach. Thanks for
bringing this up :-)

On Tue, Aug 25, 2015 at 9:06 AM, Raman Grover <ra...@gmail.com>
wrote:

> Abdullah, Zookeeper allows setting up a  "watch" on any node (recall that
> internally zookeeper maintains info as a tree of nodes, called znodes).
> Check out the zookeeper API here
> <
> http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches
> >.
> Watch nodes are a common mechanism for notifying changes to information
> stored within a node, creation/deletion of a child under the node or
> deletion of node.
> One can attach listeners and be notified.
>
>
> Regards,
> Raman
>



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Raman Grover <ra...@gmail.com>.

Abdullah, ZooKeeper allows setting up a "watch" on any node (recall that
internally ZooKeeper maintains its data as a tree of nodes, called znodes).
Check out the ZooKeeper API here
<http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkWatches>.
Watches are a common mechanism for being notified of changes to the data
stored in a node, the creation or deletion of a child under the node, or the
deletion of the node itself.
One can attach a listener to a znode and be notified when any of these
events occurs.
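
As a rough illustration of this notification style, here is a minimal Java
sketch that uses only the standard library. It is not the ZooKeeper client
API: the class name ClusterReadySignal, its method names, and the "ACTIVE"
state string are all hypothetical. In a real setup, onStateChanged would
presumably be invoked from a ZooKeeper watch callback instead of being
called directly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of AsterixDB or ZooKeeper): lets one side
 * block until another side reports that the cluster has become ACTIVE.
 */
class ClusterReadySignal {
    // Released exactly once, the first time ACTIVE is reported.
    private final CountDownLatch ready = new CountDownLatch(1);

    /** Called by the notifying side, e.g. from a watch callback. */
    public void onStateChanged(String newState) {
        // "ACTIVE" is an assumed state name used only for illustration.
        if ("ACTIVE".equals(newState)) {
            ready.countDown();
        }
    }

    /** Returns true if ACTIVE was reported within the timeout, false otherwise. */
    public boolean awaitActive(long timeout, TimeUnit unit) {
        try {
            return ready.await(timeout, unit);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and report "not ready".
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller such as Managix could block on awaitActive(...) with a timeout and
treat a false return value as a failed startup.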


Regards,
Raman

On Mon, Aug 24, 2015 at 9:47 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> @Raman,
> I will look into doing it with Zookeeper.
>
> Is there a way to notify Managix once the cluster state has been updated in
> Zookeeper? or would Managix have to poll and check the state?
>
> Cheers,
> Abdullah.
>



-- 
Raman

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

@Raman,
I will look into doing it with ZooKeeper.

Is there a way to notify Managix once the cluster state has been updated in
ZooKeeper, or would Managix have to poll and check the state?
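
To make the polling alternative concrete, here is a rough Java sketch using
only the standard library. Everything in it is an assumption made for
illustration: the class name ReadinessPoller, the method pollUntil, and the
probe, which stands in for whatever check is chosen (for example, reading
the cluster state from ZooKeeper).

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: repeatedly runs a readiness probe until it passes or time runs out. */
class ReadinessPoller {
    /**
     * Calls probe every intervalMillis until it returns true or
     * timeoutMillis has elapsed. Returns true only if the probe passed.
     */
    public static boolean pollUntil(BooleanSupplier probe, long timeoutMillis, long intervalMillis) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            // Elapsed-time comparison is safe against nanoTime wrap-around.
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The trade-off against a pushed notification is the usual one: polling needs
no callback channel, but it adds up to one interval of latency and keeps
querying the state until the timeout expires.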

Cheers,
Abdullah.

On Tue, Aug 25, 2015 at 3:28 AM, Raman Grover <ra...@gmail.com>
wrote:

> Well, the state of an instance (and metadata including configuration) is
> kept in Zookeeper instance that is accessible to Managix and CC. CC should
> be able to set the state of the cluster in Zookeeper under the right znode
> which can viewed by Managix.
>
> There exists a communication channel for CC and Managix to share
> information on state etc. I am not sure if we need another channel such as
> RMI between Managix and CC.
>
> Regards,
> Raman
>
>
>
> On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
> > Well, it depends on your definition of the boundaries of managix. What I
> > did is that I added an RMI object in the InstallerDriver which basically
> > listen for state changes from the cluster controller. This means some
> > additional logic in the CCApplicationEntryPoint where after the CC is
> > ready, it contacts the InstallerDriver using RMI and at that point only,
> > the InstallerDriver can return to managix and tells it that the startup
> is
> > complete.
> >
> > Not sure if this is the right way to do it but it definitely is better
> than
> > what we currently have.
> > Abdullah.
> >
> > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <ch...@hillery.land>
> > wrote:
> >
> > > Hopefully the solution won't involve additional important logic inside
> > > Managix itself?
> > >
> > > Ceej
> > > aka Chris Hillery
> > >
> > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <bamousaa@gmail.com
> >
> > > wrote:
> > >
> > > > That works but it doesn't feel right doing it this way. I am going to
> > fix
> > > > this one for good.
> > > >
> > > > Cheers,
> > > > Abdullah.
> > > >
> > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <im...@uci.edu> wrote:
> > > >
> > > > > The way I assured liveness for the YARN installer was to try
> running
> > > "for
> > > > > $x in dataset Metadata.Dataset return $x" via the API. I just
> polled
> > > for
> > > > a
> > > > > reasonable amount of time  (though honestly, thinking about it now,
> > the
> > > > > correct parameter to use for the polling interval is the startup
> wait
> > > > time
> > > > > in the parameters file :) ). It's not perfect, but it gives less
> > false
> > > > > positives than just checking ps for processes that look like
> CCs/NCs.
> > > > >
> > > > > - Ian.
> > > > >
> > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <
> > bamousaa@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Now that I think about it. Maybe we should provide multiple ways
> to
> > > do
> > > > > > this. A polling mechanism to be used for arbitrary time and a
> > pushing
> > > > > > mechanism on startup.
> > > > > > I am going to start implementation of this and will probably use
> > RMI
> > > > for
> > > > > > this task both ways (CC to InstallerDriver and InstallerDriver to
> > > CC).
> > > > > >
> > > > > > Cheers,
> > > > > > Abdullah.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <
> > > bamousaa@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > So after further investigation, turned out our startup process
> > just
> > > > > > starts
> > > > > > > the CC and NC processes and then make sure the processes are
> > > running
> > > > > and
> > > > > > if
> > > > > > > the processes were found to be running, it returns the state of
> > the
> > > > > > cluster
> > > > > > > to be active and the subsequent test commands can start
> > > immediately.
> > > > > > >
> > > > > > > This means that the CC could've started but is not yet ready
> when
> > > we
> > > > > try
> > > > > > > to process the next command. To address this, we need a better
> > way
> > > to
> > > > > > tell
> > > > > > > when the startup procedure has completed. we can do this by
> > pushing
> > > > (CC
> > > > > > > informs installer driver when the startup is complete) or
> polling
> > > > (The
> > > > > > > installer driver needs to actually query the CC for the state
> of
> > > the
> > > > > > > cluster).
> --
> Raman
>



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Raman Grover <ra...@gmail.com>.

Well, the state of an instance (and its metadata, including configuration) is
kept in a ZooKeeper instance that is accessible to both Managix and the CC. The
CC should be able to set the state of the cluster in ZooKeeper under the right
znode, which can then be viewed by Managix.

So there already exists a communication channel for the CC and Managix to share
information on state etc. I am not sure we need another channel such as
RMI between Managix and the CC.

Regards,
Raman



On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Well, it depends on your definition of the boundaries of Managix. What I
> did is add an RMI object in the InstallerDriver which basically
> listens for state changes from the cluster controller. This means some
> additional logic in the CCApplicationEntryPoint where, after the CC is
> ready, it contacts the InstallerDriver using RMI, and only at that point
> can the InstallerDriver return to Managix and tell it that the startup is
> complete.
>
> Not sure if this is the right way to do it but it definitely is better than
> what we currently have.
> Abdullah.



-- 
Raman

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

Well, it depends on your definition of the boundaries of Managix. What I
did is add an RMI object in the InstallerDriver which basically
listens for state changes from the cluster controller. This means some
additional logic in the CCApplicationEntryPoint where, after the CC is
ready, it contacts the InstallerDriver using RMI, and only at that point
can the InstallerDriver return to Managix and tell it that the startup is
complete.

Not sure if this is the right way to do it but it definitely is better than
what we currently have.
Abdullah.
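
For readers following along, the push approach described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the code that was committed: the StartupListener interface,
the PushStartupSketch class, the "startup-listener" binding name, and the
returned state strings are all hypothetical, and a plain thread stands in
for the CC. Only standard java.rmi calls are used.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class PushStartupSketch {
    // Hypothetical callback interface; the name and method are illustrative only.
    public interface StartupListener extends Remote {
        void clusterReady(String instanceName) throws RemoteException;
    }

    // Installer side: the latch opens when the CC reports that it is ready.
    static class Listener implements StartupListener {
        final CountDownLatch ready = new CountDownLatch(1);
        volatile String instance;

        @Override
        public void clusterReady(String instanceName) {
            instance = instanceName;
            ready.countDown();
        }
    }

    // Exports the listener, lets a stand-in "CC" thread call back, then waits.
    static String awaitStartup(String instanceName, long timeoutMillis) {
        // Keep the demo on the loopback interface.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Listener listener = new Listener();
        Registry registry = null;
        try {
            registry = LocateRegistry.createRegistry(0); // 0 = any free port
            Remote stub = UnicastRemoteObject.exportObject(listener, 0);
            registry.rebind("startup-listener", stub);

            // Stand-in for the CC: once "ready", it looks up the listener
            // and notifies it through the RMI stub.
            final Registry lookupRegistry = registry;
            Thread cc = new Thread(() -> {
                try {
                    StartupListener remote =
                            (StartupListener) lookupRegistry.lookup("startup-listener");
                    remote.clusterReady(instanceName);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            cc.start();

            // The installer only reports success after the callback arrives.
            boolean ok = listener.ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
            cc.join();
            return ok ? "ACTIVE:" + listener.instance : "TIMED_OUT";
        } catch (Exception e) {
            throw new IllegalStateException("startup notification failed", e);
        } finally {
            // Unexport so the RMI threads stop and the JVM can exit.
            try {
                UnicastRemoteObject.unexportObject(listener, true);
                if (registry != null) {
                    UnicastRemoteObject.unexportObject(registry, true);
                }
            } catch (RemoteException e) {
                // Nothing was exported; nothing to clean up.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitStartup("demo-instance", 5_000));
    }
}
```

The timeout matters here: if the CC never calls back, the installer should
report a failed or unknown startup rather than block forever.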

On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <ch...@hillery.land>
wrote:

> Hopefully the solution won't involve additional important logic inside
> Managix itself?
>
> Ceej
> aka Chris Hillery



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Chris Hillery <ch...@hillery.land>.

Hopefully the solution won't involve additional important logic inside
Managix itself?

Ceej
aka Chris Hillery

On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> That works but it doesn't feel right doing it this way. I am going to fix
> this one for good.
>
> Cheers,
> Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

That works but it doesn't feel right doing it this way. I am going to fix
this one for good.

Cheers,
Abdullah.

On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <im...@uci.edu> wrote:

> The way I assured liveness for the YARN installer was to try running "for
> $x in dataset Metadata.Dataset return $x" via the API. I just polled for a
> reasonable amount of time (though honestly, thinking about it now, the
> correct parameter to use for the polling interval is the startup wait time
> in the parameters file :) ). It's not perfect, but it gives fewer false
> positives than just checking ps for processes that look like CCs/NCs.
>
> - Ian.



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Ian Maxon <im...@uci.edu>.

The way I assured liveness for the YARN installer was to try running "for
$x in dataset Metadata.Dataset return $x" via the API. I just polled for a
reasonable amount of time (though honestly, thinking about it now, the
correct parameter to use for the polling interval is the startup wait time
in the parameters file :) ). It's not perfect, but it gives fewer false
positives than just checking ps for processes that look like CCs/NCs.

- Ian.
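
A rough sketch of the polling loop described above, for illustration only.
The StartupPoller class and pollUntilReady method are hypothetical names,
and the probe is a stand-in: in a real installer it would submit the
metadata query to the HTTP API and return true on a successful response.
A probe that throws (for example on connection refused) is treated as
"not ready yet".

```java
import java.util.function.BooleanSupplier;

class StartupPoller {
    /**
     * Calls the probe repeatedly until it returns true or the timeout elapses.
     * Returns true if the probe succeeded, false on timeout or interruption.
     */
    static boolean pollUntilReady(BooleanSupplier probe, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                if (probe.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // For example, connection refused while the CC is still starting: retry.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in probe that fails twice before succeeding.
        int[] attempts = {0};
        boolean ready = pollUntilReady(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new IllegalStateException("connection refused");
            }
            return true;
        }, 5_000, 50);
        System.out.println("ready=" + ready + " after " + attempts[0] + " attempts");
    }
}
```

The timeout and interval are plain parameters so that they can be taken
from configuration, such as the startup wait time mentioned above.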

On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Now that I think about it, maybe we should provide multiple ways to do
> this: a polling mechanism that can be used at any arbitrary time and a push
> mechanism on startup.
> I am going to start implementation of this and will probably use RMI for
> this task both ways (CC to InstallerDriver and InstallerDriver to CC).
>
> Cheers,
> Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

Now that I think about it, maybe we should provide more than one way to do
this: a polling mechanism that can be used at any time, and a pushing
mechanism on startup.
I am going to start implementing this and will probably use RMI in both
directions (CC to InstallerDriver and InstallerDriver to CC).
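
For illustration only, the polling side could look roughly like the sketch
below. ReadinessPoller and awaitReady are hypothetical names, not existing
installer code, and the BooleanSupplier stands in for whatever call (RMI or
otherwise) actually asks the CC for the cluster state.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the AsterixDB installer.
final class ReadinessPoller {
    private ReadinessPoller() {}

    /**
     * Evaluates readyCheck repeatedly until it returns true, sleeping
     * intervalMillis between attempts. Throws TimeoutException if the check
     * has not succeeded once timeoutMillis has elapsed.
     * readyCheck stands in for "query the CC for the state of the cluster".
     */
    static void awaitReady(BooleanSupplier readyCheck, long timeoutMillis, long intervalMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!readyCheck.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("cluster not ready after " + timeoutMillis + " ms");
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Simulated cluster that becomes ready about 200 ms after launch.
        awaitReady(() -> System.currentTimeMillis() - start >= 200, 5_000, 50);
        System.out.println("cluster reported ACTIVE");
    }
}
```

Because the check is just a supplier, the same loop could be called at any
point after startup, which is the "arbitrary time" case mentioned above.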

Cheers,
Abdullah.

On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> So after further investigation, turned out our startup process just starts
> the CC and NC processes and then make sure the processes are running and if
> the processes were found to be running, it returns the state of the cluster
> to be active and the subsequent test commands can start immediately.
>
> This means that the CC could've started but is not yet ready when we try
> to process the next command. To address this, we need a better way to tell
> when the startup procedure has completed. we can do this by pushing (CC
> informs installer driver when the startup is complete) or polling (The
> installer driver needs to actually query the CC for the state of the
> cluster).
>
> I can do either way so let's vote. My vote goes to the pushing mechanism.
> Thoughts?
>
> On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
>> This solution turned out to be incorrect. Actually, the test cases when I
>> build after using the join method never fails but running an actual asterix
>> instance never succeeds which is quite confusing.
>>
>> I also think that the startup script has a major bug where it might
>> returns before the startup is complete. More on this later......
>>
>> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <ba...@gmail.com>
>> wrote:
>>
>>> It is highly unlikely that it is related.
>>>
>>> Cheers,
>>> Abdullah.
>>>
>>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <ch...@gmail.com> wrote:
>>>
>>>> @Abdullah: Is this issue related to
>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
>>>> look into the details on Monday.
>>>>
>>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <bamousaa@gmail.com>
>>>> wrote:
>>>>
>>>> > About 3-4 days ago, I was working on the addition of the filesystem based
>>>> > feed adapter and it didn't take anytime to complete. However, when I wanted
>>>> > to build and make sure all tests pass, I kept getting ConnectionRefused
>>>> > errors which caused the installer tests to fail every now and then.
>>>> >
>>>> > I knew the new change had nothing to do with this failure, yet, I couldn't
>>>> > direct my attention away from this bug (It just bothered me so much and I
>>>> > knew it needs to be resolved ASAP). After wasting countless hours, I was
>>>> > finally able to figure out what was happening :-)
>>>> >
>>>> > In the startup routine, we start three Jetty web servers (Web interface
>>>> > server, JSON API server, and Feed server). Sometime ago, we used to end the
>>>> > startup call before making sure the server.isStarted() method returns true
>>>> > on all servers. At that time, I introduced the waitUntilServerStarts method
>>>> > to make sure we don't return before the servers are ready. Turned out, that
>>>> > was an incorrect way to handle this (We can blame stackoverflow for this
>>>> > one!) and it is not enough that the server isStarted() returns true. The
>>>> > correct way to do this is to call the server.join() method after the
>>>> > server.start().
>>>> >
>>>> > See:
>>>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
>>>> >
>>>> > This was equally satisfying as it was frustrating and you are welcome for
>>>> > the future time I saved each of you :)
>>>> > --
>>>> > Amoudi, Abdullah.
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Amoudi, Abdullah.
>>>
>>
>>
>>
>> --
>> Amoudi, Abdullah.
>>
>
>
>
> --
> Amoudi, Abdullah.
>



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

After further investigation, it turned out that our startup process just
starts the CC and NC processes and then checks that the processes are
running. If they are, it reports the state of the cluster as active, and the
subsequent test commands can start immediately.

This means that the CC could have started but not yet be ready when we try to
process the next command. To address this, we need a better way to tell
when the startup procedure has completed. We can do this by pushing (the CC
informs the installer driver when the startup is complete) or polling (the
installer driver queries the CC for the state of the cluster).

I can do it either way, so let's vote. My vote goes to the pushing mechanism.
Thoughts?
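
To make the pushing option concrete, here is a minimal in-process sketch.
StartupListener, LatchStartupListener and PushDemo are hypothetical names
invented for this example. The interface extends java.rmi.Remote only to
suggest how it might later be exported over RMI; nothing is exported here,
and a plain thread stands in for the CC.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical callback interface; the name and method are illustrative only.
interface StartupListener extends Remote {
    void startupComplete() throws RemoteException;
}

// Installer-side implementation: lets the driver block until the CC calls back.
final class LatchStartupListener implements StartupListener {
    private final CountDownLatch ready = new CountDownLatch(1);

    @Override
    public void startupComplete() {
        ready.countDown();
    }

    /** Returns true if the callback arrived within the timeout, false otherwise. */
    boolean awaitStartup(long timeout, TimeUnit unit) throws InterruptedException {
        return ready.await(timeout, unit);
    }
}

final class PushDemo {
    public static void main(String[] args) throws Exception {
        LatchStartupListener listener = new LatchStartupListener();
        // Stand-in for the CC: finishes its startup work, then notifies the listener.
        Thread cc = new Thread(() -> {
            try {
                Thread.sleep(200); // simulated startup work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            listener.startupComplete();
        });
        cc.start();
        boolean ok = listener.awaitStartup(5, TimeUnit.SECONDS);
        System.out.println(ok ? "startup complete" : "timed out");
    }
}
```

The driver does not report the cluster as active until the callback has
fired, so a process that is merely running is no longer treated as ready.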

On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> This solution turned out to be incorrect. Actually, the test cases when I
> build after using the join method never fails but running an actual asterix
> instance never succeeds which is quite confusing.
>
> I also think that the startup script has a major bug where it might
> returns before the startup is complete. More on this later......
>
> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
>> It is highly unlikely that it is related.
>>
>> Cheers,
>> Abdullah.
>>
>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <ch...@gmail.com> wrote:
>>
>>> @Abdullah: Is this issue related to
>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
>>> look into the details on Monday.
>>>
>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <ba...@gmail.com>
>>> wrote:
>>>
>>> > About 3-4 days ago, I was working on the addition of the filesystem based
>>> > feed adapter and it didn't take anytime to complete. However, when I wanted
>>> > to build and make sure all tests pass, I kept getting ConnectionRefused
>>> > errors which caused the installer tests to fail every now and then.
>>> >
>>> > I knew the new change had nothing to do with this failure, yet, I couldn't
>>> > direct my attention away from this bug (It just bothered me so much and I
>>> > knew it needs to be resolved ASAP). After wasting countless hours, I was
>>> > finally able to figure out what was happening :-)
>>> >
>>> > In the startup routine, we start three Jetty web servers (Web interface
>>> > server, JSON API server, and Feed server). Sometime ago, we used to end the
>>> > startup call before making sure the server.isStarted() method returns true
>>> > on all servers. At that time, I introduced the waitUntilServerStarts method
>>> > to make sure we don't return before the servers are ready. Turned out, that
>>> > was an incorrect way to handle this (We can blame stackoverflow for this
>>> > one!) and it is not enough that the server isStarted() returns true. The
>>> > correct way to do this is to call the server.join() method after the
>>> > server.start().
>>> >
>>> > See:
>>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
>>> >
>>> > This was equally satisfying as it was frustrating and you are welcome for
>>> > the future time I saved each of you :)
>>> > --
>>> > Amoudi, Abdullah.
>>> >
>>>
>>
>>
>>
>> --
>> Amoudi, Abdullah.
>>
>
>
>
> --
> Amoudi, Abdullah.
>



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

This solution turned out to be incorrect. With the join method in place, the
test cases never fail when I build, but starting an actual asterix
instance never succeeds, which is quite confusing.

I also think that the startup script has a major bug where it might return
before the startup is complete. More on this later...
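
One possible explanation for the join() behaviour, offered as an assumption
rather than something verified against the AsterixDB code: Jetty's
Server.join() blocks the calling thread until the server's thread pool
stops, so a startup routine that calls it after start() would not return
while the server is up. The sketch below reproduces that pattern with plain
threads; FakeServer is a hypothetical stand-in, not Jetty.

```java
import java.util.concurrent.CountDownLatch;

// Stand-in for an embedded server (not Jetty): start() returns once the worker
// is running, and join() blocks until the worker exits, i.e. until stop().
final class FakeServer {
    private final CountDownLatch started = new CountDownLatch(1);
    private final CountDownLatch stopSignal = new CountDownLatch(1);
    private Thread worker;

    void start() throws InterruptedException {
        worker = new Thread(() -> {
            started.countDown();
            try {
                stopSignal.await(); // "serve requests" until asked to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        started.await();
    }

    void stop() {
        stopSignal.countDown();
    }

    /** Waits up to millis for the worker to exit; returns true if it has exited. */
    boolean join(long millis) throws InterruptedException {
        worker.join(millis);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws Exception {
        FakeServer server = new FakeServer();
        server.start();
        // While the server is up, a join cannot complete.
        System.out.println("join returned while running: " + server.join(200));
        server.stop();
        System.out.println("join returned after stop: " + server.join(1000));
    }
}
```

If that is what is happening, join() is a way to keep a process alive for
the lifetime of the server, not a readiness check for a startup call.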

On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> It is highly unlikely that it is related.
>
> Cheers,
> Abdullah.
>
> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <ch...@gmail.com> wrote:
>
>> @Abdullah: Is this issue related to
>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
>> look into the details on Monday.
>>
>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <ba...@gmail.com>
>> wrote:
>>
>> > About 3-4 days ago, I was working on the addition of the filesystem based
>> > feed adapter and it didn't take anytime to complete. However, when I wanted
>> > to build and make sure all tests pass, I kept getting ConnectionRefused
>> > errors which caused the installer tests to fail every now and then.
>> >
>> > I knew the new change had nothing to do with this failure, yet, I couldn't
>> > direct my attention away from this bug (It just bothered me so much and I
>> > knew it needs to be resolved ASAP). After wasting countless hours, I was
>> > finally able to figure out what was happening :-)
>> >
>> > In the startup routine, we start three Jetty web servers (Web interface
>> > server, JSON API server, and Feed server). Sometime ago, we used to end the
>> > startup call before making sure the server.isStarted() method returns true
>> > on all servers. At that time, I introduced the waitUntilServerStarts method
>> > to make sure we don't return before the servers are ready. Turned out, that
>> > was an incorrect way to handle this (We can blame stackoverflow for this
>> > one!) and it is not enough that the server isStarted() returns true. The
>> > correct way to do this is to call the server.join() method after the
>> > server.start().
>> >
>> > See:
>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
>> >
>> > This was equally satisfying as it was frustrating and you are welcome for
>> > the future time I saved each of you :)
>> > --
>> > Amoudi, Abdullah.
>> >
>>
>
>
>
> --
> Amoudi, Abdullah.
>



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by abdullah alamoudi <ba...@gmail.com>.

It is highly unlikely that it is related.

Cheers,
Abdullah.

On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <ch...@gmail.com> wrote:

> @Abdullah: Is this issue related to
> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
> look into the details on Monday.
>
> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
> > About 3-4 days ago, I was working on the addition of the filesystem based
> > feed adapter and it didn't take anytime to complete. However, when I wanted
> > to build and make sure all tests pass, I kept getting ConnectionRefused
> > errors which caused the installer tests to fail every now and then.
> >
> > I knew the new change had nothing to do with this failure, yet, I couldn't
> > direct my attention away from this bug (It just bothered me so much and I
> > knew it needs to be resolved ASAP). After wasting countless hours, I was
> > finally able to figure out what was happening :-)
> >
> > In the startup routine, we start three Jetty web servers (Web interface
> > server, JSON API server, and Feed server). Sometime ago, we used to end the
> > startup call before making sure the server.isStarted() method returns true
> > on all servers. At that time, I introduced the waitUntilServerStarts method
> > to make sure we don't return before the servers are ready. Turned out, that
> > was an incorrect way to handle this (We can blame stackoverflow for this
> > one!) and it is not enough that the server isStarted() returns true. The
> > correct way to do this is to call the server.join() method after the
> > server.start().
> >
> > See:
> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> >
> > This was equally satisfying as it was frustrating and you are welcome for
> > the future time I saved each of you :)
> > --
> > Amoudi, Abdullah.
> >
>



-- 
Amoudi, Abdullah.

Re: The solution to the sporadic connection refused exceptions

Posted by Chen Li <ch...@gmail.com>.

@Abdullah: Is this issue related to
https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
look into the details on Monday.

On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> About 3-4 days ago, I was working on the addition of the filesystem based
> feed adapter and it didn't take anytime to complete. However, when I wanted
> to build and make sure all tests pass, I kept getting ConnectionRefused
> errors which caused the installer tests to fail every now and then.
>
> I knew the new change had nothing to do with this failure, yet, I couldn't
> direct my attention away from this bug (It just bothered me so much and I
> knew it needs to be resolved ASAP). After wasting countless hours, I was
> finally able to figure out what was happening :-)
>
> In the startup routine, we start three Jetty web servers (Web interface
> server, JSON API server, and Feed server). Sometime ago, we used to end the
> startup call before making sure the server.isStarted() method returns true
> on all servers. At that time, I introduced the waitUntilServerStarts method
> to make sure we don't return before the servers are ready. Turned out, that
> was an incorrect way to handle this (We can blame stackoverflow for this
> one!) and it is not enough that the server isStarted() returns true. The
> correct way to do this is to call the server.join() method after the
> server.start().
>
> See:
> http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
>
> This was equally satisfying as it was frustrating and you are welcome for
> the future time I saved each of you :)
> --
> Amoudi, Abdullah.
>