Posted to dev@jclouds.apache.org by Spandan Thakur <sp...@gmail.com> on 2017/02/24 05:33:08 UTC

Asynchronous Implementation in Jcloud

Hi,

We noticed that 3-4 years ago jclouds had async implementations, which were then deprecated and removed: https://issues.apache.org/jira/browse/JCLOUDS-40

We were wondering why the decision to deprecate the async implementation was made.

The reason I ask is that we are using S3Proxy (https://github.com/andrewgaul/s3proxy), which internally uses jclouds. The problem is that throughput is limited by the number of server threads available to handle requests. We were planning to make changes to move to async for better performance, and then we noticed your JIRA issue to remove async.

Regards,
Spandan Thakur

Re: Planning Changes for Only Azure

Posted by Andrew Gaul <ga...@apache.org>.
Spandan, I am still waiting for your response to the third paragraph
below.

On Tue, Mar 21, 2017 at 07:52:29PM -0400, Andrew Gaul wrote:
> [Sorry for my delayed responses previously and for the next week; I am
> traveling.]
> 
> We generally follow a "rule of three" for new additions to the portable
> abstractions like BlobStore, where three implementations need to support
> some functionality before we percolate it to the abstraction.  Since
> this functionality does not really require provider support, what
> benefit does skipping the other implementations give?  I am happy to
> review incomplete work to make sure we have consensus on the approach
> but merging should have all functionality.
> 
> I also wonder if a simpler implementation of async might suit your
> needs?  What if a provider added a putBlob implementation which returned
> an OutputStream[1] which S3Proxy could push through Jetty to write
> asynchronously?  This would address a popular user request.
> 
> [1] https://issues.apache.org/jira/browse/JCLOUDS-769
> 
> On Thu, Mar 16, 2017 at 05:17:07AM -0000, Spandan Thakur wrote:
> > Hi Andrew,
> > 
> > Thanks for all the feedback :)
> > 
> > I had one final question. For the real implementation, we plan to start with the important methods on AzureBlobStore (put, get, delete, etc.) and then move on to its other methods. Note that our focus is only on Azure for now, and we plan to throw an unsupported-operation error for other stores.
> > 
> > Is this OK as far as contribution goes? Can we first contribute an Azure-only async implementation that throws unsupported-operation exceptions for other stores?
> > 
> > Regards,
> > Spandan
> > 
> > 
> 
> -- 
> Andrew Gaul
> http://gaul.org/

-- 
Andrew Gaul
http://gaul.org/

Re: Planning Changes for Only Azure

Posted by Andrew Gaul <ga...@apache.org>.
[Sorry for my delayed responses previously and for the next week; I am
traveling.]

We generally follow a "rule of three" for new additions to the portable
abstractions like BlobStore, where three implementations need to support
some functionality before we percolate it to the abstraction.  Since
this functionality does not really require provider support, what
benefit does skipping the other implementations give?  I am happy to
review incomplete work to make sure we have consensus on the approach
but merging should have all functionality.

I also wonder if a simpler implementation of async might suit your
needs?  What if a provider added a putBlob implementation which returned
an OutputStream[1] which S3Proxy could push through Jetty to write
asynchronously?  This would address a popular user request.

[1] https://issues.apache.org/jira/browse/JCLOUDS-769
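
The API shape suggested above, a putBlob that hands back an OutputStream, might look roughly like the sketch below. Everything in it is an assumption made for illustration: StreamingPut is a hypothetical in-memory store, not a jclouds class, and it buffers the bytes and commits them on close(), whereas a real provider would presumably stream them to the backend.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StreamingPut {
    // Hypothetical in-memory blob storage, standing in for a real provider.
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    // Returns a stream the caller writes into; the blob is committed on close().
    OutputStream putBlob(final String name) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                blobs.put(name, toByteArray());
            }
        };
    }

    byte[] getBlob(String name) {
        return blobs.get(name);
    }

    // Writes content through the stream-returning putBlob and reads it back.
    static String roundTrip(String name, String content) {
        StreamingPut store = new StreamingPut();
        try (OutputStream out = store.putBlob(name)) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(store.getBlob(name), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("greeting", "hello"));
    }
}
```

With this shape the caller decides when and from which thread to write, which is what would let a proxy push data through as it arrives.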

-- 
Andrew Gaul
http://gaul.org/

Planning Changes for Only Azure

Posted by Spandan Thakur <sp...@gmail.com>.
Hi Andrew,

Thanks for all the feedback :)

I had one final question. For the real implementation, we plan to start with the important methods on AzureBlobStore (put, get, delete, etc.) and then move on to its other methods. Note that our focus is only on Azure for now, and we plan to throw an unsupported-operation error for other stores.

Is this OK as far as contribution goes? Can we first contribute an Azure-only async implementation that throws unsupported-operation exceptions for other stores?
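
A minimal sketch of what "unsupported for other stores" could mean in code, assuming Java 8 default methods. AsyncCapableStore, AzureLikeStore and OtherStore are hypothetical names; the real jclouds BlobStore interface has no putBlobAsync method. The supplyAsync call only offloads the blocking put to a thread pool and is a placeholder for a genuinely non-blocking implementation.

```java
import java.util.concurrent.CompletableFuture;

class OptionalAsyncSupport {
    // Hypothetical portable interface; not the actual jclouds BlobStore.
    interface AsyncCapableStore {
        String putBlob(String name);

        // Stores that do not override this reject async calls.
        default CompletableFuture<String> putBlobAsync(String name) {
            throw new UnsupportedOperationException("async putBlob is not supported by this store");
        }
    }

    // Hypothetical store that opts in to async.
    static final class AzureLikeStore implements AsyncCapableStore {
        @Override
        public String putBlob(String name) {
            return "etag-" + name;
        }

        @Override
        public CompletableFuture<String> putBlobAsync(String name) {
            // Placeholder: runs the blocking put on the common pool.
            return CompletableFuture.supplyAsync(() -> putBlob(name));
        }
    }

    // Hypothetical store that keeps the default and therefore throws.
    static final class OtherStore implements AsyncCapableStore {
        @Override
        public String putBlob(String name) {
            return "etag-" + name;
        }
    }

    static boolean supportsAsync(AsyncCapableStore store) {
        try {
            store.putBlobAsync("probe");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsAsync(new AzureLikeStore()));
        System.out.println(supportsAsync(new OtherStore()));
    }
}
```

The cost of this pattern is that callers of the portable interface only discover missing support at run time.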

Regards,
Spandan



Re: Changes we plan in S3 proxy

Posted by Andrew Gaul <ga...@apache.org>.
These sound reasonable and I welcome these improvements in S3Proxy.
Note that there is no official relationship between S3Proxy and jclouds,
although the former pushes the latter in some interesting directions.

-- 
Andrew Gaul
http://gaul.org/

Changes we plan in S3 proxy

Posted by Spandan Thakur <sp...@gmail.com>.
Hi Andrew,

I also wanted to mention how we are planning to modify S3Proxy.

1. We wish to add a property in the config file that specifies whether the user wants async. (By default it can remain synchronous.)
2. Based on that flag, we will call the appropriate jclouds sync or async method.
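
The two steps above could be sketched as follows, using only java.util.Properties. The key name "s3proxy.async" and the method names putBlob/putBlobAsync are assumptions made for illustration; they are not existing S3Proxy settings or jclouds methods.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class AsyncFlag {
    // Hypothetical property key; not an actual S3Proxy setting.
    static final String ASYNC_KEY = "s3proxy.async";

    // Defaults to synchronous when the property is absent.
    static boolean asyncEnabled(Properties config) {
        return Boolean.parseBoolean(config.getProperty(ASYNC_KEY, "false"));
    }

    // Returns the name of the (hypothetical) method the proxy would dispatch to.
    static String choosePutMethod(Properties config) {
        return asyncEnabled(config) ? "putBlobAsync" : "putBlob";
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.load(new StringReader("s3proxy.async=true\n"));
        System.out.println(choosePutMethod(config));
        System.out.println(choosePutMethod(new Properties()));
    }
}
```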

Regards,
Spandan

Re: POC of the Async Implementation

Posted by Ignasi Barrera <na...@apache.org>.
This looks indeed promising!

I'd like to see the feature properly aligned in the drivers layer. OkHttp
also provides a way to perform async requests [1] and we should also
explore including it in the changeset for this feature.


[1] https://github.com/square/okhttp/wiki/Recipes

Re: POC of the Async Implementation

Posted by Andrew Gaul <ga...@apache.org>.
Spandan, thanks for creating this proof of concept!  Generally this idea
shows promise and should give value to the wider jclouds community.  We
might want to support get blob as well as put blob for symmetry,
although this raises more questions.

In terms of structuring, you must isolate all the Apache HTTP client
changes in drivers/apachehc.  Only users who configure the apachehc
driver can use async; otherwise jclouds should throw an exception.  This
means fields like BaseHttpCommandExecutorService.httpRequestExecutor
migrate to ApacheHCHttpCommandExecutorService.  I am a bit confused how
you track completion of these Futures since you return an empty
CompletableFuture[1].
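
To illustrate the completion-tracking concern: a CompletableFuture created with its no-argument constructor never completes unless some code calls complete() or completeExceptionally() on it. A minimal sketch of wiring a callback to the future follows. ResponseCallback and executeAsync are hypothetical stand-ins for an async HTTP client's completion hook, not the Apache HttpAsyncClient API and not the code in the linked commit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class CallbackBridge {
    // Hypothetical callback interface standing in for an async client's completion hook.
    interface ResponseCallback {
        void completed(int statusCode);

        void failed(Exception cause);
    }

    // Hypothetical async transport: reports a fixed status from another thread.
    static void executeAsync(String request, ResponseCallback callback) {
        new Thread(() -> callback.completed(201)).start();
    }

    // Returns a future that the callback completes; without these calls it would stay pending forever.
    static CompletableFuture<Integer> putAsync(String request) {
        final CompletableFuture<Integer> result = new CompletableFuture<>();
        executeAsync(request, new ResponseCallback() {
            @Override
            public void completed(int statusCode) {
                result.complete(statusCode);
            }

            @Override
            public void failed(Exception cause) {
                result.completeExceptionally(cause);
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(putAsync("PUT /container/blob").get(5, TimeUnit.SECONDS));
    }
}
```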

I added some comments in your tree mostly about structuring and some
specific questions.  The commit has style, exception handling, and
logging issues we can deal with when you submit the actual pull request.

[1] https://github.com/SpandanThakur/jclouds/commit/00c378bd07aea07f3cebf4446eae373cdcde1e37#diff-a7d2c9d5b9ddcf7805dd36326f85dc4bR102

-- 
Andrew Gaul
http://gaul.org/

POC of the Async Implementation

Posted by Spandan Thakur <sp...@gmail.com>.
Hi Andrew,

I did a quick proof of concept that adds an async putBlob method to the Azure BlobStore:

 https://github.com/SpandanThakur/jclouds/tree/AsyncPOC

Important Classes to look at are:
AzureBlobClient.java
InvokeHttpMethod.java
HttpCommandExecutorService.java (Have a look at this class to see how the callback is assigned)
BaseHttpCommandExecutorService.java

In S3Proxy, we would suspend the Jetty thread, complete the future, and then resume the thread.
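
The suspend-then-resume flow can be sketched without Jetty, assuming the async put returns a CompletableFuture. SuspendedRequest is a hypothetical stand-in for whatever handle the server provides for a suspended request; it is not a Jetty or Servlet class.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

class SuspendAndResume {
    // Hypothetical stand-in for a suspended server request.
    static final class SuspendedRequest {
        private final AtomicReference<String> response = new AtomicReference<>("pending");

        void resume(String body) {
            response.set(body);
        }

        String response() {
            return response.get();
        }
    }

    // Registers a completion callback and returns at once, so the handling thread is not blocked.
    static SuspendedRequest handle(CompletableFuture<String> pendingPut) {
        SuspendedRequest request = new SuspendedRequest();
        pendingPut.thenAccept(etag -> request.resume("200 OK, ETag " + etag));
        return request;
    }

    public static void main(String[] args) {
        CompletableFuture<String> pendingPut = new CompletableFuture<>();
        SuspendedRequest request = handle(pendingPut);
        System.out.println(request.response()); // the put has not finished yet
        pendingPut.complete("abc123");          // the backend reports completion
        System.out.println(request.response());
    }
}
```

Because the callback runs on whichever thread completes the future, the number of in-flight requests is no longer tied to the number of server threads.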

Though my example moves to Java 8, I think we should be able to use Guava and stay on the current Java version. Also, we have not yet decided how many methods we want to make async.

As far as benefits go, we saw that S3Proxy throughput is throttled by the number of threads in the Jetty server. When the load exceeds the number of Jetty threads, performance degrades. Whereas when we move to an async implementation, we are able to scale to any load (until CPU becomes a bottleneck) using even a single Jetty thread. Even when the load is less than the number of Jetty threads, async gives slightly better throughput and response times.

I also wanted to know whether you have any standard performance tests that you run. I could try running them on the POC.

Please let me know if you feel that my approach is correct and whether this will benefit jclouds overall.

Regards,
Spandan


Re: How we plan to implement Async Behaviour

Posted by Andrew Gaul <ga...@apache.org>.
Can you share what your goals of an asynchronous client would be, e.g.,
does your application have some scalability limit?  Does profiling or
measurement reveal that the new client would address this?

Can you provide details on 4?  I briefly looked at the Apache
HttpAsyncClient -- from the examples[1] it appears to execute
onCharReceived callbacks for HTTP entities and onResponseReceived for
headers.  What internally runs select/epoll and issues the callbacks?

Returning to 1, can you provide this functionality without requiring
Java 8?  HttpAsyncClient only requires Java 6 and Guava provides
ListenableFuture, analogous to Java 8 CompletableFuture.  Requiring a
newer Java version will impact or block a subset of jclouds users.

For 3, will we make all methods async, or just the subset of putBlob and
getBlob methods?  If the latter, can you accomplish your S3Proxy goal
with something simpler?  For example, getBlob returns a (wrapped)
InputStream; can you hook this up to some reactor implementation to copy
it to the S3Proxy OutputStream?  Unfortunately this does not help with
putBlob and jclouds needs some more work for the equivalent.

Overall, this proposal would be useful to jclouds and its users.  This is a
big undertaking and it might help to hack out a proof of concept for
more discussion before embarking on the whole implementation.

[1] https://hc.apache.org/httpcomponents-asyncclient-dev/examples.html

On Thu, Mar 02, 2017 at 09:06:40AM -0000, Spandan Thakur wrote:
> Hi,
> 
> We are planning to add proper multiplexing async behaviour to jclouds. We plan to use the Apache async HttpClient library to achieve this.
> 
> We are still thinking about how to implement this in jclouds and S3Proxy, but below are our initial thoughts.
> 1. Move to Java 1.8 so we can start using CompletableFuture. (Let us know if you see any issues in upgrading the Java version.)
> 2. Add a new annotation like @async which can be used in client interfaces like AzureBlobClient.java and S3Client.java.
> 3. Add new async methods to BlobStore and implement them in the specific stores. The methods will return CompletableFutures.
> 4. In the core http package, add an Apache async HttpClient based class, which will execute async HTTP requests and return CompletableFutures. (We are not sure whether we can just add an implementation of BaseHttpCommandExecutorService.java for this.)
> 5. Make some changes in DelegatesToInvocationFunction.java to check whether the async annotation is present and, if so, use the async HTTP executor.
> 6. S3Proxy will use Jetty continuations to suspend the thread and then continue after the async method is done (using thenAccept on the CompletableFuture).
> 
> Please let me know your thoughts on this. Do you feel this feature would be useful for jclouds? Do you think we are on the right track with the way we plan to implement it? Any suggestions from your side?
> 
> Regards,
> Spandan Thakur

-- 
Andrew Gaul
http://gaul.org/

How we plan to implement Async Behaviour

Posted by Spandan Thakur <sp...@gmail.com>.
Hi,

We are planning to add proper multiplexing async behaviour to jclouds. We plan to use the Apache async HttpClient library to achieve this.

We are still thinking about how to implement this in jclouds and S3Proxy, but below are our initial thoughts.
1. Move to Java 1.8 so we can start using CompletableFuture. (Let us know if you see any issues in upgrading the Java version.)
2. Add a new annotation like @async which can be used in client interfaces like AzureBlobClient.java and S3Client.java.
3. Add new async methods to BlobStore and implement them in the specific stores. The methods will return CompletableFutures.
4. In the core http package, add an Apache async HttpClient based class, which will execute async HTTP requests and return CompletableFutures. (We are not sure whether we can just add an implementation of BaseHttpCommandExecutorService.java for this.)
5. Make some changes in DelegatesToInvocationFunction.java to check whether the async annotation is present and, if so, use the async HTTP executor.
6. S3Proxy will use Jetty continuations to suspend the thread and then continue after the async method is done (using thenAccept on the CompletableFuture).
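
Steps 2 and 5 above hinge on detecting a marker annotation by reflection. A minimal sketch of that check follows, under stated assumptions: the Async annotation and the BlobClient interface are hypothetical and only mirror the idea; they are not jclouds types. The annotation needs RUNTIME retention, otherwise isAnnotationPresent would not see it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.concurrent.CompletableFuture;

class AsyncAnnotationCheck {
    // Hypothetical marker annotation, retained at run time so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Async {}

    // Hypothetical client interface with one sync and one async method.
    interface BlobClient {
        String putBlob(String container, String name);

        @Async
        CompletableFuture<String> putBlobAsync(String container, String name);
    }

    // Decides whether an invocation of the named method should go to the async executor.
    static boolean usesAsyncExecutor(String methodName) {
        for (Method method : BlobClient.class.getMethods()) {
            if (method.getName().equals(methodName)) {
                return method.isAnnotationPresent(Async.class);
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        System.out.println("putBlob -> " + usesAsyncExecutor("putBlob"));
        System.out.println("putBlobAsync -> " + usesAsyncExecutor("putBlobAsync"));
    }
}
```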

Please let me know your thoughts on this. Do you feel this feature would be useful for jclouds? Do you think we are on the right track with the way we plan to implement it? Any suggestions from your side?

Regards,
Spandan Thakur

Re: Asynchronous Implementation in Jcloud

Posted by Ignasi Barrera <na...@apache.org>.
Hi,

The async implementation produced some unnecessary overhead and
complexity that was actually out of the scope of jclouds. Applications
that perform async tasks should control the executors, thread pools,
etc., in order to set them up properly according to their concurrency
needs.

The jclouds async interfaces used an internal jclouds executor that
exposed just a bunch of configuration options. Instead of internally
dealing with custom executors, their lifecycle, and exposing just some
configuration, we think it is better to leave them out of the scope of
jclouds and let any user use their own application ones.


I.

Re: Asynchronous Implementation in Jcloud

Posted by Andrew Gaul <ga...@apache.org>.
I agree with everything Ignasi said, but want to emphasize that the
removed jclouds async code did not do what callers might expect, i.e.,
multiplexing multiple sockets onto a single thread.  Instead it
internally had a thread pool with a 1:1 thread to socket correspondence.
It was no different than users submitting a Callable to their own
ExecutorService.
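
For comparison, the caller-side equivalent looks roughly like the sketch below: the application owns the pool and submits a Callable per request, so one thread still blocks per in-flight operation. blockingPut is a hypothetical stand-in for a synchronous call such as putBlob; it is not the jclouds API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallerOwnedExecutor {
    // Hypothetical stand-in for a blocking call such as a synchronous putBlob.
    static String blockingPut(String blobName) {
        return "etag-" + blobName;
    }

    // Runs each put on a pool the application owns and sizes itself.
    static List<String> putAll(List<String> blobNames, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String name : blobNames) {
                Callable<String> task = () -> blockingPut(name);
                futures.add(pool.submit(task));
            }
            List<String> etags = new ArrayList<>();
            for (Future<String> future : futures) {
                etags.add(future.get()); // waits for each task in submission order
            }
            return etags;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(putAll(Arrays.asList("a", "b", "c"), 2));
    }
}
```

This gives concurrency but not multiplexing: with a pool of n threads, at most n requests make progress at once.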

I wonder how you would change S3Proxy to use m:n threading.  Please
raise an issue on the S3Proxy GitHub with your ideas!

-- 
Andrew Gaul
http://gaul.org/