Posted to dev@sling.apache.org by Andreea Miruna Moise <sa...@adobe.com.INVALID> on 2020/08/06 10:01:41 UTC

Sling GraphQL

Hi,

What would be the recommended way of approaching caching in case of GraphQL? More specifically, would it make sense to have an application caching layer at Sling level that would support private caching of the ExecutionResult? The main use case would be to avoid querying the server every time a client makes the same query, keeping in mind that the requests are authenticated.

Regards,
Andreea

Re: Sling GraphQL

Posted by Radu Cotescu <ra...@apache.org>.

Hi Bertrand,

> On 10 Aug 2020, at 11:14, Bertrand Delacretaz <bd...@apache.org> wrote:
> 
> So a complete scenario could be like
> 
> 1. Client wants to run a query with digest cf81d4 (computed according
> to a definition that we publish)
> 2. Client GETs /prepared/cf81d4.json and receives a 404 as the query
> store is empty
> 3. Client POSTs the query to /prepared and receives a 201 created
> /prepared/cf81d4.json
> 4. Client GETs /prepared/cf81d4.json and receives the results along
> with ETag and Cache-Control headers which allow a front-end HTTP cache
> to store it
> 5. Later, the same or another client GETs /prepared/cf81d4.json
> through the HTTP cache, which based on headers received at 4. serves
> results from cache and does not touch Sling
> 6. Later, the same or another client GETs /prepared/cf81d4.json, the
> results caching period expired but Sling still has that query stored,
> so it runs the query again and returns a new ETag
> 7. If a request comes later with an expired cache and the cf81d4 query
> has been purged, Sling returns a 404 and the client has to start over
> at 2.

Ok. I thought you had proposed that we’d purge the query once we consider the results to be stale, but now I see what you mean. The prepared query can very well outlive the results.

I think it’s a good design to rely on existing caching layers and leave the path open for application-level caching, if we’d ever need that.

Thanks,
Radu

Re: Sling GraphQL

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Andreea,

On Mon, Aug 10, 2020 at 3:06 PM Andreea Miruna Moise
<sa...@adobe.com.invalid> wrote:
> ...The queries run in an authenticated environment, and if the CDN does not support private caching
> then I guess it has to be supported by Sling, right?...

I would just mark such responses with "Cache-Control: private" [1] to allow
clients to cache them privately according to other cache headers.
Considering that they are responses to plain GET requests, browsers
should treat them in the same way as non-GraphQL web content.

-Bertrand

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control

Re: Sling GraphQL

Posted by Andreea Miruna Moise <sa...@adobe.com.INVALID>.

Hi Radu and Bertrand,

The only thing left here is the private caching. The queries run in an authenticated environment, and if the CDN does not support private caching then I guess it has to be supported by Sling, right?

Regards,
Andreea

On 10/08/2020, 12:15, "Bertrand Delacretaz" <bd...@apache.org> wrote:

    Hi Radu,
    
    On Mon, Aug 10, 2020 at 10:54 AM Radu Cotescu <ra...@apache.org> wrote:
    > > On 7 Aug 2020, at 16:18, Bertrand Delacretaz <bd...@apache.org> wrote:
    ...
    > > 5) There's no guarantee on how long the prepared queries are stored, a
    > > client that gets a 404 on a prepared query request must be prepared to
    > > use the default POST request method or store the prepared query again
    >
    > Instead of replying with a 404 and asking the client to do another POST + GET, couldn’t we just return the
    > updated results with updated cache headers?
    > Judging by how you describe the algorithm, the Sling server would still have to maintain a cache internally...
    
    I wasn't planning to store the query results on the Sling side, just
    the query itself and enough information to be able to process
    conditional HTTP requests.
    
    So a complete scenario could be like
    
    1. Client wants to run a query with digest cf81d4 (computed according
    to a definition that we publish)
    2. Client GETs /prepared/cf81d4.json and receives a 404 as the query
    store is empty
    3. Client POSTs the query to /prepared and receives a 201 created
    /prepared/cf81d4.json
    4. Client GETs /prepared/cf81d4.json and receives the results along
    with ETag and Cache-Control headers which allow a front-end HTTP cache
    to store it
    5. Later, the same or another client GETs /prepared/cf81d4.json
    through the HTTP cache, which based on headers received at 4. serves
    results from cache and does not touch Sling
    6. Later, the same or another client GETs /prepared/cf81d4.json, the
    results caching period expired but Sling still has that query stored,
    so it runs the query again and returns a new ETag
    7. If a request comes later with an expired cache and the cf81d4 query
    has been purged, Sling returns a 404 and the client has to start over
    at 2.
    
    With this scenario (assuming it works, your sanity check of that is
    welcome) Sling just needs to store the queries, ETag and cache
    expiration time, which I suppose is much less than query results?
    
    -Bertrand

Re: Sling GraphQL

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Radu,

On Mon, Aug 10, 2020 at 10:54 AM Radu Cotescu <ra...@apache.org> wrote:
> > On 7 Aug 2020, at 16:18, Bertrand Delacretaz <bd...@apache.org> wrote:
...
> > 5) There's no guarantee on how long the prepared queries are stored, a
> > client that gets a 404 on a prepared query request must be prepared to
> > use the default POST request method or store the prepared query again
>
> Instead of replying with a 404 and asking the client to do another POST + GET, couldn’t we just return the
> updated results with updated cache headers?
> Judging by how you describe the algorithm, the Sling server would still have to maintain a cache internally...

I wasn't planning to store the query results on the Sling side, just
the query itself and enough information to be able to process
conditional HTTP requests.

So a complete scenario could be like

1. Client wants to run a query with digest cf81d4 (computed according
to a definition that we publish)
2. Client GETs /prepared/cf81d4.json and receives a 404 as the query
store is empty
3. Client POSTs the query to /prepared and receives a 201 created
/prepared/cf81d4.json
4. Client GETs /prepared/cf81d4.json and receives the results along
with ETag and Cache-Control headers which allow a front-end HTTP cache
to store it
5. Later, the same or another client GETs /prepared/cf81d4.json
through the HTTP cache, which based on headers received at 4. serves
results from cache and does not touch Sling
6. Later, the same or another client GETs /prepared/cf81d4.json, the
results caching period expired but Sling still has that query stored,
so it runs the query again and returns a new ETag
7. If a request comes later with an expired cache and the cf81d4 query
has been purged, Sling returns a 404 and the client has to start over
at 2.

With this scenario (assuming it works, your sanity check of that is
welcome) Sling just needs to store the queries, ETag and cache
expiration time, which I suppose is much less than query results?

-Bertrand
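
The seven steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not Sling code: the class and method names are hypothetical, the digest is a placeholder (the thread leaves its definition open), and the front-end HTTP cache of steps 4 to 6 is left out so that only the 404 / POST / GET retry logic is shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative in-memory model of the prepared-query exchange; not Sling code.
final class PreparedQueryFlow {

    /** Status code plus body, standing in for an HTTP response. */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Server side: keeps only the query text per digest, never the results. */
    static final class Server {
        private final Map<String, String> queries = new HashMap<>();
        private final Function<String, String> executor;

        Server(Function<String, String> executor) {
            this.executor = executor;
        }

        /** Models POST /prepared: stores the query and answers 201 with its digest. */
        Response post(String query) {
            String digest = digest(query);
            queries.put(digest, query);
            return new Response(201, digest);
        }

        /** Models GET /prepared/{digest}.json: 404 if unknown, else runs the query. */
        Response get(String digest) {
            String query = queries.get(digest);
            if (query == null) {
                return new Response(404, null);
            }
            return new Response(200, executor.apply(query));
        }

        /** Models the server purging a stored query at any time (step 7). */
        void purge(String digest) {
            queries.remove(digest);
        }
    }

    /** Placeholder digest (assumption); a real one would follow the published definition. */
    static String digest(String query) {
        return Integer.toHexString(query.hashCode());
    }

    /** Client side: GET first, and on a 404 store the query and GET again. */
    static String run(Server server, String query) {
        String digest = digest(query);
        Response response = server.get(digest);
        if (response.status == 404) {
            server.post(query);
            response = server.get(digest);
        }
        return response.body;
    }
}
```

Because the client always falls back to POST followed by GET on a 404, the server stays free to purge stored queries whenever it likes, which is the property step 7 relies on.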

Re: Sling GraphQL

Posted by Radu Cotescu <ra...@apache.org>.

Hi Bertrand,

> On 7 Aug 2020, at 16:18, Bertrand Delacretaz <bd...@apache.org> wrote:
> 
> Here's what I suggest:
> 
> 1) GraphQL queries executed via POST are not cached by Sling
> 
> 2) Queries can be prepared in advance by POSTing the query text to
> Sling, which returns a "201 created" status with a URL that contains
> the query's digest, like cf81d4
> 
> 3) Clients run such prepared queries by making GET requests to URLs
> like /graphqlservlet/prepared/cf81d4.json
> 
> 4) The responses to such prepared queries requests contain useful HTTP
> Cache headers, which might be set from hints supplied by data fetchers
> with configurable defaults.
> 
> 5) There's no guarantee on how long the prepared queries are stored, a
> client that gets a 404 on a prepared query request must be prepared to
> use the default POST request method or store the prepared query again

Instead of replying with a 404 and asking the client to do another POST + GET, couldn’t we just return the updated results with updated cache headers? Judging by how you describe the algorithm, the Sling server would still have to maintain a cache internally. And we can return a 404 only when we purge the query from Sling’s internal cache, which doesn’t have to coincide with the moment the query’s results have to be updated. In my view the query’s endpoint is the resource, whereas what we return would be the resource’s version. This way we can reply with ETag and Cache-Control headers.

Thanks,
Radu
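
Treating the prepared query URL as the resource and each result as a version of it maps naturally onto HTTP conditional requests. The following is a minimal sketch of that check, assuming the server compares the client's If-None-Match header with the ETag of the current result. The class and method names are hypothetical, and CRC32 merely stands in for whatever hash an implementation would actually use.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch of conditional GET handling; names are hypothetical.
final class ConditionalGet {

    /** Builds a quoted ETag from the body. CRC32 is a placeholder for a real hash. */
    static String etagFor(String body) {
        CRC32 crc = new CRC32();
        crc.update(body.getBytes(StandardCharsets.UTF_8));
        return "\"" + Long.toHexString(crc.getValue()) + "\"";
    }

    /** Returns 304 if If-None-Match matches the current ETag, otherwise 200. */
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return 200;
        }
        if (ifNoneMatch.trim().equals("*")) {
            return 304;
        }
        // Simplification: splits on commas, so ETags containing commas are not supported.
        for (String candidate : ifNoneMatch.split(",")) {
            if (stripWeak(candidate.trim()).equals(stripWeak(currentEtag))) {
                return 304;
            }
        }
        return 200;
    }

    /** If-None-Match uses weak comparison, so the W/ prefix is ignored. */
    private static String stripWeak(String etag) {
        return etag.startsWith("W/") ? etag.substring(2) : etag;
    }
}
```

A 304 answer carries no body, so a cache holding an expired copy can revalidate it cheaply as long as the query result has not changed.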

Re: Sling GraphQL

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi,

On Fri, Aug 7, 2020 at 4:48 PM Andreea Miruna Moise
<sa...@adobe.com.invalid> wrote:
> >    3) Clients run such prepared queries by making GET requests to URLs
>     like /graphqlservlet/prepared/cf81d4.json
> To be able to do this a different endpoint would be needed instead of
> org.apache.sling.graphql.core.servlet.GraphQLServlet, right?...

We might extend that servlet to support this behavior, I think that's
an implementation detail.

> >    4) The responses to such prepared queries requests contain useful HTTP
>     Cache headers, which might be set from hints supplied by data fetchers
>     with configurable defaults.
> This means that sling would have to compute the headers based on the cache hints...

Yes, and I think a module or utility class that does this can be
useful in other parts of Sling.

>
> On the other hand I was thinking that the KeyValueCache service can be implemented at
> the sling-graphql-adapter level and sling-graphql-core would only provide the interface. ..

We could also provide extension points to do that, without providing
an actual cache implementation, see below.

> It just seemed a simpler solution...

Apart from small internal things like caching servlet resolution I
don't think we've been doing that kind of caching in Sling so far so I
prefer avoiding it if possible. Delegating caching to the HTTP layer
is much more consistent with what we're doing already.

-Bertrand
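
As a rough illustration of the utility mentioned above, the sketch below combines per-fetcher cache hints into a single Cache-Control value by taking the most restrictive hint: the smallest max-age wins, and one private hint makes the whole response private. That combination rule, the Hint type, the fallback to a configurable default when no hints are supplied, and the choice of "no-store" for a zero max-age are all assumptions made for this example, not something the thread specifies.

```java
import java.util.List;

// Illustrative sketch: derive one Cache-Control value from several cache hints.
final class CacheHints {

    /** Hypothetical hint a data fetcher might supply. */
    static final class Hint {
        final int maxAgeSeconds;
        final boolean isPrivate;

        Hint(int maxAgeSeconds, boolean isPrivate) {
            this.maxAgeSeconds = maxAgeSeconds;
            this.isPrivate = isPrivate;
        }
    }

    /** Picks the most restrictive policy; uses the defaults when there are no hints. */
    static String cacheControl(List<Hint> hints, Hint defaults) {
        if (hints.isEmpty()) {
            return format(defaults.maxAgeSeconds, defaults.isPrivate);
        }
        int maxAge = Integer.MAX_VALUE;
        boolean isPrivate = false;
        for (Hint hint : hints) {
            maxAge = Math.min(maxAge, hint.maxAgeSeconds);
            isPrivate = isPrivate || hint.isPrivate;
        }
        return format(maxAge, isPrivate);
    }

    /** Formats the header value; a non-positive max-age disables storing (assumed choice). */
    private static String format(int maxAgeSeconds, boolean isPrivate) {
        if (maxAgeSeconds <= 0) {
            return "no-store";
        }
        return (isPrivate ? "private" : "public") + ", max-age=" + maxAgeSeconds;
    }
}
```

Taking the minimum keeps the response from being cached longer than its shortest-lived field allows, at the cost of caching long-lived fields less aggressively than they could be.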

Re: Sling GraphQL

Posted by Andreea Miruna Moise <sa...@adobe.com.INVALID>.

>    3) Clients run such prepared queries by making GET requests to URLs
    like /graphqlservlet/prepared/cf81d4.json
To be able to do this a different endpoint would be needed instead of org.apache.sling.graphql.core.servlet.GraphQLServlet, right?

>    4) The responses to such prepared queries requests contain useful HTTP
    Cache headers, which might be set from hints supplied by data fetchers
    with configurable defaults.
This means that sling would have to compute the headers based on the cache hints.

On the other hand I was thinking that the KeyValueCache service can be implemented at the sling-graphql-adapter level and sling-graphql-core would only provide the interface. It just seemed a simpler solution.

Regards,
Andreea

On 07/08/2020, 17:18, "Bertrand Delacretaz" <bd...@apache.org> wrote:

    Hi Andreea,
    
    On Fri, Aug 7, 2020 at 12:41 PM Andreea Miruna Moise
    <sa...@adobe.com.invalid> wrote:
    > ...1. In case we provide hooks for SlingDataFetchers we will end up with fine-grained cache hints...
    > But the major limitation is that this can be used only in case of GET requests....
    
    > ...2. Now if we think of using POST requests that are not cached by CDN the only option is application
    > level caching...
    
    I've done a bit more research on GraphQL caching and in particular
    noticed the "Caching & GraphQL: Setting the Story Straight" talk by
    Marc-André Giroux [1] and I agree very much with his view.
    
    My summary of that is:
    -Like any flexible API, GraphQL is harder to cache than requests which
    have a narrower scope
    -There's no built-in way to use HTTP caching as GraphQL says nothing
    about which request/response protocol is used
    -Moving to a more HTTP-friendly way to express requests allows using
    HTTP caching.
    
    As you say, POST requests are usually not cached, and using GET is
    problematic due to the GraphQL query size.
    
    Which means that the client will need to do something beyond plain
    GraphQL requests, for caching to work.
    
    However, Sling-based systems usually have an HTTP cache in front, so
    if we can take advantage of that it avoids having to reinvent and
    maintain something else.
    
    I've also studied Apollo's "Automatic Persisted Queries" [2] which
    suggests a client/server protocol extension to cope with this. It's not
    as automatic as they claim IMHO but I like the general idea and I
    think we could do something similar for Sling while remaining within
    our usual HTTP best practices.
    
    Here's what I suggest:
    
    1) GraphQL queries executed via POST are not cached by Sling
    
    2) Queries can be prepared in advance by POSTing the query text to
    Sling, which returns a "201 created" status with a URL that contains
    the query's digest, like cf81d4
    
    3) Clients run such prepared queries by making GET requests to URLs
    like /graphqlservlet/prepared/cf81d4.json
    
    4) The responses to such prepared queries requests contain useful HTTP
    Cache headers, which might be set from hints supplied by data fetchers
    with configurable defaults.
    
    5) There's no guarantee on how long the prepared queries are stored, a
    client that gets a 404 on a prepared query request must be prepared to
    use the default POST request method or store the prepared query again
    
    I don't think we can achieve efficient caching without some
    collaboration with the client, and with this the requirements on the
    client are pretty simple to fulfill.
    
    Would that work for your use cases?
    
    -Bertrand
    
    [1] https://www.youtube.com/watch?v=CV3puKM_G14
    [2] https://www.apollographql.com/blog/improve-graphql-performance-with-automatic-persisted-queries-c31d27b8e6ea/

Re: Sling GraphQL

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Andreea,

On Fri, Aug 7, 2020 at 12:41 PM Andreea Miruna Moise
<sa...@adobe.com.invalid> wrote:
> ...1. In case we provide hooks for SlingDataFetchers we will end up with fine-grained cache hints...
> But the major limitation is that this can be used only in case of GET requests....

> ...2. Now if we think of using POST requests that are not cached by CDN the only option is application
> level caching...

I've done a bit more research on GraphQL caching and in particular
noticed the "Caching & GraphQL: Setting the Story Straight" talk by
Marc-André Giroux [1] and I agree very much with his view.

My summary of that is:
-Like any flexible API, GraphQL is harder to cache than requests which
have a narrower scope
-There's no built-in way to use HTTP caching as GraphQL says nothing
about which request/response protocol is used
-Moving to a more HTTP-friendly way to express requests allows using
HTTP caching.

As you say, POST requests are usually not cached, and using GET is
problematic due to the GraphQL query size.

Which means that the client will need to do something beyond plain
GraphQL requests, for caching to work.

However, Sling-based systems usually have an HTTP cache in front, so
if we can take advantage of that it avoids having to reinvent and
maintain something else.

I've also studied Apollo's "Automatic Persisted Queries" [2] which
suggests a client/server protocol extension to cope with this. It's not
as automatic as they claim IMHO but I like the general idea and I
think we could do something similar for Sling while remaining within
our usual HTTP best practices.

Here's what I suggest:

1) GraphQL queries executed via POST are not cached by Sling

2) Queries can be prepared in advance by POSTing the query text to
Sling, which returns a "201 created" status with a URL that contains
the query's digest, like cf81d4

3) Clients run such prepared queries by making GET requests to URLs
like /graphqlservlet/prepared/cf81d4.json

4) The responses to such prepared queries requests contain useful HTTP
Cache headers, which might be set from hints supplied by data fetchers
with configurable defaults.

5) There's no guarantee on how long the prepared queries are stored, a
client that gets a 404 on a prepared query request must be prepared to
use the default POST request method or store the prepared query again

I don't think we can achieve efficient caching without some
collaboration with the client, and with this the requirements on the
client are pretty simple to fulfill.

Would that work for your use cases?

-Bertrand

[1] https://www.youtube.com/watch?v=CV3puKM_G14
[2] https://www.apollographql.com/blog/improve-graphql-performance-with-automatic-persisted-queries-c31d27b8e6ea/
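
The proposal does not say how the query digest in the URL is computed. As a minimal sketch, assuming SHA-256 over the UTF-8 bytes of the query text (an assumption for illustration only, as are the class and method names), the digest and the resulting prepared-query URL could be derived like this:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: the thread does not define the digest algorithm.
final class QueryDigest {

    private QueryDigest() {
    }

    /** Returns the lowercase hex SHA-256 digest of the query text (assumed definition). */
    static String of(String queryText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(queryText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the GET path for a prepared query, following the URL shape in the proposal. */
    static String preparedUrl(String queryText) {
        return "/graphqlservlet/prepared/" + of(queryText) + ".json";
    }
}
```

This sketch hashes the text exactly as given, so two queries that differ only in whitespace get different digests. A published definition might normalize the query first so that equivalent queries share one cache entry.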

Re: Sling GraphQL

Posted by Andreea Miruna Moise <sa...@adobe.com.INVALID>.

Hi Bertrand,

I'm mostly referring to server-side queries. Here are my thoughts:
1. In case we provide hooks for SlingDataFetchers we will end up with fine-grained cache hints (e.g. https://github.com/graphql-java/graphql-java/blob/c40fc1d50e91f5584cd8995b46c464d3692f20b3/src/main/java/graphql/cachecontrol/CacheControl.java could be used). After getting these hints into the ExecutionResult we would need a mechanism that combines them into an overall cache policy for the response. But the major limitation is that this can be used only in case of GET requests. The downside is that many CDNs and caching proxies only cache GET requests (not POST requests) and may limit the size of a GET URL. So if we hit that limit it won't work, and unfortunately GraphQL queries can be very long. I also think that building such a mechanism is complicated.

2. Now if we think of using POST requests that are not cached by CDN the only option is application level caching. The extension point that could be added to Sling GraphQL is a KeyValueCache interface that works as a service and is implemented by the client, so that the client has control over the cache implementation, as you said. If the key is an Object it's even better, because it helps set the scope of the cache to PRIVATE or PUBLIC. A client could use a combination of userId + locale + query, or sessionId + query, or sessionId + executionInput for the key. To use the cache we could update https://github.com/apache/sling-org-apache-sling-graphql-core/blob/master/src/main/java/org/apache/sling/graphql/core/engine/GraphQLResourceQuery.java#L101 to check the cache first and execute the query only on a miss. But anyway, in case of POST requests I don't see the need for caching hints.

Andreea

[0] https://www.apollographql.com/docs/apollo-server/performance/caching/
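
The KeyValueCache idea described in point 2 could look roughly like the sketch below. None of these types exist in sling-graphql-core. The interface shape, the userId + locale + query key, the in-memory implementation and the CachedQueryRunner helper are all hypothetical and only illustrate "check the cache first, execute the query on a miss".

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical extension point; the core module would only define the interface. */
interface KeyValueCache {
    Object get(Object key);

    void put(Object key, Object value);
}

/** Example private-scope key: results are cached per user, locale and query. */
final class PrivateQueryKey {
    private final String userId;
    private final String locale;
    private final String query;

    PrivateQueryKey(String userId, String locale, String query) {
        this.userId = userId;
        this.locale = locale;
        this.query = query;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof PrivateQueryKey)) {
            return false;
        }
        PrivateQueryKey that = (PrivateQueryKey) other;
        return userId.equals(that.userId) && locale.equals(that.locale) && query.equals(that.query);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, locale, query);
    }
}

/** Simple implementation a client bundle might register; it never evicts entries. */
final class InMemoryKeyValueCache implements KeyValueCache {
    private final Map<Object, Object> entries = new ConcurrentHashMap<>();

    @Override
    public Object get(Object key) {
        return entries.get(key);
    }

    @Override
    public void put(Object key, Object value) {
        entries.put(key, value);
    }
}

/** Checks the cache first and executes the query only on a miss. */
final class CachedQueryRunner {
    static Object run(KeyValueCache cache, Object key, Supplier<Object> executeQuery) {
        Object cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        Object result = executeQuery.get();
        if (result != null) {
            cache.put(key, result);
        }
        return result;
    }
}
```

Making the key an arbitrary Object lets the implementer decide the cache scope: a key that includes the user or session yields a private cache, while a key built from the query alone would be shared by all users.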

On 06/08/2020, 18:04, "Bertrand Delacretaz" <bd...@apache.org> wrote:

    Hi Andreea,

    On Thu, Aug 6, 2020 at 12:02 PM Andreea Miruna Moise
    <sa...@adobe.com.invalid> wrote:
    > What would be the recommended way of approaching caching in case of GraphQL?...

    I haven't given much thought to that so far, and reading [1] [2] and
    [3] it looks like the best way to cache GraphQL query responses is to
    run those queries server-side, driven by GET requests and use
    traditional HTTP caching.

    The GraphQL core module does support server-side queries, without
    caching HTTP headers so far but that could be added.

    But I suppose you are more looking at client-driven GraphQL queries.
    [3] mentions an interesting (if a bit hacky) way of moving queries to
    the server-side to make them easier to cache. I suppose that only
    works if you control the client and that's probably not a standard.

    > ...would it make sense to have an application caching layer at Sling level that would support private
    > caching of the ExecutionResult?..

    What we might do is provide hooks in the GraphQL Core for
    SlingDataFetchers to supply caching hints, along with an extension
    point where that caching can happen - would that work for your use
    cases?

    If the answer is yes, suggestions on how those hooks can look are very welcome!

    Although we do have caching services in Sling [4] I'm not sure if they
    are in active use at the moment, their code doesn't seem to have been
    touched in a long time. But if we provide somewhat abstract hooks,
    people can use whatever caching mechanism they want.

    -Bertrand

    [1] https://www.apollographql.com/blog/graphql-caching-the-elephant-in-the-room-11a3df0c23ad/
    [2] https://www.apollographql.com/docs/apollo-server/performance/caching/#adding-cache-hints-statically-in-your-schema
    [3] https://github.com/apollographql/apollo-link-persisted-queries
    [4] https://sling.apache.org/documentation/bundles/caching-services.html

Re: Sling GraphQL

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Andreea,

On Thu, Aug 6, 2020 at 12:02 PM Andreea Miruna Moise
<sa...@adobe.com.invalid> wrote:
> What would be the recommended way of approaching caching in case of GraphQL?...

I haven't given much thought to that so far, and reading [1] [2] and
[3] it looks like the best way to cache GraphQL query responses is to
run those queries server-side, driven by GET requests and use
traditional HTTP caching.

The GraphQL core module does support server-side queries, without
caching HTTP headers so far but that could be added.

But I suppose you are more looking at client-driven GraphQL queries.
[3] mentions an interesting (if a bit hacky) way of moving queries to
the server-side to make them easier to cache. I suppose that only
works if you control the client and that's probably not a standard.

> ...would it make sense to have an application caching layer at Sling level that would support private
> caching of the ExecutionResult?..

What we might do is provide hooks in the GraphQL Core for
SlingDataFetchers to supply caching hints, along with an extension
point where that caching can happen - would that work for your use
cases?

If the answer is yes, suggestions on how those hooks can look are very welcome!

Although we do have caching services in Sling [4] I'm not sure if they
are in active use at the moment, their code doesn't seem to have been
touched in a long time. But if we provide somewhat abstract hooks,
people can use whatever caching mechanism they want.

-Bertrand

[1] https://www.apollographql.com/blog/graphql-caching-the-elephant-in-the-room-11a3df0c23ad/
[2] https://www.apollographql.com/docs/apollo-server/performance/caching/#adding-cache-hints-statically-in-your-schema
[3] https://github.com/apollographql/apollo-link-persisted-queries
[4] https://sling.apache.org/documentation/bundles/caching-services.html