Posted to dev@phoenix.apache.org by "Marcell Ortutay (JIRA)" <ji...@apache.org> on 2018/03/21 22:47:00 UTC

[jira] [Comment Edited] (PHOENIX-4666) Add a subquery cache that persists beyond the life of a query

    [ https://issues.apache.org/jira/browse/PHOENIX-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408720#comment-16408720 ] 

Marcell Ortutay edited comment on PHOENIX-4666 at 3/21/18 10:46 PM:
--------------------------------------------------------------------

Thanks for the input [~maryannxue]. My current implementation is here: [https://github.com/ortutay23andme/phoenix/tree/4.7.0-HBase-1.1], and in particular this is my hacky patch: [https://github.com/ortutay23andme/phoenix/commit/04c96f672eb4bcdccec27f124373be766f8dd5af]. (It is implemented on 4.7 for unrelated reasons, but I think the same idea is transferable to HEAD.) Instead of a random cache ID, it takes a hash of the query statement and uses that as the cache ID. Each Phoenix client maintains its own memory of which cache IDs have already been executed (this is not ideal, but it was easy to implement this way).
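
To make the idea concrete, here is a minimal sketch of the hashing and client-side bookkeeping described above. The class and method names below are made up for illustration and don't correspond to the actual classes touched in the patch:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class SubqueryCacheIdTracker {

        // Cache IDs this client has already shipped to the region servers.
        private final Set<String> sentCacheIds = ConcurrentHashMap.newKeySet();

        // Derive a deterministic cache ID by hashing the subquery text, so the
        // same subquery always maps to the same server-side cache entry.
        public static byte[] cacheIdFor(String subquerySql) {
            try {
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                return digest.digest(subquerySql.getBytes(StandardCharsets.UTF_8));
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("SHA-256 not available", e);
            }
        }

        // Returns true if this client has already executed/shipped this subquery,
        // and records it otherwise. This is the "each client remembers its own
        // cache IDs" part, which is the not-ideal bit mentioned above.
        public boolean alreadyExecuted(String subquerySql) {
            return !sentCacheIds.add(toHex(cacheIdFor(subquerySql)));
        }

        private static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }
    }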

If I'm understanding your proposal correctly, the Phoenix client would attempt to use a cache ID with the expectation that it already exists on the region servers. The region server would throw an exception if the cache ID is not found, which indicates to the Phoenix client that it should evaluate the subquery as usual.
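
If that reading is right, the client-side flow would look roughly like the sketch below. Everything in it (SubqueryCacheClient, CacheNotFoundException, and so on) is a hypothetical placeholder rather than Phoenix's real API; it just reuses cacheIdFor() from the sketch above:

    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical client-side handle on the server-side cache.
    interface SubqueryCacheClient {
        // Runs the outer query against an existing server-side cache entry.
        ResultSet scanUsingCache(byte[] cacheId) throws SQLException, CacheNotFoundException;
        // Evaluates the subquery and ships its results to the region servers under cacheId.
        void buildServerCache(byte[] cacheId, String subquerySql) throws SQLException;
    }

    // Thrown by the region server when the expected cache entry does not exist.
    class CacheNotFoundException extends Exception {}

    class SubqueryCacheFallback {
        // Optimistically reuse the cache entry; on a miss, rebuild it and retry once.
        static ResultSet execute(SubqueryCacheClient client, String subquerySql)
                throws SQLException {
            byte[] cacheId = SubqueryCacheIdTracker.cacheIdFor(subquerySql);
            try {
                return client.scanUsingCache(cacheId);
            } catch (CacheNotFoundException miss) {
                // The entry expired or was never built: evaluate the subquery as
                // usual, repopulate the server cache, then retry the cached scan.
                client.buildServerCache(cacheId, subquerySql);
                try {
                    return client.scanUsingCache(cacheId);
                } catch (CacheNotFoundException stillMissing) {
                    throw new SQLException("Cache entry missing right after rebuild", stillMissing);
                }
            }
        }
    }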


was (Author: ortutay):
Thanks for the input [~maryannxue]. My current implementation is here: [https://github.com/ortutay23andme/phoenix/tree/4.7.0-HBase-1.1], and in particular this is my hacky patch: [https://github.com/ortutay23andme/phoenix/commit/04c96f672eb4bcdccec27f124373be766f8dd5af]. Instead of a random cache ID, it takes a hash of the query statement and uses that as the cache ID. Each Phoenix client maintains its own memory of which cache IDs have already been executed (this is not ideal, but it was easy to implement this way).

If I'm understanding your proposal correctly, the Phoenix client would attempt to use a cache ID with the expectation that it already exists on the region servers. The region server would throw an exception if the cache ID is not found, which indicates to the Phoenix client that it should evaluate the subquery as usual.

> Add a subquery cache that persists beyond the life of a query
> -------------------------------------------------------------
>
>                 Key: PHOENIX-4666
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4666
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Marcell Ortutay
>            Priority: Major
>
> The user list thread for additional context is here: [https://lists.apache.org/thread.html/e62a6f5d79bdf7cd238ea79aed8886816d21224d12b0f1fe9b6bb075@%3Cuser.phoenix.apache.org%3E]
> ----
> A Phoenix query may contain expensive subqueries, and moreover those expensive subqueries may be used across multiple different queries. While whole-result caching is possible at the application level, it is not possible to cache sub-results there. This can cause poor performance for queries in which the subquery is the most expensive part, since the application is powerless to do anything about it at the query level. It would be good if Phoenix provided a way to cache subquery results, as it would provide a significant performance gain.
> An illustrative example:
>     SELECT * FROM table1 JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result ON table1.id_1 = expensive_result.id_1 AND table1.id_1 = \{id}
> In this case, the subquery "expensive_result" is expensive to compute, but it doesn't change between queries. The rest of the query does change, because of the \{id} parameter. This means the application can't cache the whole result, but it would be good if there were a way to cache expensive_result.
> Note that there is currently a coprocessor-based "server cache", but the data in this "cache" is not persisted across queries. It is deleted after a TTL expires (30 sec by default), or when the query completes.
> This issue is fairly high priority for us at 23andMe, and we'd be happy to provide a patch with some guidance from Phoenix maintainers. We are currently putting together a design document for a solution, and we'll post it to this Jira ticket for review in a few days.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)