Posted to dev@phoenix.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2014/03/16 08:18:51 UTC

[jira] [Resolved] (PHOENIX-412) Pipeline and buffer UPSERT SELECT to prevent writing results of SELECT to client

     [ https://issues.apache.org/jira/browse/PHOENIX-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid resolved PHOENIX-412.
----------------------------------

    Resolution: Fixed

Bulk resolve of closed issues imported from GitHub. This status was reached by first re-opening all closed imported issues and then resolving them in bulk.

> Pipeline and buffer UPSERT SELECT to prevent writing results of SELECT to client
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-412
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-412
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: James Taylor
>              Labels: enhancement
>
> A non-limited SELECT currently runs in parallel, buffering the results on the client side. This works well in the typical use case of a selective WHERE clause (since the scan runs in parallel), but not so well otherwise. For UPSERT SELECT, a typical use case would be to create a new table based on an existing table. Often no WHERE clause will be present, which causes us to write the entire table being selected onto the client machine, which is obviously bad.
> With secondary indexing coming soon, and given that we use UPSERT SELECT to initially populate the index table, we should optimize this by doing the following:
> * Modify ParallelIterators to be able to provide a factory to create the SpoolingResultIterator
> * In the case of UPSERT SELECT, create a spooling iterator that buffers the results into a MutationState (see existing code in UpsertCompiler:359 for upsert select run on client-side)
> * When the MutationState reaches the batch size limit, commit the batch (again as is done in UpsertCompiler) and clear the MutationState
> This will perform much better. We can probably just move the UpsertCompiler code for this case into the new spooling iterator implementation.
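
The buffer-and-commit loop described in the bullets above can be sketched roughly as follows. This is only an illustration of the batching idea, not Phoenix code: the class BatchingUpsertSketch, the method drain, and the commit callback are hypothetical names standing in for the proposed spooling iterator and MutationState, and the batch size is passed in directly rather than read from any configuration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the proposed spooling iterator; not Phoenix code. */
final class BatchingUpsertSketch {

    /**
     * Drains the rows, handing them to the commit callback in batches of at
     * most batchSize, so that no more than one batch is buffered at a time.
     * Returns the number of commits performed.
     */
    static <T> int drain(Iterator<T> rows, int batchSize, Consumer<List<T>> commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(); // plays the role of the MutationState
        int commits = 0;
        while (rows.hasNext()) {
            buffer.add(rows.next());
            if (buffer.size() >= batchSize) {
                // Batch size limit reached: commit a copy, then clear the buffer.
                commit.accept(new ArrayList<>(buffer));
                buffer.clear();
                commits++;
            }
        }
        if (!buffer.isEmpty()) {
            // Commit whatever is left over after the scan is exhausted.
            commit.accept(new ArrayList<>(buffer));
            buffer.clear();
            commits++;
        }
        return commits;
    }
}
```

With this shape, client-side memory is bounded by the batch size rather than by the size of the table being selected, which is the point of the proposal.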



--
This message was sent by Atlassian JIRA
(v6.2#6252)