Posted to notifications@asterixdb.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/10/16 04:19:00 UTC

[jira] [Commented] (ASTERIXDB-2286) Parallel Sort Optimization

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651096#comment-16651096 ] 

ASF subversion and git services commented on ASTERIXDB-2286:
------------------------------------------------------------

Commit 80225e2c27d77514ecaa774235951187ef524193 in asterixdb's branch refs/heads/master from [~alsuliman]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=80225e2 ]

[ASTERIXDB-2286][COMP][FUN][HYR] Parallel Sort Optimization

- user model changes: yes
- storage format changes: no
- interface changes: yes

details:
- new plan for the sort operation, which samples and replicates the
stream of data to be sorted. The sort-merge connector is removed from
the plan, so the sorted result now resides in multiple partitions.
- new optimization rule to check whether full parallel sort is applicable.
- new Forward operator to read the replicated sort input stream and
to receive the output of the sampling.
- new sequential merge connector to merge a globally ordered result residing
in multiple partitions (in addition to the connector's partition computer).
- "asterix-lang-aql/pom.xml" is changed as a result of refactoring
code related to the range map handling.
- new private sampling function to generate the range map object
(local & global functions) & their type computers.
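
The plan described in the details above (sample, build a range map, range-partition, sort locally, then read the partitions in order) can be sketched in miniature as follows. This is only an illustration under simplifying assumptions, not the AsterixDB implementation: keys are plain integers, everything runs in one process, and the names ParallelSortSketch, buildRangeMap, partitionOf and parallelSort are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical single-process sketch; not AsterixDB or Hyracks code.
class ParallelSortSketch {

    // Builds a "range map": (numPartitions - 1) split points taken from a sorted sample.
    static int[] buildRangeMap(List<Integer> sample, int numPartitions) {
        List<Integer> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return splits;
    }

    // Returns the index of the first range whose upper split point is >= key.
    static int partitionOf(int key, int[] splits) {
        int p = 0;
        while (p < splits.length && key > splits[p]) {
            p++;
        }
        return p;
    }

    static List<Integer> parallelSort(List<Integer> input, List<Integer> sample, int numPartitions) {
        int[] splits = buildRangeMap(sample, numPartitions);
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Range-partition: every key in partition i is <= every key in partition i + 1.
        for (int key : input) {
            partitions.get(partitionOf(key, splits)).add(key);
        }
        List<Integer> result = new ArrayList<>();
        for (List<Integer> part : partitions) {
            Collections.sort(part); // local sort; independent per partition
            result.addAll(part);    // sequential merge: read partitions in range order
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(9, 1, 7, 3, 8, 2, 6, 4, 5, 0);
        List<Integer> sample = Arrays.asList(1, 5, 9, 3);
        System.out.println(parallelSort(input, sample, 3));
    }
}
```

Because the partitions cover disjoint, ordered key ranges, the final step is a plain concatenation in partition order rather than a comparison-based merge, which is what lets the single-partition sort-merge step be dropped.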

user model changes:
- new compiler property is added to enable and disable parallel sort.

interface changes:
- "ILogicalOperatorVisitor.java" includes Forward Operator.
- "ITuplePartitionComputer.java" includes initialize() so that a partitioner
can perform setup before use. FieldRangePartitionComputerFactory uses it to
pick up a range map.
- "ITuplePartitionComputerFactory.java": createPartitioner() is changed to
createPartitioner(IHyracksTaskContext hyracksTaskContext). The context is
needed to pass the range map to the partitioner.
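
The interface changes above follow a common pattern: the factory receives a task context, and the partitioner defers reading the range map until initialize() is called. A minimal sketch of that pattern is shown below. All types here (TaskContext, PartitionComputer, PartitionComputerFactory, RangePartitionComputerFactory) are simplified, hypothetical stand-ins and do not reproduce the actual Hyracks interfaces or their signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the Hyracks API.
class RangeMapHandoffSketch {

    // Assumed per-task context that can carry shared state between operators.
    static class TaskContext {
        private final Map<String, Object> state = new HashMap<>();
        void put(String key, Object value) { state.put(key, value); }
        Object get(String key) { return state.get(key); }
    }

    interface PartitionComputer {
        void initialize(); // called once before the first partition() call
        int partition(int key, int numPartitions);
    }

    interface PartitionComputerFactory {
        PartitionComputer createPartitioner(TaskContext ctx);
    }

    static class RangePartitionComputerFactory implements PartitionComputerFactory {
        private final String rangeMapKey;

        RangePartitionComputerFactory(String rangeMapKey) {
            this.rangeMapKey = rangeMapKey;
        }

        @Override
        public PartitionComputer createPartitioner(TaskContext ctx) {
            return new PartitionComputer() {
                private int[] splits;

                @Override
                public void initialize() {
                    // The range map is produced at run time, so it is read here, not at creation.
                    splits = (int[]) ctx.get(rangeMapKey);
                    if (splits == null) {
                        throw new IllegalStateException("range map not available in context");
                    }
                }

                @Override
                public int partition(int key, int numPartitions) {
                    int p = 0;
                    while (p < splits.length && p < numPartitions - 1 && key > splits[p]) {
                        p++;
                    }
                    return p;
                }
            };
        }
    }

    public static void main(String[] args) {
        TaskContext ctx = new TaskContext();
        ctx.put("rangeMap", new int[] {3, 5}); // e.g. stored by an upstream operator
        PartitionComputer pc = new RangePartitionComputerFactory("rangeMap").createPartitioner(ctx);
        pc.initialize();
        System.out.println(pc.partition(4, 3));
    }
}
```

Deferring the lookup to initialize() matters because the range map comes from sampling at run time and so does not exist yet when the job's factories are constructed.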

Change-Id: I73e128029a46f45e6b68c23dfb9310d5de10582f
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2393
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
Contrib: Jenkins <je...@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <je...@fulliautomatix.ics.uci.edu>
Reviewed-by: Dmitry Lychagin <dm...@couchbase.com>


> Parallel Sort Optimization
> --------------------------
>
>                 Key: ASTERIXDB-2286
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2286
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: COMP - Compiler, FUN - Functions, HYR - Hyracks
>            Reporter: Ali Alsuliman
>            Assignee: Ali Alsuliman
>            Priority: Major
>              Labels: triaged
>
> The current plan for queries with ORDER BY clauses consists of two phases: sorting the data locally in each partition, and then sort-merging the data in a single partition. Even though the local sort happens in parallel, that effort is wasted because the merge happens in one partition. The goal is to remove the merge step and perform a true parallel sort in which data is range-partitioned across the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)