Posted to issues@carbondata.apache.org by "Ravindra Pesala (JIRA)" <ji...@apache.org> on 2018/02/03 10:17:00 UTC
[jira] [Updated] (CARBONDATA-2018) Optimization in reading/writing for sort temp row during data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravindra Pesala updated CARBONDATA-2018:
----------------------------------------
Fix Version/s: (was: 1.3.0)
1.4.0
> Optimization in reading/writing for sort temp row during data loading
> ---------------------------------------------------------------------
>
> Key: CARBONDATA-2018
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2018
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Affects Versions: 1.3.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Fix For: 1.4.0
>
> Time Spent: 10h 40m
> Remaining Estimate: 0h
>
> # SCENARIO
> Currently in CarbonData data loading, during the sort step, records are partially sorted and spilled to disk. CarbonData then reads these records back and merge-sorts them.
> Since the sort step is CPU-intensive, we can optimize the serialization/deserialization of these rows while writing/reading them, reducing the CPU time spent parsing row fields.
> This should improve data loading performance.
> # RESOLVE
> We can pick out the un-sorted fields of each row, pack them as a byte array, and skip parsing them.
> # RESULT
> I've tested it on my cluster and observed about an 8% performance gain (74 MB/s/node -> 81 MB/s/node).
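
The packing idea described above can be sketched roughly as follows. This is an illustrative sketch only, not CarbonData's actual implementation: the class and field names (`PackedSortTempRow`, `sortDims`, `noSortBlob`) are hypothetical. The point is that the fields the merge sort never compares on are serialized once into an opaque byte array, which is then written and read in a single bulk operation instead of being parsed field by field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: a sort temp row keeps only the sort columns as
// typed values; all non-sort fields are pre-packed into an opaque blob.
class PackedSortTempRow {
    final int[] sortDims;    // fields the merge sort compares on
    final byte[] noSortBlob; // remaining fields, already serialized

    PackedSortTempRow(int[] sortDims, byte[] noSortBlob) {
        this.sortDims = sortDims;
        this.noSortBlob = noSortBlob;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(sortDims.length);
        for (int d : sortDims) out.writeInt(d);
        out.writeInt(noSortBlob.length);
        out.write(noSortBlob); // single bulk write, no per-field serialization
    }

    static PackedSortTempRow readFrom(DataInputStream in) throws IOException {
        int n = in.readInt();
        int[] dims = new int[n];
        for (int i = 0; i < n; i++) dims[i] = in.readInt();
        byte[] blob = new byte[in.readInt()];
        in.readFully(blob); // blob stays opaque until the final write step
        return new PackedSortTempRow(dims, blob);
    }
}

public class SortTempRowDemo {
    public static void main(String[] args) throws IOException {
        PackedSortTempRow row = new PackedSortTempRow(
                new int[]{7, 42}, "non-sort-fields".getBytes("UTF-8"));

        // Spill the row to an in-memory "file" and read it back.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        row.writeTo(new DataOutputStream(buf));
        PackedSortTempRow back = PackedSortTempRow.readFrom(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(back.sortDims[1]);                     // 42
        System.out.println(new String(back.noSortBlob, "UTF-8")); // non-sort-fields
    }
}
```

Only the sort columns ever need to be deserialized during merge sort; the blob is carried through untouched, which is where the CPU saving comes from.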
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)