Posted to issues@carbondata.apache.org by "Ravindra Pesala (JIRA)" <ji...@apache.org> on 2018/02/03 10:17:00 UTC
[jira] [Updated] (CARBONDATA-2018) Optimization in reading/writing for sort temp row during data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravindra Pesala updated CARBONDATA-2018:
----------------------------------------
Fix Version/s: (was: 1.3.0)
1.4.0
> Optimization in reading/writing for sort temp row during data loading
> ---------------------------------------------------------------------
>
> Key: CARBONDATA-2018
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2018
> Project: CarbonData
> Issue Type: Improvement
> Components: data-load
> Affects Versions: 1.3.0
> Reporter: xuchuanyin
> Assignee: xuchuanyin
> Priority: Major
> Fix For: 1.4.0
>
> Time Spent: 10h 40m
> Remaining Estimate: 0h
>
> # SCENARIO
> Currently in CarbonData data loading, during the sort step, records are partially sorted and spilled to disk. CarbonData then reads these records back and merge-sorts them.
> Since the sort step is CPU-intensive, we can optimize the serialization/deserialization of these rows while writing/reading them, reducing the CPU time spent parsing row fields.
> This should improve data loading performance.
> # RESOLVE
> We can pick out the un-sorted fields of each row, pack them as a byte array, and skip parsing them.
> # RESULT
> I've tested it on my cluster and observed about an 8% performance gain (74 MB/s/node -> 81 MB/s/node).
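
The packing idea described above can be sketched roughly as follows. This is an illustrative sketch only, not CarbonData's actual implementation: the class and field names (`PackedSortTempRow`, `sortDims`, `noSortBlob`) are hypothetical. The point is that the fields the merge sort never compares on are serialized once into an opaque byte array, which is then written and read in a single bulk operation instead of being parsed field by field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: a sort temp row keeps only the sort columns as
// typed values; all non-sort fields are pre-packed into an opaque blob.
class PackedSortTempRow {
    final int[] sortDims;    // fields the merge sort compares on
    final byte[] noSortBlob; // remaining fields, already serialized

    PackedSortTempRow(int[] sortDims, byte[] noSortBlob) {
        this.sortDims = sortDims;
        this.noSortBlob = noSortBlob;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(sortDims.length);
        for (int d : sortDims) out.writeInt(d);
        out.writeInt(noSortBlob.length);
        out.write(noSortBlob); // single bulk write, no per-field serialization
    }

    static PackedSortTempRow readFrom(DataInputStream in) throws IOException {
        int n = in.readInt();
        int[] dims = new int[n];
        for (int i = 0; i < n; i++) dims[i] = in.readInt();
        byte[] blob = new byte[in.readInt()];
        in.readFully(blob); // blob stays opaque until the final write step
        return new PackedSortTempRow(dims, blob);
    }
}

public class SortTempRowDemo {
    public static void main(String[] args) throws IOException {
        PackedSortTempRow row = new PackedSortTempRow(
                new int[]{7, 42}, "non-sort-fields".getBytes("UTF-8"));

        // Spill the row to an in-memory "file" and read it back.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        row.writeTo(new DataOutputStream(buf));
        PackedSortTempRow back = PackedSortTempRow.readFrom(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(back.sortDims[1]);                     // 42
        System.out.println(new String(back.noSortBlob, "UTF-8")); // non-sort-fields
    }
}
```

Only the sort columns ever need to be deserialized during merge sort; the blob is carried through untouched, which is where the CPU saving comes from.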
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)