Posted to issues@carbondata.apache.org by "Yahui Liu (Jira)" <ji...@apache.org> on 2020/10/27 07:56:00 UTC

[jira] [Commented] (CARBONDATA-4044) Fix dirty data in indexfile while IUD with stale data in segment folder

    [ https://issues.apache.org/jira/browse/CARBONDATA-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221222#comment-17221222 ] 

Yahui Liu commented on CARBONDATA-4044:
---------------------------------------

I have hit this issue and am trying to solve it. Currently we call CarbonLoaderUtil.checkAndCreateCarbonDataLocation to check for the Segment_XXX folder and create it if it does not exist, but when the folder already exists we do not check whether it contains stale data. My idea is to always remove the Segment_XXX folder before creating it again, which ensures that no stale data remains. Is this solution valid for all cases? Please share your thoughts. Thanks.
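
A minimal sketch of the "remove, then recreate" idea, written in Python only to keep it short and runnable. It is not CarbonData code: recreate_segment_dir is a hypothetical helper, and the local temporary directory merely stands in for the real segment location.

```python
import shutil
import tempfile
from pathlib import Path


def recreate_segment_dir(segment_dir: Path) -> None:
    """Hypothetical helper: drop the folder with all its contents, then create it empty."""
    if segment_dir.exists():
        shutil.rmtree(segment_dir)      # removes stale files left by an earlier attempt
    segment_dir.mkdir(parents=True)     # the folder now exists and is empty


# Demo: a segment folder that already holds a stale index file.
with tempfile.TemporaryDirectory() as root:
    segment = Path(root) / "Segment_0"
    segment.mkdir()
    (segment / "0_1603763776.carbonindex").write_text("stale")

    recreate_segment_dir(segment)
    leftover = sorted(p.name for p in segment.iterdir())
```

After the call the folder exists and contains nothing. One case worth checking before adopting this: if any flow reaches this point while the segment folder still holds valid data (an update may be such a flow), unconditional deletion would remove that data too.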

> Fix dirty data in indexfile while IUD with stale data in segment folder
> -----------------------------------------------------------------------
>
>                 Key: CARBONDATA-4044
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4044
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: Xingjun Hao
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> XX.mergecarbonindex and XX.segment record the list of index files of a segment. Currently we generate xx.mergecarbonindex and xx.segment by picking up all index files (both carbonindex and mergecarbonindex) found in the segment folder, which leads to dirty data when there is stale data in the segment folder.
> For example, there is a stale index file in the segment_0 folder, "0_1603763776.carbonindex".
> While loading, a new carbonindex file "0_16037752342.carbonindex" is written. When merging carbonindex files, we expect to merge only 0_16037752342.carbonindex, but if we pick up all carbonindex files in the segment folder, both "0_1603763776.carbonindex" and "0_16037752342.carbonindex" will be merged and recorded in the segment file.
>  
> Updating has the same problem. 
>  
>  
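
The behaviour the description expects can be sketched as follows, again in Python and not as CarbonData's actual implementation. The sketch assumes that index file names end in "_<timestamp>.carbonindex" and that the timestamp of the current load is known; index_files_for_load is a hypothetical name.

```python
def index_files_for_load(file_names, load_timestamp):
    """Keep only the carbonindex files written by the load with this timestamp.

    Assumes names look like "<task>_<timestamp>.carbonindex" (an assumption
    taken from the example file names in the description).
    """
    suffix = f"_{load_timestamp}.carbonindex"
    return sorted(name for name in file_names if name.endswith(suffix))


# The two files from the example: one stale, one written by the current load.
segment_listing = ["0_1603763776.carbonindex", "0_16037752342.carbonindex"]
to_merge = index_files_for_load(segment_listing, "16037752342")
```

Here to_merge holds only "0_16037752342.carbonindex", so the stale file would neither be merged nor recorded in the segment file. Unlike removing the whole folder, this leaves the existing contents of the segment folder untouched.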



--
This message was sent by Atlassian Jira
(v8.3.4#803005)