Posted to dev@apex.apache.org by "Chandni Singh (JIRA)" <ji...@apache.org> on 2016/06/03 05:43:59 UTC

[jira] [Comment Edited] (APEXMALHAR-2063) Integrate WAL to FS WindowDataManager

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313600#comment-15313600 ] 

Chandni Singh edited comment on APEXMALHAR-2063 at 6/3/16 5:43 AM:
-------------------------------------------------------------------

Yes. Managed State flushes data only at checkpoints, and this is how it uses WindowDataManager.


was (Author: csingh):
Yes. Managed State flushes data only at checkpoints.

> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
>                 Key: APEXMALHAR-2063
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>
> FS Window Data Manager saves meta-data that helps replay the tuples of every completed application window after a failure. To do this, it saves the meta-data in one file per window. Having many small files on HDFS causes issues, as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead, FS Window Data Manager can use the WAL to write data and maintain a mapping of how much data was flushed to the WAL in each window.
> To use FileSystemWAL for replaying the data of a finished window, a few changes are made to FileSystemWAL, for the following reasons:
> 1. WindowDataManager needs to replay the data of every finished window. Such a window may not be checkpointed.
> FileSystemWAL truncates the WAL file to the checkpointed point after recovery, which poses a problem.
> WindowDataManager should therefore be able to control the recovery of FileSystemWAL.
> 2. FileSystemWAL writes to temporary files. The mapping of temporary files to actual files is part of its state, which is checkpointed. Since WindowDataManager replays the data of a window that is not yet checkpointed, it needs to know the actual temporary file the data is being persisted to.
> At a high level, WindowDataManager will persist meta information on the file system, which includes the following details for every window:
> - start WAL pointer
> - end WAL pointer
> - WAL file path
> This is a single file that is updated at every end-window, along with the actual data in the WAL file.
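
The per-window meta information described above could be modelled roughly as follows. This is only a minimal in-memory sketch: the names WindowWalIndex, WalRange, beginWindow, append and endWindow are hypothetical and are not part of Apex Malhar, a pointer is simplified to a single byte offset, and writing the index to the file system is left out.

```java
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from Apex Malhar.
final class WindowWalIndex {

    /** Where the data of one window lives in the WAL. */
    static final class WalRange {
        final long startPointer;   // offset where the window's data begins
        final long endPointer;     // offset just past the window's last byte
        final String walFilePath;  // WAL file (possibly temporary) holding the data

        WalRange(long startPointer, long endPointer, String walFilePath) {
            if (endPointer < startPointer) {
                throw new IllegalArgumentException("end pointer before start pointer");
            }
            this.startPointer = startPointer;
            this.endPointer = endPointer;
            this.walFilePath = walFilePath;
        }

        long length() {
            return endPointer - startPointer;
        }
    }

    // Sorted by window id so that replay can walk the windows in order.
    private final TreeMap<Long, WalRange> ranges = new TreeMap<>();
    private long currentPointer = 0; // assumed: the WAL offset only grows
    private long windowStart = 0;

    /** Remember where the new window starts in the WAL. */
    void beginWindow() {
        windowStart = currentPointer;
    }

    /** Account for numBytes written to the WAL in the current window. */
    void append(int numBytes) {
        currentPointer += numBytes;
    }

    /** Record the range of the finished window; a real version would also persist the index here. */
    void endWindow(long windowId, String walFilePath) {
        ranges.put(windowId, new WalRange(windowStart, currentPointer, walFilePath));
    }

    /** Range to replay for a finished window, or null if the window is unknown. */
    WalRange rangeFor(long windowId) {
        return ranges.get(windowId);
    }

    /** Largest window id recorded so far, or -1 if none. */
    long largestCompletedWindow() {
        return ranges.isEmpty() ? -1 : ranges.lastKey();
    }
}
```

Because the file path is stored with each range, a window that finished but was never checkpointed can still be located in the temporary file it was written to.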



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)