Posted to dev@apex.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/24 20:36:20 UTC
[jira] [Commented] (APEXMALHAR-2063) Integrate WAL to FS WindowDataManager
[ https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391167#comment-15391167 ]
ASF GitHub Bot commented on APEXMALHAR-2063:
--------------------------------------------
Github user ilooner commented on a diff in the pull request:
https://github.com/apache/apex-malhar/pull/322#discussion_r71995041
--- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/WindowDataManager.java ---
@@ -41,15 +41,42 @@
*
* @since 2.0.0
*/
-public interface WindowDataManager extends StorageAgent, Component<Context.OperatorContext>
+public interface WindowDataManager extends Component<Context.OperatorContext>
{
/**
+ * Save the state for a window id.
+ * @param object state
+ * @param windowId window id
+ * @throws IOException
+ */
+ void save(Object object, long windowId) throws IOException;
+
+ /**
+ * Gets the object saved for the provided window id. <br/>
+ * Typically it is used to replay tuples of successive windows in input operators after failure.
+ *
+ * @param windowId window id
+ * @return saved state for the window id.
+ * @throws IOException
+ */
+ Object retrieve(long windowId) throws IOException;
+
+ /**
+ * Delete the artifact corresponding to the
--- End diff --
complete javadoc here?
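
For readers following the diff, here is a minimal sketch of how the `save`/`retrieve` pair shown above might be exercised. It is not the Malhar implementation: the class name `InMemoryWindowStateSketch` is hypothetical, state is kept in a plain map rather than on a file system, and returning `null` for an unknown window is an assumption, since the truncated diff does not say what the real contract is.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in mirroring only the two method signatures
// visible in the diff; it does not implement the real WindowDataManager.
public class InMemoryWindowStateSketch {
  // One saved state object per window id.
  private final Map<Long, Object> states = new HashMap<>();

  // Save the state for a window id (same signature as in the diff).
  public void save(Object object, long windowId) throws IOException {
    states.put(windowId, object);
  }

  // Return the state saved for the window id, or null if none was saved
  // (the null result is an assumption of this sketch).
  public Object retrieve(long windowId) throws IOException {
    return states.get(windowId);
  }
}
```

An input operator could call `save` at the end of each window with whatever it needs to re-emit that window's tuples, and call `retrieve` for each already-completed window after a failure.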
> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
> Key: APEXMALHAR-2063
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
> Project: Apache Apex Malhar
> Issue Type: Improvement
> Reporter: Chandni Singh
> Assignee: Chandni Singh
>
> FS Window Data Manager is used to save meta-data that helps in replaying the tuples of every completed application window after a failure. For this it saves the meta-data in one file per window. Having many small files on HDFS causes issues, as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead, FS Window Data Manager can use the WAL to write data and maintain a mapping of how much data was flushed to the WAL in each window.
> In order to use FileSystemWAL for replaying data of a finished window, a few changes are made to FileSystemWAL, for the following reasons:
> 1. WindowDataManager needs to replay data of every finished window. This window may not be checkpointed.
> FileSystemWAL truncates the WAL file to the checkpointed point after recovery so this poses a problem.
> WindowDataManager should be able to control recovery of FileSystemWAL.
> 2. FileSystemWAL writes to temporary files. The mapping of temporary files to actual files is part of its state, which is checkpointed. Since WindowDataManager replays data of a window not yet checkpointed, it needs to know the actual temporary file the data is being persisted to.
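
The idea in the description above (one log plus a mapping of how much data was flushed per window) can be sketched roughly as follows. This is an illustration only, not the FileSystemWAL API: the class `WindowIndexedLogSketch` and its methods are hypothetical, the log is an in-memory byte buffer instead of a file, and `endWindow` is assumed to be called with increasing window ids.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a single append-only log with a per-window offset index.
public class WindowIndexedLogSketch {
  // Stand-in for the WAL file; every record is length-prefixed.
  private final ByteArrayOutputStream log = new ByteArrayOutputStream();
  // Window id -> log size (in bytes) at the moment that window finished.
  private final TreeMap<Long, Integer> endOffsets = new TreeMap<>();

  // Append one record as a 4-byte big-endian length followed by UTF-8 bytes.
  public void append(String record) {
    byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
    int n = bytes.length;
    log.write(n >>> 24);
    log.write(n >>> 16);
    log.write(n >>> 8);
    log.write(n);
    log.write(bytes, 0, n);
  }

  // Record how much data has been written by the end of this window.
  public void endWindow(long windowId) {
    endOffsets.put(windowId, log.size());
  }

  // Replay the records of one finished window by reading between the
  // previous window's end offset and this window's end offset.
  public List<String> replay(long windowId) {
    List<String> out = new ArrayList<>();
    Integer end = endOffsets.get(windowId);
    if (end == null) {
      return out; // window never finished, nothing to replay
    }
    Map.Entry<Long, Integer> prev = endOffsets.lowerEntry(windowId);
    int pos = (prev == null) ? 0 : prev.getValue();
    byte[] data = log.toByteArray();
    while (pos < end) {
      int n = ((data[pos] & 0xff) << 24) | ((data[pos + 1] & 0xff) << 16)
          | ((data[pos + 2] & 0xff) << 8) | (data[pos + 3] & 0xff);
      pos += 4;
      out.add(new String(data, pos, n, StandardCharsets.UTF_8));
      pos += n;
    }
    return out;
  }
}
```

Because the index is kept per finished window rather than per checkpoint, a window can be replayed even if it was never checkpointed, which is the requirement stated in point 1.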