Posted to dev@apex.apache.org by tushargosavi <gi...@git.apache.org> on 2016/02/27 18:13:37 UTC

[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

GitHub user tushargosavi opened a pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204

    APEXMALHAR-1965 Utility classes for Write Ahead Log support

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tushargosavi/incubator-apex-malhar APEXMALHAR-1965

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-apex-malhar/pull/204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #204
    
----
commit ab8332958643d9cb41ba287250b289ebb7fafada
Author: Tushar R. Gosavi <tu...@apache.org>
Date:   2016-02-27T17:03:17Z

    APEXMALHAR-1965 Utility classes for Write Ahead Log support

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by chandnisingh <gi...@git.apache.org>.
Github user chandnisingh commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r57546213
  
    --- Diff: library/src/test/java/org/apache/apex/malhar/lib/wal/WALTest.java ---
    @@ -0,0 +1,236 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.File;
    +import java.io.IOException;
    +import java.util.Random;
    +import java.util.Set;
    +
    +import org.junit.Assert;
    +import org.junit.Test;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import org.apache.commons.io.FileUtils;
    +
    +import com.google.common.collect.Sets;
    +
    +import com.datatorrent.lib.fileaccess.FileAccessFSImpl;
    +
    +public class WALTest
    --- End diff --
    
    If this is a test for FSWal, can you please rename it to match?



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by chandnisingh <gi...@git.apache.org>.
Github user chandnisingh commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r57832121
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/FSWal.java ---
    @@ -0,0 +1,178 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.EOFException;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +
    +import com.google.common.base.Preconditions;
    +
    +import com.datatorrent.lib.fileaccess.FileAccess;
    +
    +/**
    + * WAL implementation which allows writing entries to single file.
    + * the pointer type is the offset in the file. The entry is serialized into byte array and
    + * first length of the entry is written followed by the serialized bytes.
    + * @param <T>
    + */
    +public class FSWal<T> implements WAL<T, Long>
    +{
    +  transient long bucketKey;
    +  private transient String name;
    +  private transient FileAccess fa;
    +  private transient WAL.Serde<T> serde;
    +
    +  public FSWal(FileAccess fa, WAL.Serde<T> serde, long bucketKey, String name)
    --- End diff --
    
    My earlier comment pertains only to FSWal. I think the bucket key is a leftover from the HDHT implementation of the file system WAL.



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by tweise <gi...@git.apache.org>.
Github user tweise commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r60134700
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/WAL.java ---
    @@ -0,0 +1,130 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.Closeable;
    +import java.io.IOException;
    +
    +/**
    + * This interface represents a write ahead log that can be used by operator.
    + * the WAL is split into two interfaces, a WALWriter which allows writing
    + * data, and WALReader which provides iterator like interface to read entries
    + * writen to the WAL.
    + *
    + * @param <T> Tuple type
    + * @param <P> WAL Pointer Type.
    + */
    +public interface WAL<T, P>
    +{
    +  WALReader<T, P> getReader() throws IOException;
    +
    +  WALWriter<T, P> getWriter() throws IOException;
    +
    +  /**
    +   * Provides iterator like interface to read entries from the WAL.
    +   * @param <T> type of WAL entries
    +   * @param <P> type of Pointer in the WAL
    +   */
    +  interface WALReader<T, P> extends Closeable
    +  {
    +    /**
    +     * Close WAL after read.
    +     *
    +     * @param offset seek offset.
    +     * @throws IOException
    +     */
    +    @Override
    +    void close() throws IOException;
    +
    +    /**
    +     * Seek to middle of the WAL. This is used primarily during recovery,
    +     * when we need to start recovering data from middle of WAL file.
    +     */
    +    void seek(P offset) throws IOException;
    +
    +    /**
    +     * Advance WAL by one entry, returns true if it can advance, else false
    +     * in case of any other error throws an Exception.
    +     *
    +     * @return true if next data item is read successfully, false if data can not be read.
    +     * @throws IOException
    +     */
    +    boolean advance() throws IOException;
    +
    +    /**
    +     * Return current entry from WAL, returns null if end of file has reached.
    +     *
    +     * @return MutableKeyValue
    +     */
    +    T get();
    +
    +    /**
    +     * Return the offset corresponding to the last read entry.
    +     * @return
    +     */
    +    P getOffset();
    +  }
    +
    +  /**
    +   * Provide method to write entries to the WAL.
    +   * @param <T>
    +   * @param <P>
    +   */
    +  interface WALWriter<T, P>
    +  {
    +    /**
    +     * flush pending data to disk and close file.
    +     *
    +     * @throws IOException
    +     */
    +    void close() throws IOException;
    +
    +    /**
    +     * Write an entry to the WAL, this operation need not flush the data.
    +     */
    +    int append(T entry) throws IOException;
    +
    +    /**
    +     * Flush data to persistent storage.
    +     *
    +     * @throws IOException
    +     */
    +    void flush() throws IOException;
    +
    +    /**
    +     * Returns size of the WAL, last part of the log may not be persisted on disk.
    +     * In case of file backed WAL this will be the size of file, in case of kafka
    +     * like log, this will be similar to the message offset.
    +     *
    +     * @return The log size
    +     */
    +    P getOffset();
    --- End diff --
    
    @chandnisingh I see it as the getSize() that HDHT uses for rollover.



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by tweise <gi...@git.apache.org>.
Github user tweise commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r57830582
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/FSWal.java ---
    @@ -0,0 +1,178 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.EOFException;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +
    +import com.google.common.base.Preconditions;
    +
    +import com.datatorrent.lib.fileaccess.FileAccess;
    +
    +/**
    + * WAL implementation which allows writing entries to single file.
    + * the pointer type is the offset in the file. The entry is serialized into byte array and
    + * first length of the entry is written followed by the serialized bytes.
    + * @param <T>
    + */
    +public class FSWal<T> implements WAL<T, Long>
    +{
    +  transient long bucketKey;
    +  private transient String name;
    +  private transient FileAccess fa;
    +  private transient WAL.Serde<T> serde;
    +
    +  public FSWal(FileAccess fa, WAL.Serde<T> serde, long bucketKey, String name)
    --- End diff --
    
    Another consideration for the WAL interface is support for a backend other than FS, for example Kafka.
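
A minimal sketch of why the generic pointer type matters for this point: with P left open, a backend that is not a file can use something other than a byte offset. The example below uses an in-memory list whose pointer is an entry index, loosely comparable to a message offset in a Kafka-like log. MiniWal and InMemoryWal are hypothetical names, the interface is a trimmed copy of the one quoted in the diff, checked exceptions are omitted for brevity, and no real Kafka client API is used.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed, hypothetical copy of the reader/writer split quoted in the diff.
interface MiniWal<T, P>
{
  Reader<T, P> getReader();

  Writer<T, P> getWriter();

  interface Reader<T, P>
  {
    void seek(P offset);

    boolean advance();

    T get();
  }

  interface Writer<T, P>
  {
    void append(T entry);

    P getOffset();
  }
}

// In-memory backend (an assumption for illustration): the pointer is the
// index of the next entry rather than a byte offset in a file.
class InMemoryWal<T> implements MiniWal<T, Integer>
{
  private final List<T> entries = new ArrayList<>();

  @Override
  public MiniWal.Reader<T, Integer> getReader()
  {
    return new MiniWal.Reader<T, Integer>()
    {
      private int next = 0;
      private T current = null;

      @Override
      public void seek(Integer offset)
      {
        next = offset;
        current = null;
      }

      @Override
      public boolean advance()
      {
        if (next >= entries.size()) {
          current = null;
          return false;
        }
        current = entries.get(next++);
        return true;
      }

      @Override
      public T get()
      {
        return current;
      }
    };
  }

  @Override
  public MiniWal.Writer<T, Integer> getWriter()
  {
    return new MiniWal.Writer<T, Integer>()
    {
      @Override
      public void append(T entry)
      {
        entries.add(entry);
      }

      @Override
      public Integer getOffset()
      {
        // Index at which the next appended entry will land.
        return entries.size();
      }
    };
  }

  // Replays every entry from the given pointer onward, as a recovery path might.
  static <T> List<T> replayFrom(MiniWal<T, Integer> wal, Integer pointer)
  {
    MiniWal.Reader<T, Integer> reader = wal.getReader();
    reader.seek(pointer);
    List<T> replayed = new ArrayList<>();
    while (reader.advance()) {
      replayed.add(reader.get());
    }
    return replayed;
  }
}
```

Code written against MiniWal only sees the pointer as an opaque P, so the same recovery loop would work whether P is a file offset or a message index.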



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by chandnisingh <gi...@git.apache.org>.
Github user chandnisingh commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r59621181
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/WAL.java ---
    @@ -0,0 +1,130 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.Closeable;
    +import java.io.IOException;
    +
    +/**
    + * This interface represents a write ahead log that can be used by operator.
    + * the WAL is split into two interfaces, a WALWriter which allows writing
    + * data, and WALReader which provides iterator like interface to read entries
    + * writen to the WAL.
    + *
    + * @param <T> Tuple type
    + * @param <P> WAL Pointer Type.
    + */
    +public interface WAL<T, P>
    +{
    +  WALReader<T, P> getReader() throws IOException;
    +
    +  WALWriter<T, P> getWriter() throws IOException;
    +
    +  /**
    +   * Provides iterator like interface to read entries from the WAL.
    +   * @param <T> type of WAL entries
    +   * @param <P> type of Pointer in the WAL
    +   */
    +  interface WALReader<T, P> extends Closeable
    +  {
    +    /**
    +     * Close WAL after read.
    +     *
    +     * @param offset seek offset.
    +     * @throws IOException
    +     */
    +    @Override
    +    void close() throws IOException;
    +
    +    /**
    +     * Seek to middle of the WAL. This is used primarily during recovery,
    +     * when we need to start recovering data from middle of WAL file.
    +     */
    +    void seek(P offset) throws IOException;
    +
    +    /**
    +     * Advance WAL by one entry, returns true if it can advance, else false
    +     * in case of any other error throws an Exception.
    +     *
    +     * @return true if next data item is read successfully, false if data can not be read.
    +     * @throws IOException
    +     */
    +    boolean advance() throws IOException;
    +
    +    /**
    +     * Return current entry from WAL, returns null if end of file has reached.
    +     *
    +     * @return MutableKeyValue
    +     */
    +    T get();
    +
    +    /**
    +     * Return the offset corresponding to the last read entry.
    +     * @return
    +     */
    +    P getOffset();
    +  }
    +
    +  /**
    +   * Provide method to write entries to the WAL.
    +   * @param <T>
    +   * @param <P>
    +   */
    +  interface WALWriter<T, P>
    +  {
    +    /**
    +     * flush pending data to disk and close file.
    +     *
    +     * @throws IOException
    +     */
    +    void close() throws IOException;
    +
    +    /**
    +     * Write an entry to the WAL, this operation need not flush the data.
    +     */
    +    int append(T entry) throws IOException;
    +
    +    /**
    +     * Flush data to persistent storage.
    +     *
    +     * @throws IOException
    +     */
    +    void flush() throws IOException;
    +
    +    /**
    +     * Returns size of the WAL, last part of the log may not be persisted on disk.
    +     * In case of file backed WAL this will be the size of file, in case of kafka
    +     * like log, this will be similar to the message offset.
    +     *
    +     * @return The log size
    +     */
    +    P getOffset();
    --- End diff --
    
    @tushargosavi the javadoc is not clear here. Can you please explain the intent of this method? I have changed the implementation of FsWALWriter and I am unable to understand the significance of this method.



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by chandnisingh <gi...@git.apache.org>.
Github user chandnisingh commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r57802416
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/FSWal.java ---
    @@ -0,0 +1,178 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.EOFException;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +
    +import com.google.common.base.Preconditions;
    +
    +import com.datatorrent.lib.fileaccess.FileAccess;
    +
    +/**
    + * WAL implementation which allows writing entries to single file.
    + * the pointer type is the offset in the file. The entry is serialized into byte array and
    + * first length of the entry is written followed by the serialized bytes.
    + * @param <T>
    + */
    +public class FSWal<T> implements WAL<T, Long>
    +{
    +  transient long bucketKey;
    +  private transient String name;
    +  private transient FileAccess fa;
    +  private transient WAL.Serde<T> serde;
    +
    +  public FSWal(FileAccess fa, WAL.Serde<T> serde, long bucketKey, String name)
    --- End diff --
    
    I agree with Isha on using a simple File API.
    I don't think there should be a bucket key here. I was thinking of a WAL interface where you provide the path of the WAL and it automatically creates part files as each file fills up (with the size limit configurable).
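
A minimal sketch of the path-plus-part-files idea described above, assuming plain java.nio.file APIs and a length-prefixed entry format. RollingWalWriter, the "part-N" naming, and the countParts helper are hypothetical and are not taken from the pull request.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch; not the FileSystemWAL from the pull request.
class RollingWalWriter implements AutoCloseable
{
  private final Path dir;
  private final long maxPartSize;
  private int partIndex = 0;
  private DataOutputStream out;

  RollingWalWriter(Path dir, long maxPartSize) throws IOException
  {
    this.dir = Files.createDirectories(dir);
    this.maxPartSize = maxPartSize;
    openPart();
  }

  private void openPart() throws IOException
  {
    out = new DataOutputStream(Files.newOutputStream(dir.resolve("part-" + partIndex)));
  }

  // Appends a length-prefixed entry. If the entry would push a non-empty
  // part past the size limit, the part is closed and a new one is started.
  void append(byte[] entry) throws IOException
  {
    long entrySize = 4L + entry.length;
    if (out.size() > 0 && out.size() + entrySize > maxPartSize) {
      out.close();
      partIndex++;
      openPart();
    }
    out.writeInt(entry.length);
    out.write(entry);
  }

  @Override
  public void close() throws IOException
  {
    out.close();
  }

  // Demo helper: writes `count` entries of `length` bytes each into a fresh
  // temporary directory and returns how many part files were created.
  static long countParts(int count, int length, long maxPartSize)
  {
    try {
      Path dir = Files.createTempDirectory("wal-sketch");
      try (RollingWalWriter writer = new RollingWalWriter(dir, maxPartSize)) {
        for (int i = 0; i < count; i++) {
          writer.append(new byte[length]);
        }
      }
      try (Stream<Path> parts = Files.list(dir)) {
        return parts.count();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this shape the caller never sees a bucket key or a size check; it only supplies a directory and a limit. An entry larger than the limit still gets written, alone in its own part, which is one possible policy among several.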



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by ishark <gi...@git.apache.org>.
Github user ishark commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r57801373
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/FSWal.java ---
    @@ -0,0 +1,178 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.EOFException;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +
    +import com.google.common.base.Preconditions;
    +
    +import com.datatorrent.lib.fileaccess.FileAccess;
    +
    +/**
    + * WAL implementation which allows writing entries to single file.
    + * the pointer type is the offset in the file. The entry is serialized into byte array and
    + * first length of the entry is written followed by the serialized bytes.
    + * @param <T>
    + */
    +public class FSWal<T> implements WAL<T, Long>
    +{
    +  transient long bucketKey;
    +  private transient String name;
    +  private transient FileAccess fa;
    +  private transient WAL.Serde<T> serde;
    +
    +  public FSWal(FileAccess fa, WAL.Serde<T> serde, long bucketKey, String name)
    --- End diff --
    
    Would it make sense for the WAL to use simple File APIs instead of FileAccess? That would make it more generic. Also, can bucketKey be a string instead of a long?
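
The FSWal javadoc quoted above describes each entry as a length followed by the serialized bytes. A minimal sketch of that framing using only java.io stream classes, with no FileAccess dependency, might look like the following. LengthPrefixedFraming and its methods are hypothetical names, and the 4-byte int length prefix is an assumption; the pull request may encode the length differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the pull request.
class LengthPrefixedFraming
{
  // Writes one entry as a 4-byte length followed by the payload bytes.
  static void writeEntry(DataOutputStream out, byte[] payload) throws IOException
  {
    out.writeInt(payload.length);
    out.write(payload);
  }

  // Reads one entry. Returns null when no complete length prefix remains;
  // a truncated payload surfaces as an EOFException from readFully.
  static byte[] readEntry(DataInputStream in) throws IOException
  {
    int length;
    try {
      length = in.readInt();
    } catch (EOFException e) {
      return null;
    }
    byte[] payload = new byte[length];
    in.readFully(payload);
    return payload;
  }

  // Serializes the given strings into one in-memory log.
  static byte[] writeAll(String... entries)
  {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      for (String entry : entries) {
        writeEntry(out, entry.getBytes(StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Reads every entry back from an in-memory log.
  static List<String> readAll(byte[] log)
  {
    List<String> entries = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
      byte[] payload;
      while ((payload = readEntry(in)) != null) {
        entries.add(new String(payload, StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return entries;
  }
}
```

Because writeEntry and readEntry take plain DataOutputStream and DataInputStream arguments, the same framing could sit on top of a local file, an HDFS stream, or a byte array.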



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by tushargosavi <gi...@git.apache.org>.
Github user tushargosavi closed the pull request at:

    https://github.com/apache/incubator-apex-malhar/pull/204



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by chandnisingh <gi...@git.apache.org>.
Github user chandnisingh commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r57546194
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/WAL.java ---
    @@ -0,0 +1,130 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.Closeable;
    +import java.io.IOException;
    +
    +/**
    + * This interface represents a write ahead log that can be used by operator.
    + * the WAL is split into two interfaces, a WALWriter which allows writing
    + * data, and WALReader which provides iterator like interface to read entries
    + * writen to the WAL.
    + *
    + * @param <T> Tuple type
    + * @param <P> WAL Pointer Type.
    + */
    +public interface WAL<T, P>
    +{
    +  WALReader<T, P> getReader() throws IOException;
    +
    +  WALWriter<T, P> getWriter() throws IOException;
    +
    +  /**
    +   * Provides iterator like interface to read entries from the WAL.
    +   * @param <T> type of WAL entries
    +   * @param <P> type of Pointer in the WAL
    +   */
    +  interface WALReader<T, P> extends Closeable
    +  {
    +    /**
    +     * Close WAL after read.
    +     *
    +     * @param offset seek offset.
    --- End diff --
    
    There isn't any offset argument in the close method.



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by chandnisingh <gi...@git.apache.org>.
Github user chandnisingh commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r57424510
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/FSWal.java ---
    @@ -0,0 +1,178 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.EOFException;
    +import java.io.IOException;
    +
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +
    +import com.google.common.base.Preconditions;
    +
    +import com.datatorrent.lib.fileaccess.FileAccess;
    +
    +/**
    + * WAL implementation which allows writing entries to single file.
    + * the pointer type is the offset in the file. The entry is serialized into byte array and
    + * first length of the entry is written followed by the serialized bytes.
    + * @param <T>
    + */
    +public class FSWal<T> implements WAL<T, Long>
    +{
    +  transient long bucketKey;
    +  private transient String name;
    +  private transient FileAccess fa;
    --- End diff --
    
    These fields are all transient. Is this intended?



[GitHub] incubator-apex-malhar pull request: APEXMALHAR-1965 Utility classe...

Posted by chandnisingh <gi...@git.apache.org>.
Github user chandnisingh commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/204#discussion_r60138670
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/WAL.java ---
    @@ -0,0 +1,130 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.Closeable;
    +import java.io.IOException;
    +
    +/**
    + * This interface represents a write ahead log that can be used by operator.
    + * the WAL is split into two interfaces, a WALWriter which allows writing
    + * data, and WALReader which provides iterator like interface to read entries
    + * writen to the WAL.
    + *
    + * @param <T> Tuple type
    + * @param <P> WAL Pointer Type.
    + */
    +public interface WAL<T, P>
    +{
    +  WALReader<T, P> getReader() throws IOException;
    +
    +  WALWriter<T, P> getWriter() throws IOException;
    +
    +  /**
    +   * Provides iterator like interface to read entries from the WAL.
    +   * @param <T> type of WAL entries
    +   * @param <P> type of Pointer in the WAL
    +   */
    +  interface WALReader<T, P> extends Closeable
    +  {
    +    /**
    +     * Close WAL after read.
    +     *
    +     * @param offset seek offset.
    +     * @throws IOException
    +     */
    +    @Override
    +    void close() throws IOException;
    +
    +    /**
    +     * Seek to middle of the WAL. This is used primarily during recovery,
    +     * when we need to start recovering data from middle of WAL file.
    +     */
    +    void seek(P offset) throws IOException;
    +
    +    /**
    +     * Advance WAL by one entry, returns true if it can advance, else false
    +     * in case of any other error throws an Exception.
    +     *
    +     * @return true if next data item is read successfully, false if data can not be read.
    +     * @throws IOException
    +     */
    +    boolean advance() throws IOException;
    +
    +    /**
    +     * Return current entry from WAL, returns null if end of file has reached.
    +     *
    +     * @return MutableKeyValue
    +     */
    +    T get();
    +
    +    /**
    +     * Return the offset corresponding to the last read entry.
    +     * @return
    +     */
    +    P getOffset();
    +  }
    +
    +  /**
    +   * Provide method to write entries to the WAL.
    +   * @param <T>
    +   * @param <P>
    +   */
    +  interface WALWriter<T, P>
    +  {
    +    /**
    +     * flush pending data to disk and close file.
    +     *
    +     * @throws IOException
    +     */
    +    void close() throws IOException;
    +
    +    /**
    +     * Write an entry to the WAL, this operation need not flush the data.
    +     */
    +    int append(T entry) throws IOException;
    +
    +    /**
    +     * Flush data to persistent storage.
    +     *
    +     * @throws IOException
    +     */
    +    void flush() throws IOException;
    +
    +    /**
    +     * Returns size of the WAL, last part of the log may not be persisted on disk.
    +     * In case of file backed WAL this will be the size of file, in case of kafka
    +     * like log, this will be similar to the message offset.
    +     *
    +     * @return The log size
    +     */
    +    P getOffset();
    --- End diff --
    
    In the changes I have made, rollover is handled by the FileSystemWAL automatically, so I have removed this method from the API of WALWriter. I think rolling over is specific to the file system WAL.
    I can add it back, but only if it makes sense for other types of writers.

