You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pirk.apache.org by tellison <gi...@git.apache.org> on 2016/07/22 15:16:26 UTC

[GitHub] incubator-pirk pull request #18: [WIP] Serialization and storage changes to ...

GitHub user tellison opened a pull request:

    https://github.com/apache/incubator-pirk/pull/18

    [WIP] Serialization and storage changes to Querier, Query, and Response.

     - Introduce serialization services that can flatten objects into a
    variety of different formats.  Currently just reusing the Java
    serialization code until the project is ready to adopt other
    serializers.
     
     - Introduce storage services that can store objects and retrieve them.
    Currently just reusing the local file system and HDFS.  The storage
    service uses one of the serialization services to flatten the objects.
     
     - Remove code from Querier, Query, and Response that was serializing
    and storing each of them; and update the callers to use the storage
    services.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tellison/incubator-pirk serialization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-pirk/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18
    
----
commit aef6aea88896f808c578e3660785a9b8483e375d
Author: Tim Ellison <t....@gmail.com>
Date:   2016-07-22T14:40:12Z

    Serialization and storage changes to Querier, Query, and Response.
    
     - Introduce serialization services that can flatten objects into a
    variety of different formats.  Currently just reusing the Java
    serialization code until the project is ready to adopt other
    serializers.
     
     - Introduce storage services that can store objects and retrieve them.
    Currently just reusing the local file system and HDFS.  The storage
    service uses one of the serialization services to flatten the objects.
     
     - Remove code from Querier, Query, and Response that was serializing
    and storing each of them; and update the callers to use the storage
    services.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #18: [WIP] Serialization and storage changes to ...

Posted by smarthi <gi...@git.apache.org>.
Github user smarthi commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/18#discussion_r71946446
  
    --- Diff: src/main/java/org/apache/pirk/serialization/JavaSerializer.java ---
    @@ -0,0 +1,49 @@
    +/*******************************************************************************
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + *******************************************************************************/
    +package org.apache.pirk.serialization;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.ObjectInputStream;
    +import java.io.ObjectOutputStream;
    +import java.io.OutputStream;
    +
    +public class JavaSerializer extends SerializationService
    +{
    +
    +  public void write(OutputStream stream, Storable obj) throws IOException
    +  {
    +    ObjectOutputStream oos = new ObjectOutputStream(stream);
    +    oos.writeObject(obj);
    +  }
    +
    +  @SuppressWarnings("unchecked")
    +  public <T> T read(InputStream stream, Class<T> type) throws IOException
    +  {
    +    ObjectInputStream oin = new ObjectInputStream(stream);
    +    try
    --- End diff --
    
    how bout 
    `try(ObjectInputStream oin = new ObjectInputStream(stream);)` 
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [WIP] Serialization and storage changes to Querier...

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    @ellisonanne Sorry, I don't see ```fs``` defined in ```ExpTableMapper```?
    
    Yep, as serializers and storage services are added they should come with tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #18: [WIP] Serialization and storage changes to ...

Posted by smarthi <gi...@git.apache.org>.
Github user smarthi commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/18#discussion_r71920324
  
    --- Diff: src/main/java/org/apache/pirk/serialization/HadoopFileSystemStore.java ---
    @@ -0,0 +1,94 @@
    +/*******************************************************************************
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + *******************************************************************************/
    +package org.apache.pirk.serialization;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +
    +public class HadoopFileSystemStore extends StorageService
    +{
    +
    +  private FileSystem hadoopFileSystem;
    +
    +  // Prevents others from using default constructor.
    +  HadoopFileSystemStore()
    +  {
    +    super();
    +  }
    +
    +  /**
    +   * Creates a new storage service on the given HDFS file system using default Java serialization.
    +   */
    +  public HadoopFileSystemStore(FileSystem fs)
    +  {
    +    super();
    +    hadoopFileSystem = fs;
    +  }
    +
    +  public HadoopFileSystemStore(FileSystem fs, SerializationService serial)
    +  {
    +    super(serial);
    +    hadoopFileSystem = fs;
    +  }
    +
    +  public void store(String pathName, Storable value) throws IOException
    +  {
    +    store(new Path(pathName), value);
    +  }
    +
    +  public void store(Path path, Storable obj) throws IOException
    +  {
    +    OutputStream os = hadoopFileSystem.create(path);
    +    try
    +    {
    +      serializer.write(os, obj);
    +    } finally
    +    {
    +      if (os != null)
    +      {
    +        os.close();
    +      }
    +    }
    +  }
    +
    +  public <T> T recall(String pathName, Class<T> type) throws IOException
    +  {
    +    return recall(new Path(pathName), type);
    +  }
    +
    +  public <T> T recall(Path path, Class<T> type) throws IOException
    +  {
    +    InputStream is = hadoopFileSystem.open(path);
    +    try
    --- End diff --
    
    try-all ? Here and other places in this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [PIRK-25] Serialization and storage changes to Que...

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    Ah, I guess you are looking at the current code (pre-PR).  I in-lined that ```fs``` definition as it is only used once.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [WIP] Serialization and storage changes to Querier...

Posted by smarthi <gi...@git.apache.org>.
Github user smarthi commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    @tellison meant to say use  - try-all-resources
    
    `//    FileOutputStream fos = new FileOutputStream(file);
        try(FileOutputStream fos = new FileOutputStream(file)){ .... }`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #18: [WIP] Serialization and storage changes to ...

Posted by smarthi <gi...@git.apache.org>.
Github user smarthi commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/18#discussion_r71946503
  
    --- Diff: src/main/java/org/apache/pirk/serialization/LocalFileSystemStore.java ---
    @@ -0,0 +1,82 @@
    +/*******************************************************************************
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + *******************************************************************************/
    +package org.apache.pirk.serialization;
    +
    +import java.io.File;
    +import java.io.FileInputStream;
    +import java.io.FileOutputStream;
    +import java.io.IOException;
    +
    +public class LocalFileSystemStore extends StorageService
    +{
    +
    +  /**
    +   * Creates a new storage service on the local file system using default Java serialization.
    +   */
    +  public LocalFileSystemStore()
    +  {
    +    super();
    +  }
    +
    +  public LocalFileSystemStore(SerializationService serial)
    +  {
    +    super(serial);
    +  }
    +
    +  public void store(String path, Storable obj) throws IOException
    +  {
    +    store(new File(path), obj);
    +  }
    +
    +  public void store(File file, Storable obj) throws IOException
    +  {
    +    FileOutputStream fos = new FileOutputStream(file);
    +    try
    --- End diff --
    
    `try(FileOutputStream fos = new FileOutputStream(file);)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #18: [PIRK-25] Serialization and storage changes...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-pirk/pull/18


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [WIP] Serialization and storage changes to Querier...

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    Very nice - much cleaner. 
    
    One very minor thing that you may want to go ahead and change in ExpTableMapper:
    
    `query = new HadoopFileSystemStore(FileSystem.newInstance(ctx.getConfiguration())).recall(queryDir, Query.class);`
     
    to
    
    `query = new HadoopFileSystemStore(fs).recall(queryDir, Query.class);`
    
    as the FileSystem fs has already been grabbed a couple of lines above.
    
    Otherwise, we will need to make sure that custom StorageServices and SerializationServices have corresponding tests. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #18: [WIP] Serialization and storage changes to ...

Posted by smarthi <gi...@git.apache.org>.
Github user smarthi commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/18#discussion_r71920276
  
    --- Diff: src/main/java/org/apache/pirk/serialization/HadoopFileSystemStore.java ---
    @@ -0,0 +1,94 @@
    +/*******************************************************************************
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + *******************************************************************************/
    +package org.apache.pirk.serialization;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.hadoop.fs.Path;
    +
    +public class HadoopFileSystemStore extends StorageService
    +{
    +
    +  private FileSystem hadoopFileSystem;
    +
    +  // Prevents others from using default constructor.
    +  HadoopFileSystemStore()
    +  {
    +    super();
    +  }
    +
    +  /**
    +   * Creates a new storage service on the given HDFS file system using default Java serialization.
    +   */
    +  public HadoopFileSystemStore(FileSystem fs)
    +  {
    +    super();
    +    hadoopFileSystem = fs;
    +  }
    +
    +  public HadoopFileSystemStore(FileSystem fs, SerializationService serial)
    +  {
    +    super(serial);
    +    hadoopFileSystem = fs;
    +  }
    +
    +  public void store(String pathName, Storable value) throws IOException
    +  {
    +    store(new Path(pathName), value);
    +  }
    +
    +  public void store(Path path, Storable obj) throws IOException
    +  {
    +    OutputStream os = hadoopFileSystem.create(path);
    +    try
    --- End diff --
    
    How bout using a try-all here ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [WIP] Serialization and storage changes to Querier...

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    @smarthi  Ah, ok.  Yep, I had mistakenly thought that try-with-resources was a Java 8 feature :-)  I'll update in due course (not today).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [PIRK-25] Serialization and storage changes to Que...

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    Correct - I was looking at the old code
    
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #18: [PIRK-25] Serialization and storage changes...

Posted by smarthi <gi...@git.apache.org>.
Github user smarthi commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/18#discussion_r71979713
  
    --- Diff: src/main/java/org/apache/pirk/serialization/SerializationService.java ---
    @@ -0,0 +1,49 @@
    +/*******************************************************************************
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + *******************************************************************************/
    +package org.apache.pirk.serialization;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +
    +/*
    + * Ability to read and write objects to/from a stream.
    + */
    +public abstract class SerializationService
    +{
    +  public abstract <T> T read(InputStream stream, Class<T> type) throws IOException;
    +
    +  public abstract void write(OutputStream w, Storable obj) throws IOException;
    +
    +  public byte[] toBytes(Storable obj)
    +  {
    +    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    +
    +    try
    --- End diff --
    
    Change to try(ByteArrayOutputStream bos = new ByteArrayOutputStream(); )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [WIP] Serialization and storage changes to Querier...

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    `fs` is defined in setup of `ExpTableMapper`:
    
     ```
    public void setup(Context ctx) throws IOException, InterruptedException
      {
        super.setup(ctx);
    
        valueOut = new Text();
    
        FileSystem fs = FileSystem.newInstance(ctx.getConfiguration());
        String queryDir = ctx.getConfiguration().get("pirMR.queryInputDir");
        query = Query.readFromHDFSFile(new Path(queryDir), fs);
    
        dataPartitionBitSize = query.getQueryInfo().getDataPartitionBitSize();
        maxValue = (int) Math.pow(2, dataPartitionBitSize) - 1;
    
        NSquared = query.getNSquared();
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [WIP] Serialization and storage changes to Querier...

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    Ha, managed to take down the Travis VM ;-)
    > *** buffer overflow detected ***: /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java terminated
    
    Yet it worked on my Travis branch before I opened the PR.  Maybe an admin can retry it pls.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk issue #18: [WIP] Serialization and storage changes to Querier...

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on the issue:

    https://github.com/apache/incubator-pirk/pull/18
  
    @smarthi   Sorry, not groking "try-all" can you explain?
    
    These follow the method overloading pattern seen in a variety of places like  Arrays.fill(byte[], byte), Arrays.fill(char[], char), etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---