Posted to dev@pirk.apache.org by Tim Ellison <t....@gmail.com> on 2016/07/22 15:36:40 UTC

Serialization and storage changes

I picked up PIRK-19, about the schema storage format -- but then got a
bit distracted :-)

I've ended up pulling out the code that deals with serialization and
storage in the Querier, Query, and Response classes.

As we discussed on another thread, these types shouldn't really be
responsible for storing themselves.  The current implementation has
duplicated code for Java-serializing each of these types to a local
file or to HDFS.  As we add more storage and serialization types, that
will soon become unmanageable, so I've pulled it all out into helper
classes that deal with serialization and storage separately.

In the proposed changes, the usage pattern changes from
  query.writeToHDFSFile(new Path("filename"), fs);
to
  new HadoopFileSystemStore(fs).store("filename", query);

that is, rather than ask the query to store itself, you ask the storage
to store the object.  Each storage type can be configured with different
serializers to get different formats, or anything else we want.

Likewise, loading objects changes from
  Query query = Query.readFromFile("filename");
to
  Query query = new LocalFileSystemStore().recall("filename", Query.class);

Clearly we can define new storage types easily without having to add new
methods to Querier, Query, and Response.
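
For concreteness, here's a rough sketch of the shape I have in mind --
the names below are illustrative only, not the exact classes in the PR,
and the Hadoop-backed store is omitted for brevity:

```java
import java.io.*;
import java.nio.file.*;

// Pluggable serialization: each storage type can be configured with a
// different Serializer to get different on-disk formats.
interface Serializer {
  void write(Serializable obj, OutputStream out) throws IOException;
  <T> T read(InputStream in, Class<T> type) throws IOException;
}

// Default implementation using plain Java serialization, matching what
// the Querier/Query/Response classes do today.
class JavaSerializer implements Serializer {
  public void write(Serializable obj, OutputStream out) throws IOException {
    ObjectOutputStream oos = new ObjectOutputStream(out);
    oos.writeObject(obj);
    oos.flush();
  }
  public <T> T read(InputStream in, Class<T> type) throws IOException {
    try {
      return type.cast(new ObjectInputStream(in).readObject());
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
  }
}

// A storage type that knows how to persist any serializable object to
// the local file system; an HDFS-backed store would look the same but
// open streams via the Hadoop FileSystem API instead.
class LocalFileSystemStore {
  private final Serializer serializer;

  LocalFileSystemStore() { this(new JavaSerializer()); }
  LocalFileSystemStore(Serializer serializer) { this.serializer = serializer; }

  void store(String path, Serializable obj) throws IOException {
    try (OutputStream out = Files.newOutputStream(Paths.get(path))) {
      serializer.write(obj, out);
    }
  }

  <T> T recall(String path, Class<T> type) throws IOException {
    try (InputStream in = Files.newInputStream(Paths.get(path))) {
      return serializer.read(in, type);
    }
  }
}
```

With something like this, a round trip is just store(...) followed by
recall(...), and swapping the serializer (say, for JSON later) touches
only the store's constructor argument, not Querier, Query, or Response.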

I've submitted a [WIP] pull request with the changes so that folks have
a chance to look at this direction and decide whether it is desirable.
It still needs some comments before it is ready to be submitted
formally.  All comments welcomed.

Regards,
Tim