You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@pirk.apache.org by Jacob Wilder <ja...@gmail.com> on 2016/07/27 15:04:24 UTC

[PIRK-20] Switching to Use Enum for DataTypes | Seeking input

Hello everyone,
I've tried a few things while working on this but would appreciate some
suggestions on how to move forward. Or perhaps decide not to.

I've pushed a very limited version of this up here:
https://github.com/jacobwil/incubator-pirk/compare/master...jacobwil:EnumForTypesInDataPartitioner

In this limited version we move knowledge of the size of each type (and
appropriate number of partitions) into the enum making it easier to reuse
that logic without reproducing entire getNumPartitions and getBits methods
in each DataPartitioner implementation. I think this is worth keeping.

Basically, I made PrimitiveTypeEnum which holds each of the primitive
types, can provide their bit size, and their number of partitions. Callers
to PrimitiveTypePartitioner still provide types as strings which
PrimitiveTypePartitioner converts into PrimitiveTypeEnums.

In my original attempt I took this further and started to change
interface DataPartitioner into DataPartitioner<T extends TypeEnum> and
PrimitiveTypePartitioner into PrimitiveTypePartitioner<PrimitiveTypeEnum>,
etc.

I managed to push the string-to-enum conversion most of the way up to
LoadDataSchemas. That's when I started to wrestle with just how much more
complex this was becoming.
While it's possible to have the instantiation of the DataPartitioner (as
requested by the XML file) provide a factory method that converts
string-represented-type information into the correct TypeEnum, I'd
appreciate some feedback on if we consider this to truly be valuable.

—
Jacob WIlder

Re: [PIRK-20] Switching to Use Enum for DataTypes | Seeking input

Posted by Ellison Anne Williams <ea...@gmail.com>.

Comments inline.

On Wed, Jul 27, 2016 at 12:14 PM, Tim Ellison <t....@gmail.com> wrote:

> On 27/07/16 16:04, Jacob Wilder wrote:
> > Hello everyone,
> > I've tried a few things while working on this but would appreciate some
> > suggestions on how to move forward. Or perhaps decide not to.
> >
> > I've pushed a very limited version of this up here:
> >
> https://github.com/jacobwil/incubator-pirk/compare/master...jacobwil:EnumForTypesInDataPartitioner
> >
> > In this limited version we move knowledge of the size of each type (and
> > appropriate number of partitions) into the enum making it easier to reuse
> > that logic without reproducing entire getNumPartitions and getBits
> methods
> > in each DataPartitioner implementation. I think this is worth keeping.
>
> Cool.  It also gives some further typing rather than dealing with
> strings everywhere.
>
> > Basically, I made PrimitiveTypeEnum which holds each of the primitive
> > types, can provide their bit size, and their number of partitions.
> Callers
> > to PrimitiveTypePartitioner still provide types as strings which
> > PrimitiveTypePartitioner converts into PrimitiveTypeEnums.
> >
> > In my original attempt I took this further and started to change
> > interface DataPartitioner into DataPartitioner<T extends TypeEnum> and
> > PrimitiveTypePartitioner into
> PrimitiveTypePartitioner<PrimitiveTypeEnum>,
> > etc.
> >
> > I managed to push the string-to-enum conversion most of the way up to
> > LoadDataSchemas. That's when I started to wrestle with just how much more
> > complex this was becoming.
>
> You're right that ideally the loader should be doing the string
> conversion (i.e. move the safelyConvertTypeToEnum into the loader).
> BTW if you do go there you should merge in the new schema and schema
> loader code to avoid rework pain.
>
> I like the idea of having the DataPartitioner<T extends TypeEnum> too,
> but without trying it I'll take your word that it gets hairy.  If you
> succeed you'd be able to change things like
>   toPartitions(Object obj, String typeAsString)
> to drop the string arg, right?
>
> > While it's possible to have the instantiation of the DataPartitioner (as
> > requested by the XML file) provide a factory method that converts
> > string-represented-type information into the correct TypeEnum, I'd
> > appreciate some feedback on if we consider this to truly be valuable.
>
> I'm still trying to get my head around what it means for non-primitive
> partitioners... in particular what the onus will be on people wanting to
> add a new partitioner type.
>  - implement a new TypeEnum
>  - update the factory/whatever for the schema loader
>  - anything else?
>
>
Folks wanting to implement a non-primitive type partitioner just implement
the DataPartitioner interface -- if TypeEnum is added as a part of the
interface, then folks would have to implement it too.

The DataPartitioner dictates how to parse a specific data element and,
currently, the DataPartitioner for each element for a data schema is stored
within the corresponding DataSchema object. There is room for improvement
here - for example, a DataPartitionerRegistry could be created (similar to
the DataSchemaRegistry) to avoid duplicate instantiations across multiple
DataSchemas within the DataSchemaRegistry. The downside is that it will
introduce some additional complication/overhead to the code.


> Regards,
> Tim
>
>
>

Re: [PIRK-20] Switching to Use Enum for DataTypes | Seeking input

Posted by Tim Ellison <t....@gmail.com>.

On 27/07/16 16:04, Jacob Wilder wrote:
> Hello everyone,
> I've tried a few things while working on this but would appreciate some
> suggestions on how to move forward. Or perhaps decide not to.
> 
> I've pushed a very limited version of this up here:
> https://github.com/jacobwil/incubator-pirk/compare/master...jacobwil:EnumForTypesInDataPartitioner
> 
> In this limited version we move knowledge of the size of each type (and
> appropriate number of partitions) into the enum making it easier to reuse
> that logic without reproducing entire getNumPartitions and getBits methods
> in each DataPartitioner implementation. I think this is worth keeping.

Cool.  It also gives some further typing rather than dealing with
strings everywhere.

> Basically, I made PrimitiveTypeEnum which holds each of the primitive
> types, can provide their bit size, and their number of partitions. Callers
> to PrimitiveTypePartitioner still provide types as strings which
> PrimitiveTypePartitioner converts into PrimitiveTypeEnums.
> 
> In my original attempt I took this further and started to change
> interface DataPartitioner into DataPartitioner<T extends TypeEnum> and
> PrimitiveTypePartitioner into PrimitiveTypePartitioner<PrimitiveTypeEnum>,
> etc.
> 
> I managed to push the string-to-enum conversion most of the way up to
> LoadDataSchemas. That's when I started to wrestle with just how much more
> complex this was becoming.

You're right that ideally the loader should be doing the string
conversion (i.e. move the safelyConvertTypeToEnum into the loader).
BTW if you do go there you should merge in the new schema and schema
loader code to avoid rework pain.

I like the idea of having the DataPartitioner<T extends TypeEnum> too,
but without trying it I'll take your word that it gets hairy.  If you
succeed you'd be able to change things like
  toPartitions(Object obj, String typeAsString)
to drop the string arg, right?

> While it's possible to have the instantiation of the DataPartitioner (as
> requested by the XML file) provide a factory method that converts
> string-represented-type information into the correct TypeEnum, I'd
> appreciate some feedback on if we consider this to truly be valuable.

I'm still trying to get my head around what it means for non-primitive
partitioners... in particular what the onus will be on people wanting to
add a new partitioner type.
 - implement a new TypeEnum
 - update the factory/whatever for the schema loader
 - anything else?

Regards,
Tim