Posted to dev@accumulo.apache.org by Keith Turner <ke...@deenlo.com> on 2012/08/11 02:07:00 UTC

feedback on Typo

I put together a simple abstraction layer for Accumulo that makes it
easier to read and write Java objects to Accumulo key and value
fields.  The data written to Accumulo sorts correctly
lexicographically.

I put the code on github and would like some feedback on the design
and whether it should be included with Accumulo.

https://github.com/keith-turner/typo

It's still a little rough and I need to add encoders for all of the
primitive types.

Keith

Re: feedback on Typo

Posted by Keith Turner <ke...@deenlo.com>.

On Mon, Aug 13, 2012 at 6:03 PM, Josh Elser <jo...@gmail.com> wrote:
> Even with something as simple as a pair, things can start getting difficult.
> I suppose it really revolves around the level of support you want to provide
> at scan time, e.g. "find all pairs where the second is 'x'?".

I implemented support for Pair and Triple.  Getting the tuples to sort
correctly lexicographically is tricky, which is why a library like
Typo is nice.  Below is a link to an example that uses Pair to store
an edge in the row of the Accumulo key.  The example scans over all
Pairs where the first is X.  This can be done efficiently by
leveraging the way Pair sorts.  Finding all pairs where the second is
X would require a full table scan.  One way to avoid this is to insert
the edge twice, as Pair(X,Y) and Pair(Y,X); then you can find what
you are looking for without a full table scan.  I think this is what
you mentioned below.

https://github.com/keith-turner/typo/blob/master/src/main/java/org/apache/accumulo/client/typo/example/GraphExample.java
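
A minimal sketch of the scan pattern described above, not taken from the linked GraphExample.  It models the table as a sorted set of strings rather than using the Accumulo or Typo APIs, and the class name, method names, and the NUL-separator encoding are all assumptions made for illustration.  It shows why "first is X" is a cheap range scan and why storing each edge in both directions avoids a full scan for "second is X".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a sorted table whose rows are encoded (first, second) pairs. */
final class EdgeScanSketch {
    // Hypothetical encoding: a NUL separator is safe here only because the
    // sample strings never contain NUL themselves.
    static String row(String first, String second) {
        return first + '\u0000' + second;
    }

    /** Stores the edge in both directions so either endpoint can be found by prefix. */
    static void insertEdge(TreeSet<String> table, String a, String b) {
        table.add(row(a, b));
        table.add(row(b, a));
    }

    /** Range scan: visits only the rows whose first element equals the given value. */
    static List<String> neighbors(TreeSet<String> table, String first) {
        String start = first + '\u0000'; // smallest possible row with this first element
        String end = first + '\u0001';   // just past the largest such row
        List<String> out = new ArrayList<>();
        for (String r : table.subSet(start, true, end, false)) {
            out.add(r.substring(start.length()));
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>();
        insertEdge(table, "X", "Y");
        insertEdge(table, "X", "Z");
        insertEdge(table, "W", "Y");
        System.out.println(neighbors(table, "X")); // [Y, Z]
        System.out.println(neighbors(table, "Y")); // [W, X]
    }
}
```

The cost of the second copy is double the storage and writes; in exchange, both lookups touch only the matching rows.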

>
> Spending a few minutes thinking about it, an index could be a separate table
> but wouldn't necessarily have to be. It depends on the complexity of the
> structure you're trying to index. Using the Pair example again, you could
> reserve a column (family) to place index records in which simply inverts the
> Pair in the colqual.

Right, so you could use Typo to do this but it would not do it for you.

>
>
> On 08/13/2012 11:06 AM, Keith Turner wrote:
>>
>> On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser<jo...@gmail.com>  wrote:
>>>
>>> Neat idea, Keith.
>>>
>>> Have you thought about how to support more complex types? Specifically,
>>> arrays, hashes and the nesting of those? Any thoughts about indexing for
>>> those complex types?
>>
>> Yeah I was thinking that would be nice.  I see a lot of users putting
>> multiple types into the row and/or columns.  Could have something like
>> TupleEncoder<List<A>>.   TupleEncoder would need to encode it elements
>> such that it sorts correctly.  However, this may be cumbersome to use
>> if you want to use different types.  For example I want a row composed
>> of a Long and String.  I was thinking of having the following types to
>> handle this case.
>>
>> class Pair<A,B>  extends LexEncoder{
>>     Pair(LexEncoder<A>  enc1, LexEncoder<B>  enc2);
>>     A getFirst(){}
>>     B getSecond(){}
>> }
>>
>> class Triple<A,B,C>{//follows same pattern as Pair}
>> class Quadruple<A,B,C,D>{//follows same pattern as Pair}
>>
>> This would allow a user to write code like the following that makes it
>> easy to work with a row composed of a Long and String.
>>
>> Pair<Long, String>  pair;
>> long l = pair.getFirst();
>> String s = pair.getSecond();
>>
>> I am still thinking the tuple concept through.
>>
>> I was not considering indexing.  I assuming you mean creating an index
>> in another table?
>>
>>> Initial thoughts are that it would make the most sense to place Typo at
>>> the
>>> contrib level (or something equivalent). The reason being: Typo doesn't
>>> change the underlying functionality of Accumulo; it only provides a layer
>>> on
>>> top of it that makes life easier for developers.
>>
>> I think putting it in contrib makes sense.
>>
>>>
>>> On 08/10/2012 07:07 PM, Keith Turner wrote:
>>>>
>>>> I put together a simple abstraction layer for Accumulo that makes it
>>>> easier to read and write Java objects to Accumulo key and value
>>>> fields.  The data written to Accumulo sort correctly
>>>> lexicographically.
>>>>
>>>> I put the code on github and would like some feedback on the design
>>>> and whether it should be included with Accumulo.
>>>>
>>>> https://github.com/keith-turner/typo
>>>>
>>>> Its still a little rough and I need to add encoder for all of the
>>>> primitive types.
>>>>
>>>> Keith

Re: feedback on Typo

Posted by Marc Parisi <ma...@accumulo.net>.

Write an annotation called TYPEDEF that creates the source code for you at
compilation. All you really need to do is extend the type to your defined
name.

On Wed, Aug 15, 2012 at 9:38 AM, Keith Turner <ke...@deenlo.com> wrote:

> On Wed, Aug 15, 2012 at 9:19 AM, Ed Kohlwey <ek...@gmail.com> wrote:
> > I think its not just about types, but specifically primitive types and
> > tuples.
> Right.  And sorting is another very important aspect.   User want to
> do things like store dates that sort in reverese order as part of a
> tuple in the row.  We tell them its possible if they encode their data
> in a certain way.  And we also tell them "oh, BTW if you have binary
> data in your tuple it can be tricky to get it right".  So one goal of
> Typo is to make this easier for users.   I think something like the
> following would do this and get the lexicographic sort order correct.
>
> class RDTypo extends Typo<Pair<Long, Date>,String,String,Text> {
>   public RDTypo() {
>     super(new PairLexicoder<Long,Date>(new LongLexicoder(), new
> ReverseLexicoder<Date>(new DateLexicoder())),
>              new StringLexicoder(), new StringLexicoder(), new
> TextLexicoder());
>   }
> }
>
> I so wish that Java had typedef, it could make the Typo API much more
> concise.  I never thought I would actually miss C++ template
> programming :)  I still need to do some more research on Java generics
> to see if I can make things more concise.
>
> >
> > So its avoiding being a full-fledged ORM solution like Gora.
> >
> >
> >> Am I right in assuming that this is about simplifying the API for
> >> storing typed data in the key, and not about providing a mechanism for
> >> query. Isn't this really just about storing stuff you've already
> >> decided was a good structure for whatever your query mechanism is?
>

Re: feedback on Typo

Posted by Ed Kohlwey <ek...@gmail.com>.

I started looking this morning at what it would take to change the encoders
to be NIO-based, and then realized that this was actually an issue in the
core Accumulo API.

I think it would be nice to start a dialogue around introducing some
generic superclasses to Encoder, Key, and Value in order to provide a
cleaner, NIO-based API that things like Typo can be implemented on top of.
I've started an issue to track thoughts on this:
https://issues.apache.org/jira/browse/ACCUMULO-731

This would also be a major help to projects like Gora.

On Wed, Aug 15, 2012 at 12:50 PM, Keith Turner <ke...@deenlo.com> wrote:

> On Wed, Aug 15, 2012 at 10:09 AM, Ed Kohlwey <ek...@gmail.com> wrote:
> > One suggestion I'd make is to force users to name their tuples by making
> > the tuple types abstract. This won't help your complexity but IMHO makes
> > code more readable.
>
> Thats a good suggestion, I made Typo abstract.  I also made class like
> TypoScanner, TypoMutation, etc inner classes of Typo.  Doing this I
> was able to achieve what I wanted to with typedef, making code that
> uses Typo more concise.  The inner classes and type parameters
> actually work the way I want, I was not sure it would before I tried
> it.
>
> >
> > This an issue of java style, but there's nothing more irritating than
> > tuples floating around code without having an obvious explanation of "why
> > do these things belong together"?
> >
> > On Wed, Aug 15, 2012 at 9:38 AM, Keith Turner <ke...@deenlo.com> wrote:
> >
> >> On Wed, Aug 15, 2012 at 9:19 AM, Ed Kohlwey <ek...@gmail.com> wrote:
> >> > I think its not just about types, but specifically primitive types and
> >> > tuples.
> >> Right.  And sorting is another very important aspect.   User want to
> >> do things like store dates that sort in reverese order as part of a
> >> tuple in the row.  We tell them its possible if they encode their data
> >> in a certain way.  And we also tell them "oh, BTW if you have binary
> >> data in your tuple it can be tricky to get it right".  So one goal of
> >> Typo is to make this easier for users.   I think something like the
> >> following would do this and get the lexicographic sort order correct.
> >>
> >> class RDTypo extends Typo<Pair<Long, Date>,String,String,Text> {
> >>   public RDTypo() {
> >>     super(new PairLexicoder<Long,Date>(new LongLexicoder(), new
> >> ReverseLexicoder<Date>(new DateLexicoder())),
> >>              new StringLexicoder(), new StringLexicoder(), new
> >> TextLexicoder());
> >>   }
> >> }
> >>
> >> I so wish that Java had typedef, it could make the Typo API much more
> >> concise.  I never thought I would actually miss C++ template
> >> programming :)  I still need to do some more research on Java generics
> >> to see if I can make things more concise.
> >>
> >> >
> >> > So its avoiding being a full-fledged ORM solution like Gora.
> >> >
> >> >
> >> >> Am I right in assuming that this is about simplifying the API for
> >> >> storing typed data in the key, and not about providing a mechanism
> for
> >> >> query. Isn't this really just about storing stuff you've already
> >> >> decided was a good structure for whatever your query mechanism is?
> >>
>

Re: feedback on Typo

Posted by Keith Turner <ke...@deenlo.com>.

On Wed, Aug 15, 2012 at 10:09 AM, Ed Kohlwey <ek...@gmail.com> wrote:
> One suggestion I'd make is to force users to name their tuples by making
> the tuple types abstract. This won't help your complexity but IMHO makes
> code more readable.

That's a good suggestion; I made Typo abstract.  I also made classes like
TypoScanner, TypoMutation, etc. inner classes of Typo.  Doing this, I
was able to achieve what I wanted from typedef, making code that
uses Typo more concise.  The inner classes and type parameters
actually work the way I want; I was not sure they would before I tried
it.
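
A minimal sketch of the abstract-class-plus-inner-class pattern described above.  TypedTable, Entry, and AgeTable are hypothetical names invented for this illustration; they are not the actual Typo classes.  The point is that a named subclass fixes the type parameters once, and the inherited inner class can then be referenced through the subclass name without repeating them.

```java
/** Hypothetical stand-in for an abstract, fully parameterized base class. */
abstract class TypedTable<R, V> {
    /** Inner class: picks up R and V from the enclosing class. */
    class Entry {
        final R row;
        final V value;

        Entry(R row, V value) {
            this.row = row;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + " -> " + value;
        }
    }

    Entry newEntry(R row, V value) {
        return new Entry(row, value);
    }
}

/** Naming the subclass once plays the role of a typedef. */
final class AgeTable extends TypedTable<String, Integer> { }

final class TypedefSketch {
    public static void main(String[] args) {
        AgeTable table = new AgeTable();
        // No type arguments are repeated here: AgeTable.Entry refers to the
        // inherited inner class with R = String and V = Integer.
        AgeTable.Entry e = table.newEntry("alice", 30);
        String name = e.row;
        int age = e.value;
        System.out.println(name + " is " + age); // alice is 30
    }
}
```

Because the base class is abstract, callers are forced to declare a named subclass, which also addresses the readability concern about anonymous tuples.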

>
> This an issue of java style, but there's nothing more irritating than
> tuples floating around code without having an obvious explanation of "why
> do these things belong together"?
>
> On Wed, Aug 15, 2012 at 9:38 AM, Keith Turner <ke...@deenlo.com> wrote:
>
>> On Wed, Aug 15, 2012 at 9:19 AM, Ed Kohlwey <ek...@gmail.com> wrote:
>> > I think its not just about types, but specifically primitive types and
>> > tuples.
>> Right.  And sorting is another very important aspect.   User want to
>> do things like store dates that sort in reverese order as part of a
>> tuple in the row.  We tell them its possible if they encode their data
>> in a certain way.  And we also tell them "oh, BTW if you have binary
>> data in your tuple it can be tricky to get it right".  So one goal of
>> Typo is to make this easier for users.   I think something like the
>> following would do this and get the lexicographic sort order correct.
>>
>> class RDTypo extends Typo<Pair<Long, Date>,String,String,Text> {
>>   public RDTypo() {
>>     super(new PairLexicoder<Long,Date>(new LongLexicoder(), new
>> ReverseLexicoder<Date>(new DateLexicoder())),
>>              new StringLexicoder(), new StringLexicoder(), new
>> TextLexicoder());
>>   }
>> }
>>
>> I so wish that Java had typedef, it could make the Typo API much more
>> concise.  I never thought I would actually miss C++ template
>> programming :)  I still need to do some more research on Java generics
>> to see if I can make things more concise.
>>
>> >
>> > So its avoiding being a full-fledged ORM solution like Gora.
>> >
>> >
>> >> Am I right in assuming that this is about simplifying the API for
>> >> storing typed data in the key, and not about providing a mechanism for
>> >> query. Isn't this really just about storing stuff you've already
>> >> decided was a good structure for whatever your query mechanism is?
>>

Re: feedback on Typo

Posted by Ed Kohlwey <ek...@gmail.com>.

One suggestion I'd make is to force users to name their tuples by making
the tuple types abstract. This won't help your complexity but IMHO makes
code more readable.

This is an issue of Java style, but there's nothing more irritating than
tuples floating around code without an obvious explanation of "why
do these things belong together?"

On Wed, Aug 15, 2012 at 9:38 AM, Keith Turner <ke...@deenlo.com> wrote:

> On Wed, Aug 15, 2012 at 9:19 AM, Ed Kohlwey <ek...@gmail.com> wrote:
> > I think its not just about types, but specifically primitive types and
> > tuples.
> Right.  And sorting is another very important aspect.   User want to
> do things like store dates that sort in reverese order as part of a
> tuple in the row.  We tell them its possible if they encode their data
> in a certain way.  And we also tell them "oh, BTW if you have binary
> data in your tuple it can be tricky to get it right".  So one goal of
> Typo is to make this easier for users.   I think something like the
> following would do this and get the lexicographic sort order correct.
>
> class RDTypo extends Typo<Pair<Long, Date>,String,String,Text> {
>   public RDTypo() {
>     super(new PairLexicoder<Long,Date>(new LongLexicoder(), new
> ReverseLexicoder<Date>(new DateLexicoder())),
>              new StringLexicoder(), new StringLexicoder(), new
> TextLexicoder());
>   }
> }
>
> I so wish that Java had typedef, it could make the Typo API much more
> concise.  I never thought I would actually miss C++ template
> programming :)  I still need to do some more research on Java generics
> to see if I can make things more concise.
>
> >
> > So its avoiding being a full-fledged ORM solution like Gora.
> >
> >
> >> Am I right in assuming that this is about simplifying the API for
> >> storing typed data in the key, and not about providing a mechanism for
> >> query. Isn't this really just about storing stuff you've already
> >> decided was a good structure for whatever your query mechanism is?
>

Re: feedback on Typo

Posted by Keith Turner <ke...@deenlo.com>.

On Wed, Aug 15, 2012 at 9:19 AM, Ed Kohlwey <ek...@gmail.com> wrote:
> I think its not just about types, but specifically primitive types and
> tuples.

Right.  And sorting is another very important aspect.  Users want to
do things like store dates that sort in reverse order as part of a
tuple in the row.  We tell them it's possible if they encode their data
in a certain way.  And we also tell them "oh, BTW if you have binary
data in your tuple it can be tricky to get it right".  So one goal of
Typo is to make this easier for users.  I think something like the
following would do this and get the lexicographic sort order correct.

class RDTypo extends Typo<Pair<Long, Date>,String,String,Text> {
  public RDTypo() {
    super(new PairLexicoder<Long,Date>(new LongLexicoder(),
                                       new ReverseLexicoder<Date>(new DateLexicoder())),
          new StringLexicoder(), new StringLexicoder(), new TextLexicoder());
  }
}

I so wish that Java had typedef, it could make the Typo API much more
concise.  I never thought I would actually miss C++ template
programming :)  I still need to do some more research on Java generics
to see if I can make things more concise.
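
A minimal sketch of one way a reverse-sorting date encoding can work; it is not the Typo implementation, and ReverseDateSketch and its method names are hypothetical.  It assumes a fixed-width encoding: the timestamp is written as 8 big-endian bytes with the sign bit flipped so that unsigned byte order matches numeric order, and every byte is then complemented to reverse that order.

```java
import java.util.Date;

final class ReverseDateSketch {
    /** Big-endian bytes with the sign bit flipped: unsigned byte order == numeric order. */
    static byte[] encodeLong(long v) {
        long bits = v ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }

    static long decodeLong(byte[] b) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (b[i] & 0xffL);
        }
        return bits ^ Long.MIN_VALUE;
    }

    /** Complementing every byte reverses the order of equal-length encodings. */
    static byte[] complement(byte[] b) {
        byte[] out = new byte[b.length];
        for (int i = 0; i < b.length; i++) {
            out[i] = (byte) ~b[i];
        }
        return out;
    }

    static byte[] encodeReverseDate(Date d) {
        return complement(encodeLong(d.getTime()));
    }

    static Date decodeReverseDate(byte[] b) {
        return new Date(decodeLong(complement(b)));
    }

    /** Unsigned lexicographic comparison, the order a sorted byte-keyed table uses. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Date older = new Date(1000L);
        Date newer = new Date(2000L);
        // The newer date sorts first under the reversed encoding.
        System.out.println(
            compareUnsigned(encodeReverseDate(newer), encodeReverseDate(older)) < 0); // true
    }
}
```

Plain complementing is only safe here because every encoding has the same length; a variable-length type would need extra care to reverse correctly.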

>
> So its avoiding being a full-fledged ORM solution like Gora.
>
>
>> Am I right in assuming that this is about simplifying the API for
>> storing typed data in the key, and not about providing a mechanism for
>> query. Isn't this really just about storing stuff you've already
>> decided was a good structure for whatever your query mechanism is?

Re: feedback on Typo

Posted by Ed Kohlwey <ek...@gmail.com>.

I think it's not just about types, but specifically primitive types and
tuples.

So it's avoiding being a full-fledged ORM solution like Gora.


> Am I right in assuming that this is about simplifying the API for
> storing typed data in the key, and not about providing a mechanism for
> query. Isn't this really just about storing stuff you've already
> decided was a good structure for whatever your query mechanism is?

Re: feedback on Typo

Posted by Christopher Tubbs <ct...@gmail.com>.

Am I right in assuming that this is about simplifying the API for
storing typed data in the key, and not about providing a mechanism for
query? Isn't this really just about storing stuff you've already
decided was a good structure for whatever your query mechanism is?

On Mon, Aug 13, 2012 at 6:03 PM, Josh Elser <jo...@gmail.com> wrote:
> Even with something as simple as a pair, things can start getting difficult.
> I suppose it really revolves around the level of support you want to provide
> at scan time, e.g. "find all pairs where the second is 'x'?".
>
> Spending a few minutes thinking about it, an index could be a separate table
> but wouldn't necessarily have to be. It depends on the complexity of the
> structure you're trying to index. Using the Pair example again, you could
> reserve a column (family) to place index records in which simply inverts the
> Pair in the colqual.
>
>
> On 08/13/2012 11:06 AM, Keith Turner wrote:
>>
>> On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser<jo...@gmail.com>  wrote:
>>>
>>> Neat idea, Keith.
>>>
>>> Have you thought about how to support more complex types? Specifically,
>>> arrays, hashes and the nesting of those? Any thoughts about indexing for
>>> those complex types?
>>
>> Yeah I was thinking that would be nice.  I see a lot of users putting
>> multiple types into the row and/or columns.  Could have something like
>> TupleEncoder<List<A>>.   TupleEncoder would need to encode it elements
>> such that it sorts correctly.  However, this may be cumbersome to use
>> if you want to use different types.  For example I want a row composed
>> of a Long and String.  I was thinking of having the following types to
>> handle this case.
>>
>> class Pair<A,B>  extends LexEncoder{
>>     Pair(LexEncoder<A>  enc1, LexEncoder<B>  enc2);
>>     A getFirst(){}
>>     B getSecond(){}
>> }
>>
>> class Triple<A,B,C>{//follows same pattern as Pair}
>> class Quadruple<A,B,C,D>{//follows same pattern as Pair}
>>
>> This would allow a user to write code like the following that makes it
>> easy to work with a row composed of a Long and String.
>>
>> Pair<Long, String>  pair;
>> long l = pair.getFirst();
>> String s = pair.getSecond();
>>
>> I am still thinking the tuple concept through.
>>
>> I was not considering indexing.  I assuming you mean creating an index
>> in another table?
>>
>>> Initial thoughts are that it would make the most sense to place Typo at
>>> the
>>> contrib level (or something equivalent). The reason being: Typo doesn't
>>> change the underlying functionality of Accumulo; it only provides a layer
>>> on
>>> top of it that makes life easier for developers.
>>
>> I think putting it in contrib makes sense.
>>
>>>
>>> On 08/10/2012 07:07 PM, Keith Turner wrote:
>>>>
>>>> I put together a simple abstraction layer for Accumulo that makes it
>>>> easier to read and write Java objects to Accumulo key and value
>>>> fields.  The data written to Accumulo sort correctly
>>>> lexicographically.
>>>>
>>>> I put the code on github and would like some feedback on the design
>>>> and whether it should be included with Accumulo.
>>>>
>>>> https://github.com/keith-turner/typo
>>>>
>>>> Its still a little rough and I need to add encoder for all of the
>>>> primitive types.
>>>>
>>>> Keith

Re: feedback on Typo

Posted by Josh Elser <jo...@gmail.com>.

Even with something as simple as a pair, things can start getting 
difficult. I suppose it really revolves around the level of support you 
want to provide at scan time, e.g. "find all pairs where the second is 
'x'?".

Spending a few minutes thinking about it, an index could be a separate 
table but wouldn't necessarily have to be. It depends on the complexity 
of the structure you're trying to index. Using the Pair example again, 
you could reserve a column (family) to place index records in which 
simply inverts the Pair in the colqual.

On 08/13/2012 11:06 AM, Keith Turner wrote:
> On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser<jo...@gmail.com>  wrote:
>> Neat idea, Keith.
>>
>> Have you thought about how to support more complex types? Specifically,
>> arrays, hashes and the nesting of those? Any thoughts about indexing for
>> those complex types?
> Yeah I was thinking that would be nice.  I see a lot of users putting
> multiple types into the row and/or columns.  Could have something like
> TupleEncoder<List<A>>.   TupleEncoder would need to encode it elements
> such that it sorts correctly.  However, this may be cumbersome to use
> if you want to use different types.  For example I want a row composed
> of a Long and String.  I was thinking of having the following types to
> handle this case.
>
> class Pair<A,B>  extends LexEncoder{
>     Pair(LexEncoder<A>  enc1, LexEncoder<B>  enc2);
>     A getFirst(){}
>     B getSecond(){}
> }
>
> class Triple<A,B,C>{//follows same pattern as Pair}
> class Quadruple<A,B,C,D>{//follows same pattern as Pair}
>
> This would allow a user to write code like the following that makes it
> easy to work with a row composed of a Long and String.
>
> Pair<Long, String>  pair;
> long l = pair.getFirst();
> String s = pair.getSecond();
>
> I am still thinking the tuple concept through.
>
> I was not considering indexing.  I assuming you mean creating an index
> in another table?
>
>> Initial thoughts are that it would make the most sense to place Typo at the
>> contrib level (or something equivalent). The reason being: Typo doesn't
>> change the underlying functionality of Accumulo; it only provides a layer on
>> top of it that makes life easier for developers.
> I think putting it in contrib makes sense.
>
>>
>> On 08/10/2012 07:07 PM, Keith Turner wrote:
>>> I put together a simple abstraction layer for Accumulo that makes it
>>> easier to read and write Java objects to Accumulo key and value
>>> fields.  The data written to Accumulo sort correctly
>>> lexicographically.
>>>
>>> I put the code on github and would like some feedback on the design
>>> and whether it should be included with Accumulo.
>>>
>>> https://github.com/keith-turner/typo
>>>
>>> Its still a little rough and I need to add encoder for all of the
>>> primitive types.
>>>
>>> Keith

Re: feedback on Typo

Posted by Keith Turner <ke...@deenlo.com>.

On Sun, Aug 12, 2012 at 9:36 PM, Josh Elser <jo...@gmail.com> wrote:
> Neat idea, Keith.
>
> Have you thought about how to support more complex types? Specifically,
> arrays, hashes and the nesting of those? Any thoughts about indexing for
> those complex types?

Yeah, I was thinking that would be nice.  I see a lot of users putting
multiple types into the row and/or columns.  We could have something like
TupleEncoder<List<A>>.  TupleEncoder would need to encode its elements
such that they sort correctly.  However, this may be cumbersome to use
if you want to use different types.  For example, I want a row composed
of a Long and a String.  I was thinking of having the following types to
handle this case.

class Pair<A,B> extends LexEncoder{
   Pair(LexEncoder<A> enc1, LexEncoder<B> enc2);
   A getFirst(){}
   B getSecond(){}
}

class Triple<A,B,C>{//follows same pattern as Pair}
class Quadruple<A,B,C,D>{//follows same pattern as Pair}

This would allow a user to write code like the following that makes it
easy to work with a row composed of a Long and String.

Pair<Long, String> pair;
long l = pair.getFirst();
String s = pair.getSecond();

I am still thinking the tuple concept through.
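
A minimal sketch of why tuple encoding needs care when the fields may hold arbitrary binary data.  This is an illustrative scheme, not necessarily the one Typo uses, and PairBytesSketch and its methods are hypothetical.  It assumes a 0x00 byte as the field separator and escapes 0x00 and 0x01 inside each field so that the separator always sorts below any field content.

```java
import java.io.ByteArrayOutputStream;

final class PairBytesSketch {
    /** Escapes 0x00 -> 01 01 and 0x01 -> 01 02 so a raw 0x00 can act as the separator. */
    static void writeEscaped(ByteArrayOutputStream out, byte[] field) {
        for (byte b : field) {
            if (b == 0x00) {
                out.write(0x01);
                out.write(0x01);
            } else if (b == 0x01) {
                out.write(0x01);
                out.write(0x02);
            } else {
                out.write(b);
            }
        }
    }

    /** Order-preserving encoding: escaped first field, separator, escaped second field. */
    static byte[] encodePair(byte[] first, byte[] second) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeEscaped(out, first);
        out.write(0x00);
        writeEscaped(out, second);
        return out.toByteArray();
    }

    /** Naive encoding with the same separator but no escaping, for comparison. */
    static byte[] encodePairNaive(byte[] first, byte[] second) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(first, 0, first.length);
        out.write(0x00);
        out.write(second, 0, second.length);
        return out.toByteArray();
    }

    /** Unsigned lexicographic comparison of byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // As tuples, ({01}, {05}) sorts before ({01 00}, {01}) because {01} is a
        // prefix of {01 00}.
        byte[] f1 = {1}, s1 = {5};
        byte[] f2 = {1, 0}, s2 = {1};
        System.out.println(
            compareUnsigned(encodePair(f1, s1), encodePair(f2, s2)) < 0);           // true
        System.out.println(
            compareUnsigned(encodePairNaive(f1, s1), encodePairNaive(f2, s2)) < 0); // false
    }
}
```

The naive form gets the order wrong because a 0x00 inside the first field is indistinguishable from the separator.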

I was not considering indexing.  I assume you mean creating an index
in another table?

>
> Initial thoughts are that it would make the most sense to place Typo at the
> contrib level (or something equivalent). The reason being: Typo doesn't
> change the underlying functionality of Accumulo; it only provides a layer on
> top of it that makes life easier for developers.

I think putting it in contrib makes sense.

>
>
> On 08/10/2012 07:07 PM, Keith Turner wrote:
>>
>> I put together a simple abstraction layer for Accumulo that makes it
>> easier to read and write Java objects to Accumulo key and value
>> fields.  The data written to Accumulo sort correctly
>> lexicographically.
>>
>> I put the code on github and would like some feedback on the design
>> and whether it should be included with Accumulo.
>>
>> https://github.com/keith-turner/typo
>>
>> Its still a little rough and I need to add encoder for all of the
>> primitive types.
>>
>> Keith

Re: feedback on Typo

Posted by Josh Elser <jo...@gmail.com>.

Neat idea, Keith.

Have you thought about how to support more complex types? Specifically, 
arrays, hashes and the nesting of those? Any thoughts about indexing for 
those complex types?

Initial thoughts are that it would make the most sense to place Typo at 
the contrib level (or something equivalent). The reason being: Typo 
doesn't change the underlying functionality of Accumulo; it only 
provides a layer on top of it that makes life easier for developers.

On 08/10/2012 07:07 PM, Keith Turner wrote:
> I put together a simple abstraction layer for Accumulo that makes it
> easier to read and write Java objects to Accumulo key and value
> fields.  The data written to Accumulo sort correctly
> lexicographically.
>
> I put the code on github and would like some feedback on the design
> and whether it should be included with Accumulo.
>
> https://github.com/keith-turner/typo
>
> Its still a little rough and I need to add encoder for all of the
> primitive types.
>
> Keith

Re: feedback on Typo

Posted by Keith Turner <ke...@deenlo.com>.

On Mon, Aug 13, 2012 at 12:34 PM, Billie Rinaldi <bi...@apache.org> wrote:
> On Fri, Aug 10, 2012 at 8:07 PM, Keith Turner <ke...@deenlo.com> wrote:
>
>> I put together a simple abstraction layer for Accumulo that makes it
>> easier to read and write Java objects to Accumulo key and value
>> fields.  The data written to Accumulo sort correctly
>> lexicographically.
>>
>> I put the code on github and would like some feedback on the design
>> and whether it should be included with Accumulo.
>>
>> https://github.com/keith-turner/typo
>>
>> Its still a little rough and I need to add encoder for all of the
>> primitive types.
>>
>> Keith
>>
>
> Looks interesting.  It would be nice to have the TypedValueCombiner use the
> same Encoder, which leads to the question of where we should put this.  If

I agree, it would be nice for it to share code w/ those iterators.
Also, it would be nice to have DisplayFormatter and Constraint
support.

> Typo is moved to contrib, perhaps the TypedValueCombiner should be there,
> too.  Another option might be a submodule of examples.  It would be nice to
> have a set of standard encodings shipped with Accumulo.
>
> I'd like to discuss the LexEncoder.  It is an Encoder that preserves sort

I am thinking of changing the name to Lexicoder.

> order, but it doesn't have a way to enforce or test the sorting, or even to
> encourage the preservation of sort order except through the javadoc.  Is
> there anything we can do about this?  We could at least make some reusable
> testing patterns, but it would be nice if we could do more.

If you have a Lexicoder<A> and A is Comparable, then we could provide
infrastructure to make it easy to write tests that confirm the two
orderings agree.  This could be completely automated if you could
generate a good set of representative data for type A.  For example,
for Long we would at least want to test that MIN, MIN+1, -1, 0, 1,
MAX-1, and MAX sort correctly both lexicographically and via the
Comparable interface.  I am not sure how we would automatically
generate this set of test data for an arbitrary type, though.  We could
make running the test simple if someone provides the data.
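
A minimal sketch of the reusable check described above, assuming the caller supplies the sample data.  SortOrderCheck and its methods are hypothetical and do not use the Typo Lexicoder interface; an encoder is modeled as a plain java.util.function.Function from the value to its bytes.  For every pair of samples, the check compares the natural ordering with the unsigned byte ordering of the encodings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class SortOrderCheck {
    /** Unsigned lexicographic comparison of byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** True if, for every pair of samples, compareTo and the encoded byte order agree. */
    static <T extends Comparable<T>> boolean agrees(List<T> samples,
                                                    Function<T, byte[]> encoder) {
        for (T x : samples) {
            for (T y : samples) {
                int natural = Integer.signum(x.compareTo(y));
                int encoded = Integer.signum(
                    compareUnsigned(encoder.apply(x), encoder.apply(y)));
                if (natural != encoded) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Plain two's-complement big-endian bytes: negatives sort after positives. */
    static byte[] bigEndian(long v) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        return out;
    }

    /** Flipping the sign bit first makes the byte order match numeric order. */
    static byte[] signFlipped(long v) {
        return bigEndian(v ^ Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        List<Long> samples = Arrays.asList(
            Long.MIN_VALUE, Long.MIN_VALUE + 1, -1L, 0L, 1L,
            Long.MAX_VALUE - 1, Long.MAX_VALUE);
        System.out.println(agrees(samples, SortOrderCheck::signFlipped)); // true
        System.out.println(agrees(samples, SortOrderCheck::bigEndian));   // false
    }
}
```

The boundary values listed in the message are exactly the ones that expose the naive encoding, since it only fails when negative and non-negative values are compared.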

>
> Billie

Re: feedback on Typo

Posted by Billie Rinaldi <bi...@apache.org>.

On Fri, Aug 10, 2012 at 8:07 PM, Keith Turner <ke...@deenlo.com> wrote:

> I put together a simple abstraction layer for Accumulo that makes it
> easier to read and write Java objects to Accumulo key and value
> fields.  The data written to Accumulo sort correctly
> lexicographically.
>
> I put the code on github and would like some feedback on the design
> and whether it should be included with Accumulo.
>
> https://github.com/keith-turner/typo
>
> Its still a little rough and I need to add encoder for all of the
> primitive types.
>
> Keith
>

Looks interesting.  It would be nice to have the TypedValueCombiner use the
same Encoder, which leads to the question of where we should put this.  If
Typo is moved to contrib, perhaps the TypedValueCombiner should be there,
too.  Another option might be a submodule of examples.  It would be nice to
have a set of standard encodings shipped with Accumulo.

I'd like to discuss the LexEncoder.  It is an Encoder that preserves sort
order, but it doesn't have a way to enforce or test the sorting, or even to
encourage the preservation of sort order except through the javadoc.  Is
there anything we can do about this?  We could at least make some reusable
testing patterns, but it would be nice if we could do more.

Billie

Re: feedback on Typo

Posted by Keith Turner <ke...@deenlo.com>.

On Sun, Aug 12, 2012 at 8:11 PM, Ed Kohlwey <ek...@gmail.com> wrote:
> I really like this. I've thought for some time that something of this sort
> should be part of the Accumulo core API. The inconsistent use use
> CharSequence, String, Text, and byte[] objects to represent the n-tuples
> gets very old very quickly, and distracts programmers in a multitude of
> ways.
>
> The current API should really be refactored to make CharSequence, Text,
> byte[], and ByteBuffer types available for setting the contents of Key and
> Value types just for consistency's sake. It would be nice to have something
> like this added as well.

I made TypoMutation extend Mutation.  I wanted to make TypoKey extend
Key, but could not because the return types conflicted.  This could be
done, but getRow(), etc. would need different names.

>
> It would be good to see this package use nio-ish strategies to reduce the
> load on the garbage collector, by using buffer classes instead of arrays.
> But otherwise the design looks solid.

I would definitely like to avoid allocations if possible; I will look
into that.
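
A minimal sketch of what a buffer-based encoder could look like, to illustrate the allocation point above.  It is not an Accumulo or Typo API; BufferEncodeSketch and its method names are hypothetical.  The encoder writes into a caller-supplied java.nio.ByteBuffer instead of returning a fresh byte[] per value, so one buffer can be reused across many values.

```java
import java.nio.ByteBuffer;

final class BufferEncodeSketch {
    /** Writes the sortable form of v at the buffer's position; allocates nothing. */
    static void putSortableLong(ByteBuffer dst, long v) {
        // ByteBuffer is big-endian by default; flipping the sign bit makes the
        // unsigned byte order match numeric order.
        dst.putLong(v ^ Long.MIN_VALUE);
    }

    /** Reads back a value written by putSortableLong. */
    static long getSortableLong(ByteBuffer src) {
        return src.getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES); // allocated once, reused below
        long[] values = {-5L, 0L, 42L};
        for (long v : values) {
            buf.clear();              // reset position and limit for the next value
            putSortableLong(buf, v);
            buf.flip();               // switch from writing to reading
            System.out.println(v + " round-trips to " + getSortableLong(buf));
        }
    }
}
```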

>
> On Fri, Aug 10, 2012 at 8:07 PM, Keith Turner <ke...@deenlo.com> wrote:
>
>> I put together a simple abstraction layer for Accumulo that makes it
>> easier to read and write Java objects to Accumulo key and value
>> fields.  The data written to Accumulo sort correctly
>> lexicographically.
>>
>> I put the code on github and would like some feedback on the design
>> and whether it should be included with Accumulo.
>>
>> https://github.com/keith-turner/typo
>>
>> Its still a little rough and I need to add encoder for all of the
>> primitive types.
>>
>> Keith
>>

Re: feedback on Typo

Posted by Ed Kohlwey <ek...@gmail.com>.

I really like this. I've thought for some time that something of this sort
should be part of the Accumulo core API. The inconsistent use of
CharSequence, String, Text, and byte[] objects to represent the n-tuples
gets very old very quickly, and distracts programmers in a multitude of
ways.

The current API should really be refactored to make CharSequence, Text,
byte[], and ByteBuffer types available for setting the contents of Key and
Value types just for consistency's sake. It would be nice to have something
like this added as well.

It would be good to see this package use nio-ish strategies to reduce the
load on the garbage collector, by using buffer classes instead of arrays.
But otherwise the design looks solid.

On Fri, Aug 10, 2012 at 8:07 PM, Keith Turner <ke...@deenlo.com> wrote:

> I put together a simple abstraction layer for Accumulo that makes it
> easier to read and write Java objects to Accumulo key and value
> fields.  The data written to Accumulo sort correctly
> lexicographically.
>
> I put the code on github and would like some feedback on the design
> and whether it should be included with Accumulo.
>
> https://github.com/keith-turner/typo
>
> Its still a little rough and I need to add encoder for all of the
> primitive types.
>
> Keith
>