You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@crunch.apache.org by Gabriel Reid <ga...@gmail.com> on 2012/07/03 09:42:03 UTC

Invalid schema created in AvrosTest#testNestedTables

Hi guys,

While implementing the map side joins, I needed to make the PType
interface extend Serializable, and that caused me to stumble upon an
issue in AvrosTest. The testNestedTables method creates a nested table
schema (unsurprisingly), with the call in question being equivalent to
this:

    Avros.tableOf(Avros.strings(), Avros.tableOf(Avros.ints(),
Avros.doubles()));

This results in an invalid schema being created due to the same
namespace and name (org.apache.avro.mapred.Pair) being used twice in
the schema.

The error with the the invalid schema occurs in the Schema#toString
method -- it should probably result in an exception during the
creation of the Schema itself, but the toString method is used
everywhere, so this will fail if it's used no matter what.

Does anyone know nested Avro tables is a real use case that needs to
be supported (seeing as they won't actually work right now)? Am I
right in assuming that pretty much the same thing could be
accomplished by just doing the following call?

    Avros.tableOf(Avros.strings(), Avros.pairs(Avros.ints(), Avros.doubles()));

I know that unique names are created for Avro Tuple schemas in Crunch,
and I assume that this is done to avoid name collisions like this
case. I'm thinking that we could do some kind of similar trick to
allow the nested tables to work, but I don't think that this is worth
the effort (unless someone says that nested tables are a real use case
that needs to be supported).

I'll report the late catching of the invalid Schema to the Avro
project, but unless anyone objects, I think we can probably just
remove this test case. Anyone against that idea?


- Gabriel

Re: Invalid schema created in AvrosTest#testNestedTables

Posted by Josh Wills <jw...@cloudera.com>.
On Tue, Jul 3, 2012 at 7:35 AM, Gabriel Reid <ga...@gmail.com> wrote:

> On Tue, Jul 3, 2012 at 4:26 PM, Josh Wills <jo...@gmail.com> wrote:
> > On Tue, Jul 3, 2012 at 12:42 AM, Gabriel Reid <ga...@gmail.com>
> wrote:
> >>
> >> While implementing the map side joins, I needed to make the PType
> >> interface extend Serializable, and that caused me to stumble upon an
> >> issue in AvrosTest. The testNestedTables method creates a nested table
> >> schema (unsurprisingly), with the call in question being equivalent to
> >> this:
> >>
> >>     Avros.tableOf(Avros.strings(), Avros.tableOf(Avros.ints(),
> >> Avros.doubles()));
> >>
> >> This results in an invalid schema being created due to the same
> >> namespace and name (org.apache.avro.mapred.Pair) being used twice in
> >> the schema.
> ...
> >
> > I added it as a convenience method for a project I was working on-- my
> > thought was that if you tried to put a PTableType inside of another
> > PTableType, we should detect it and automatically convert the nested
> > PTableType to a pairs(keyType, valueType), as you indicated was the
> > right thing to do. I'm surprised that it didn't work properly, since I
> > thought that I was doing the conversion from PTableType to
> > pairs(keyType, valueType) when the PType was being constructed. Do you
> > mind if I take a look at it first and try to fix it?
>
> Yep, please do -- if there's a use case for it, it definitely makes
> sense to keep it and fix it. You can spot the issue if you call
> toString on the schema that is created (in the nested table unit
> test).
>

Ack, found it. I'll check in the patch in a few minutes, assuming it's not
going to conflict with your stuff.


>
> - Gabriel
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Re: Invalid schema created in AvrosTest#testNestedTables

Posted by Gabriel Reid <ga...@gmail.com>.
On Tue, Jul 3, 2012 at 4:26 PM, Josh Wills <jo...@gmail.com> wrote:
> On Tue, Jul 3, 2012 at 12:42 AM, Gabriel Reid <ga...@gmail.com> wrote:
>>
>> While implementing the map side joins, I needed to make the PType
>> interface extend Serializable, and that caused me to stumble upon an
>> issue in AvrosTest. The testNestedTables method creates a nested table
>> schema (unsurprisingly), with the call in question being equivalent to
>> this:
>>
>>     Avros.tableOf(Avros.strings(), Avros.tableOf(Avros.ints(),
>> Avros.doubles()));
>>
>> This results in an invalid schema being created due to the same
>> namespace and name (org.apache.avro.mapred.Pair) being used twice in
>> the schema.
...
>
> I added it as a convenience method for a project I was working on-- my
> thought was that if you tried to put a PTableType inside of another
> PTableType, we should detect it and automatically convert the nested
> PTableType to a pairs(keyType, valueType), as you indicated was the
> right thing to do. I'm surprised that it didn't work properly, since I
> thought that I was doing the conversion from PTableType to
> pairs(keyType, valueType) when the PType was being constructed. Do you
> mind if I take a look at it first and try to fix it?

Yep, please do -- if there's a use case for it, it definitely makes
sense to keep it and fix it. You can spot the issue if you call
toString on the schema that is created (in the nested table unit
test).

- Gabriel

Re: Invalid schema created in AvrosTest#testNestedTables

Posted by Josh Wills <jo...@gmail.com>.
On Tue, Jul 3, 2012 at 12:42 AM, Gabriel Reid <ga...@gmail.com> wrote:
> Hi guys,
>
> While implementing the map side joins, I needed to make the PType
> interface extend Serializable, and that caused me to stumble upon an
> issue in AvrosTest. The testNestedTables method creates a nested table
> schema (unsurprisingly), with the call in question being equivalent to
> this:
>
>     Avros.tableOf(Avros.strings(), Avros.tableOf(Avros.ints(),
> Avros.doubles()));
>
> This results in an invalid schema being created due to the same
> namespace and name (org.apache.avro.mapred.Pair) being used twice in
> the schema.
>
> The error with the the invalid schema occurs in the Schema#toString
> method -- it should probably result in an exception during the
> creation of the Schema itself, but the toString method is used
> everywhere, so this will fail if it's used no matter what.
>
> Does anyone know nested Avro tables is a real use case that needs to
> be supported (seeing as they won't actually work right now)? Am I
> right in assuming that pretty much the same thing could be
> accomplished by just doing the following call?
>
>     Avros.tableOf(Avros.strings(), Avros.pairs(Avros.ints(), Avros.doubles()));
>
> I know that unique names are created for Avro Tuple schemas in Crunch,
> and I assume that this is done to avoid name collisions like this
> case. I'm thinking that we could do some kind of similar trick to
> allow the nested tables to work, but I don't think that this is worth
> the effort (unless someone says that nested tables are a real use case
> that needs to be supported).
>
> I'll report the late catching of the invalid Schema to the Avro
> project, but unless anyone objects, I think we can probably just
> remove this test case. Anyone against that idea?

I added it as a convenience method for a project I was working on-- my
thought was that if you tried to put a PTableType inside of another
PTableType, we should detect it and automatically convert the nested
PTableType to a pairs(keyType, valueType), as you indicated was the
right thing to do. I'm surprised that it didn't work properly, since I
thought that I was doing the conversion from PTableType to
pairs(keyType, valueType) when the PType was being constructed. Do you
mind if I take a look at it first and try to fix it?

>
>
> - Gabriel