You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Mohammad Islam <mi...@yahoo.com> on 2013/06/18 05:26:48 UTC

Review Request: Hive-3159 Update AvroSerde to determine schema of new tables

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

Review request for hive, Jakob Homan and Ashutosh Chauhan.


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


This addresses bug HIVE-3159.
    https://issues.apache.org/jira/browse/HIVE-3159


Diffs
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

(Updated May 9, 2014, 12:23 a.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Rebased with the latest commit.


Bugs: HIVE-3159
    https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nested_complex.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nullable_fields.q f90ceb9 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nested_complex.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nullable_fields.q.out 77a6a2e 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 4564e75 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

(Updated May 6, 2014, 10:06 p.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

New patch that addressed Carl's review comments.

This patch addressed the following missing functions.
1. Create AVRO table from using HIVE schema ( w/o  specifying Avro Schema).
2. Copy AVRO table structure and data from existing non-AVRO table using CTAS.
3. Copy AVRO table structure and data from existing AVRO table using CTAS.

Note: We can close dependent JIRA HIVE-5803 that is no longer required. Another JIRA has already taken care of this.


Bugs: HIVE-3159
    https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nested_complex.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nullable_fields.q f90ceb9 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nested_complex.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nullable_fields.q.out 77a6a2e 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

(Updated Jan. 31, 2014, 2:18 a.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Addressed Carl's review comments


Bugs: HIVE-3159
    https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nested_complex.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nullable_fields.q f90ceb9 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nested_complex.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nullable_fields.q.out 81950b0 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.

> On Jan. 11, 2014, 8:05 a.m., Carl Steinbach wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 146
> > <https://reviews.apache.org/r/11925/diff/6/?file=420187#file420187line146>
> >
> >     wrapWithUnion is not used.

Used in updated code.


> On Jan. 11, 2014, 8:05 a.m., Carl Steinbach wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 163
> > <https://reviews.apache.org/r/11925/diff/6/?file=420187#file420187line163>
> >
> >     wrapWithUnion is not used.

Used in updated code.


> On Jan. 11, 2014, 8:05 a.m., Carl Steinbach wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 174
> > <https://reviews.apache.org/r/11925/diff/6/?file=420187#file420187line174>
> >
> >     wrapWithUnion is not used.

Used in updated code.


> On Jan. 11, 2014, 8:05 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/avro_create_as_select.q, line 3
> > <https://reviews.apache.org/r/11925/diff/6/?file=420178#file420178line3>
> >
> >     Please populate the source table with data before running a CTAS statement against it (the current example is functionally equivalent to CREATE TABLE LIKE -- does that work?). Better yet, please use the alltypesorc test table (which has columns for each type), as well as a query where only some columns are projected, and a query where nested fields are explicitly projected by name.

I added a new table creation using all avro types.
But load data into Avro table from non-Avro doesn't work (REF:HIVE-5803).


> On Jan. 11, 2014, 8:05 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/avro_without_schema.q, line 1
> > <https://reviews.apache.org/r/11925/diff/6/?file=420181#file420181line1>
> >
> >     Why have both avro_no_schema_test.q and avro_without_schema.q? Why does avro_without_schema.q create the table, but not load it with data or select from it?

remove one and put into one file with proper comments.


> On Jan. 11, 2014, 8:05 a.m., Carl Steinbach wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 53
> > <https://reviews.apache.org/r/11925/diff/6/?file=420187#file420187line53>
> >
> >     "Moreover, since any type could potentially hold a NULL value, all corresponding Avro schema should be a union of null and the type (UNION<T, NULL>).
> >     
> >     This statement is contradicted by the code which won't generate a record or array field with nullable elements.
> >     
> >     Can you please add a test to show what happens when a non-avro table with record and array columns containing NULL elements is streamed into an Avro table using CTAS?

Added test case.


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review31584
-----------------------------------------------------------


On Jan. 10, 2014, 7:58 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated Jan. 10, 2014, 7:58 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Carl Steinbach <cw...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review31584
-----------------------------------------------------------



ql/src/test/queries/clientpositive/avro_create_as_select.q
<https://reviews.apache.org/r/11925/#comment60123>

    Please populate the source table with data before running a CTAS statement against it (the current example is functionally equivalent to CREATE TABLE LIKE -- does that work?). Better yet, please use the alltypesorc test table (which has columns for each type), as well as a query where only some columns are projected, and a query where nested fields are explicitly projected by name.



ql/src/test/queries/clientpositive/avro_without_schema.q
<https://reviews.apache.org/r/11925/#comment60124>

    Why have both avro_no_schema_test.q and avro_without_schema.q? Why does avro_without_schema.q create the table, but not load it with data or select from it?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60122>

    "Moreover, since any type could potentially hold a NULL value, all corresponding Avro schema should be a union of null and the type (UNION<T, NULL>).
    
    This statement is contradicted by the code which won't generate a record or array field with nullable elements.
    
    Can you please add a test to show what happens when a non-avro table with record and array columns containing NULL elements is streamed into an Avro table using CTAS?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60119>

    wrapWithUnion is not used.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60120>

    wrapWithUnion is not used.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60121>

    wrapWithUnion is not used.


- Carl Steinbach


On Jan. 10, 2014, 7:58 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated Jan. 10, 2014, 7:58 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Carl Steinbach <cw...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review31583
-----------------------------------------------------------



ql/src/test/queries/clientpositive/avro_create_as_select.q
<https://reviews.apache.org/r/11925/#comment60098>

    Please maintain reasonable limits on line length (80 or 100 characters works for me).



ql/src/test/queries/clientpositive/avro_create_as_select.q
<https://reviews.apache.org/r/11925/#comment60099>

    Please put each clause on its own line:
    CREATE TABLE xxx
    ROW FORMAT SERDE '....'
    STORED AS 
      INPUTFORMAT '....'
      OUTPUTFORMAT '....'
    AS
      SELECT ......
    



ql/src/test/queries/clientpositive/avro_create_as_select2.q
<https://reviews.apache.org/r/11925/#comment60100>

    Line length



ql/src/test/queries/clientpositive/avro_no_schema_test.q
<https://reviews.apache.org/r/11925/#comment60101>

    Line length.



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60102>

    Formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60103>

    Formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60104>

    Formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60105>

    s/Defintion/Definition/



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60106>

    Please throw the exception here instead of returning null. This method should never return null.



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60107>

    What's the significance of "org.apache.hive.auto_gen_schema"? I can't find any other references to this namespace in the code.



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60115>

    table or partition properties, or both?



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment60108>

    Formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60109>

    Missing ASF license header.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60110>

    To me "TypeInfoToSchema" sounds like the name of method, not a class. Please either change the name to AvroSchemaUtils or merge this code into AvroSerdeUtils.
    
    Also, there is an existing Hive class named "Schema" (org.apache.hadoop.hive.metastore.api.Schema). In order to avoid confusion please qualify "schema" with "avro" in the method names, e.g. generateAvroSchema, generateAvroUnionSchema, etc...



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60118>

    map<string, any-type>



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60113>

    Why Hashtable instead of HashMap?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60111>

    People who don't already know how Avro encodes nullable values will find this method easier to understand if the parameter name "wrapWithUnion" is changed to "isNullable".



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60116>

    Instead of adding assertions to each of these methods I think it would be cleaner to specify the expected type in the parameter list and force the caller to do the cast before invoking the method.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60114>

    The code snippet "tag + "_" + tInfo.getCategory().name()" gets repeated a lot. I think this should be moved to a helper method.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60112>

    Is it possible to change the LHS to List?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60117>

    Formatting.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60096>

    Missing ASF license header.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment60097>

    Formatting.


- Carl Steinbach


On Jan. 10, 2014, 7:58 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated Jan. 10, 2014, 7:58 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

(Updated Jan. 10, 2014, 7:58 p.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Incorporating the reviewers' comments,


Bugs: HIVE-3159
    https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 9d58d13 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

(Updated Aug. 8, 2013, 12:24 a.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Adding support for binary in type info to schema generation.


Bugs: HIVE-3159
    https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.

> On July 29, 2013, 5:41 p.m., Jakob Homan wrote:
> > There is still no text covering a map-reduce job on an already existing, non-Avro table into an avro table.  ie, create a text table, populate it, run a CTAS to manipulate the data into an Avro table.

In general, Hive creates "internal" column names such as col0, col1 etc. Due to this, I didn't able to copy non-avro data to avro data and run select SQL. Only option is to change the current behavior to reuse the provided column names. Separate JIRA regarding this could be a choice.


> On July 29, 2013, 5:41 p.m., Jakob Homan wrote:
> > ql/src/test/queries/clientpositive/avro_create_as_select.q, line 3
> > <https://reviews.apache.org/r/11925/diff/4/?file=325386#file325386line3>
> >
> >     This is testing that one can copy data into an already existing table, but doesn't verify that the already existing, non-avro data is converted correctly.

same as above.


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review24149
-----------------------------------------------------------


On July 23, 2013, 9:51 a.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated July 23, 2013, 9:51 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.

> On July 29, 2013, 5:41 p.m., Jakob Homan wrote:
> > There is still no text covering a map-reduce job on an already existing, non-Avro table into an avro table.  ie, create a text table, populate it, run a CTAS to manipulate the data into an Avro table.
> 
> Mohammad Islam wrote:
>     In general, Hive creates "internal" column names such as col0, col1 etc. Due to this, I didn't able to copy non-avro data to avro data and run select SQL. Only option is to change the current behavior to reuse the provided column names. Separate JIRA regarding this could be a choice.
>     
>
> 
> Jakob Homan wrote:
>     Wouldn't select * or using the new column names (they're named deterministically) work?  This is a major test since otherwise we're missing the most important code path...
>     ie
>     have a text file c1, c2, c3
>     create table t1
>     load data into t1 from text file
>     create table a1 as select c3, c2 where c2 = "foo" order by c3;
>     select * from a1;
>     describe extended a1;
>     
>     And verify in the q file's result that the table is avro and that the correct rows and columns got converted.

I agree that we need to support this use case as well.
Current patch works on two other uses case.

Created a separate JIRA to handle this use case that I suspect is a slightly different issue.
New JIRA: https://issues.apache.org/jira/browse/HIVE-5803


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review24149
-----------------------------------------------------------


On Aug. 8, 2013, 12:24 a.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2013, 12:24 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Jakob Homan <jg...@apache.org>.

> On July 29, 2013, 10:41 a.m., Jakob Homan wrote:
> > There is still no text covering a map-reduce job on an already existing, non-Avro table into an avro table.  ie, create a text table, populate it, run a CTAS to manipulate the data into an Avro table.
> 
> Mohammad Islam wrote:
>     In general, Hive creates "internal" column names such as col0, col1 etc. Due to this, I didn't able to copy non-avro data to avro data and run select SQL. Only option is to change the current behavior to reuse the provided column names. Separate JIRA regarding this could be a choice.
>     
>

Wouldn't select * or using the new column names (they're named deterministically) work?  This is a major test since otherwise we're missing the most important code path...
ie
have a text file c1, c2, c3
create table t1
load data into t1 from text file
create table a1 as select c3, c2 where c2 = "foo" order by c3;
select * from a1;
describe extended a1;

And verify in the q file's result that the table is avro and that the correct rows and columns got converted. 


- Jakob


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review24149
-----------------------------------------------------------


On Aug. 7, 2013, 5:24 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated Aug. 7, 2013, 5:24 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Jakob Homan <jg...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review24149
-----------------------------------------------------------


There is still no text covering a map-reduce job on an already existing, non-Avro table into an avro table.  ie, create a text table, populate it, run a CTAS to manipulate the data into an Avro table.


ql/src/test/queries/clientpositive/avro_create_as_select.q
<https://reviews.apache.org/r/11925/#comment47977>

    This is testing that one can copy data into an already existing table, but doesn't verify that the already existing, non-avro data is converted correctly.


- Jakob Homan


On July 23, 2013, 2:51 a.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated July 23, 2013, 2:51 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

(Updated July 23, 2013, 9:51 a.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Updated with Jacob feedbacks.


Bugs: HIVE-3159
    https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.

> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 67
> > <https://reviews.apache.org/r/11925/diff/3/?file=320914#file320914line67>
> >
> >     Should be a debug

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 77
> > <https://reviews.apache.org/r/11925/diff/3/?file=320914#file320914line77>
> >
> >     spacing

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 125
> > <https://reviews.apache.org/r/11925/diff/3/?file=320914#file320914line125>
> >
> >     Use j here instead, so as not to shadow the i.

Not clear. What is it shadowing? There is no other 'i'.


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 36
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line36>
> >
> >     Some spacing on the table would help.

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 77
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line77>
> >
> >     formatting

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 93
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line93>
> >
> >     fot -> for.  type info -> TypeInfo

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 98
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line98>
> >
> >     Is there any way to turn off the union wrapping?  If not, does it require its own separate parameter?

'wrapUnion' flag is needed to avoid the duplicated "Union". Union schema does not required to wrap with union. I think it is an exception, if it does.


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 100
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line100>
> >
> >     formatting

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 109
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line109>
> >
> >     Move this above the first function so it's clear to readers.

Done.


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 111
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line111>
> >
> >     Yeah, do that.

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 125
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line125>
> >
> >     Doesn't this duplicate generateSchema above?

Removed


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 211
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line211>
> >
> >     can this be private?

It is used by Test case.


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java, line 56
> > <https://reviews.apache.org/r/11925/diff/3/?file=320917#file320917line56>
> >
> >     delete

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java, line 121
> > <https://reviews.apache.org/r/11925/diff/3/?file=320917#file320917line121>
> >
> >     delete

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java, line 127
> > <https://reviews.apache.org/r/11925/diff/3/?file=320917#file320917line127>
> >
> >     odd name...

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java, line 26
> > <https://reviews.apache.org/r/11925/diff/3/?file=320917#file320917line26>
> >
> >     Either add msg string or break out into separate tests.

Done. Added msg.


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java, line 244
> > <https://reviews.apache.org/r/11925/diff/3/?file=320916#file320916line244>
> >
> >     assert on the content of the exception, or subtype it and use the expects annotation.

Done. content checked .


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java, line 220
> > <https://reviews.apache.org/r/11925/diff/3/?file=320916#file320916line220>
> >
> >     nit: can this be made more readable?

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java, line 236
> > <https://reviews.apache.org/r/11925/diff/3/?file=320916#file320916line236>
> >
> >     Please separate out all the test cases into individual tests.

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 189
> > <https://reviews.apache.org/r/11925/diff/3/?file=320915#file320915line189>
> >
> >     Hive TypeInfo allows for non-string keys, while Avro does not.  There should be a check here for that.

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 116
> > <https://reviews.apache.org/r/11925/diff/3/?file=320914#file320914line116>
> >
> >     Rather than building the avro schema string by hand and then generating the schema object, why not generate the avro schema and then run toString on it?  This guarantees the schema will be correctly formed and type checked.

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 105
> > <https://reviews.apache.org/r/11925/diff/3/?file=320914#file320914line105>
> >
> >     We lose any column comments that may have been on the original schema.  Does that normally happen with CTAS in Hive?

Normal CTAS also does not copy the comments.


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java, line 229
> > <https://reviews.apache.org/r/11925/diff/3/?file=320916#file320916line229>
> >
> >     Rather than hand-coding these, can we have Hive generate them?  This will catch any changes if Hive changes how it generates them later on.

Done


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > ql/src/test/queries/clientpositive/avro_without_schema.q, line 26
> > <https://reviews.apache.org/r/11925/diff/3/?file=320909#file320909line26>
> >
> >     I don't see a test that exercises the generated schema through whole mapreduce query, just ones that exercise the metadata store...

This test has the "group by".


> On July 12, 2013, 10:32 p.m., Jakob Homan wrote:
> > ql/src/test/queries/clientpositive/avro_create_as_select.q, line 3
> > <https://reviews.apache.org/r/11925/diff/3/?file=320906#file320906line3>
> >
> >     This test doesn't actually work on a non-avro table.  It just works on the definition of a non-avro table.  We should have another one that does a select from a populated, non-avro-backed table and verify the values are converted corrected.

There is a hive problem of creating internal column names such as  col0, col1 that creates CTAS command to fail. This one needs to be resolved in a separate JIRA.


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review23102
-----------------------------------------------------------


On July 12, 2013, 6:49 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated July 12, 2013, 6:49 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Jakob Homan <jg...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review23102
-----------------------------------------------------------


Overall, looks good.  I'm concerned that there's no end-to-end test of having some non-Avro data in Hive, using Hive to write (and join it) to an Avro file, and verifying the content of the actual Avro file.  Would something like that be feasible?


ql/src/test/queries/clientpositive/avro_create_as_select.q
<https://reviews.apache.org/r/11925/#comment46910>

    This test doesn't actually work on a non-avro table.  It just works on the definition of a non-avro table.  We should have another one that does a select from a populated, non-avro-backed table and verify the values are converted corrected.



ql/src/test/queries/clientpositive/avro_without_schema.q
<https://reviews.apache.org/r/11925/#comment46911>

    I don't see a test that exercises the generated schema through whole mapreduce query, just ones that exercise the metadata store...



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46912>

    Should be a debug



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46913>

    spacing



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46916>

    We lose any column comments that may have been on the original schema.  Does that normally happen with CTAS in Hive?



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46921>

    Rather than building the avro schema string by hand and then generating the schema object, why not generate the avro schema and then run toString on it?  This guarantees the schema will be correctly formed and type checked.



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46919>

    Use j here instead, so as not to shadow the i.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46922>

    Some spacing on the table would help.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46924>

    formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46925>

    fot -> for.  type info -> TypeInfo



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46942>

    Is there any way to turn off the union wrapping?  If not, does it require its own separate parameter?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46927>

    formatting



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46923>

    Move this above the first function so it's clear to readers.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46933>

    Yeah, do that.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46929>

    Doesn't this duplicate generateSchema above?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46930>

    Hive TypeInfo allows for non-string keys, while Avro does not.  There should be a check here for that.



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46931>

    can this be private?



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46934>

    nit: can this be made more readable?



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46936>

    Rather than hand-coding these, can we have Hive generate them?  This will catch any changes if Hive changes how it generates them later on.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46938>

    Please separate out all the test cases into individual tests.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46939>

    assert on the content of the exception, or subtype it and use the expects annotation.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46943>

    Either add msg string or break out into separate tests.



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46944>

    delete



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46946>

    delete



serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46947>

    odd name...


- Jakob Homan


On July 12, 2013, 11:49 a.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated July 12, 2013, 11:49 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
-----------------------------------------------------------

(Updated July 12, 2013, 6:49 p.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
-------

Updated with Ashutosh's comments.


Bugs: HIVE-3159
    https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
-------

Problem:
Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
For multiple cases, it is very inconvenient for user.
Some of the un-supported use cases:
1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
3. Create  table  without specifying Avro schema.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
-------

Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.


Thanks,

Mohammad Islam


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Mohammad Islam <mi...@yahoo.com>.

> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > Can you also run all new tests with ant test -Dhadoop.mr.rev=23 to make sure we are getting right results. Else, you might need to add more columns in order-by columns.

Tested


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 101
> > <https://reviews.apache.org/r/11925/diff/2/?file=307412#file307412line101>
> >
> >     Any particular reason you made this synchronized ?

Removed.


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 103
> > <https://reviews.apache.org/r/11925/diff/2/?file=307412#file307412line103>
> >
> >     Have you tested this for both default db as well as non-default db?

Test case added.


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 121
> > <https://reviews.apache.org/r/11925/diff/2/?file=307412#file307412line121>
> >
> >     Instead of \n, can you use File.Seperator?

Removed.


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 43
> > <https://reviews.apache.org/r/11925/diff/2/?file=307413#file307413line43>
> >
> >     Is this meant to be Array[tinyint] => bytes?

Done


> On June 30, 2013, 2:43 a.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java, line 110
> > <https://reviews.apache.org/r/11925/diff/2/?file=307413#file307413line110>
> >
> >     Lets take care of this TODO. Should be straight fwd.

Done


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review22571
-----------------------------------------------------------


On June 18, 2013, 3:26 a.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated June 18, 2013, 3:26 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Jakob Homan <jg...@apache.org>.

> On June 29, 2013, 7:43 p.m., Ashutosh Chauhan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 70
> > <https://reviews.apache.org/r/11925/diff/2/?file=307412#file307412line70>
> >
> >     I think determining schema from table definition should be default. There are multiple of determining schema. I think order should be:
> >     a) Try table definition.
> >     b) Try schema literal in properties.
> >     c) Try from hdfs.
> >     d) Try from url.

This is a big change.  Avro tables have always been defined via a property.  This change is to support a small use case; why switch the entire order?


- Jakob


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review22571
-----------------------------------------------------------


On July 23, 2013, 2:51 a.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated July 23, 2013, 2:51 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>


Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review22571
-----------------------------------------------------------


Can you also run all new tests with ant test -Dhadoop.mr.rev=23 to make sure we are getting right results. Else, you might need to add more columns in order-by columns.


serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46291>

    I think determining schema from table definition should be default. There are multiple of determining schema. I think order should be:
    a) Try table definition.
    b) Try schema literal in properties.
    c) Try from hdfs.
    d) Try from url. 



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46292>

    Any particular reason you made this synchronized ?



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46293>

    Have you tested this for both default db as well as non-default db?



serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46294>

    Instead of \n, can you use File.Seperator?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46296>

    Is this meant to be Array[tinyint] => bytes?



serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46295>

    Lets take care of this TODO. Should be straight fwd.


- Ashutosh Chauhan


On June 18, 2013, 3:26 a.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> -----------------------------------------------------------
> 
> (Updated June 18, 2013, 3:26 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
>     https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> Hive doesn't support to create a Avro-based table using HQL create table command. It currently requires to specify Avro schema literal or schema file name.
> For multiple cases, it is very inconvenient for user.
> Some of the un-supported use cases:
> 1. Create table ... <Avro-SERDE etc.> as SELECT ... from <NON-AVRO FILE>
> 2. Create table ... <Avro-SERDE etc.> as SELECT from <AVRO TABLE>
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> -------
> 
> Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>