Posted to user@hive.apache.org by Dennis Suhari <De...@ilab.nordlb.de> on 2020/10/31 10:57:02 UTC

Hive Avro: Directly use of embedded Avro Scheme

Hello Support,

I have created the following Avro Hive table, which works fine:

CREATE EXTERNAL TABLE blahblah.blublub
STORED AS AVRO
LOCATION "/***/in"
TBLPROPERTIES ('avro.schema.url'='/.../schema/blublub.avsc')

As you can see, I need to use the 'avro.schema.url' property, which points to the Avro schema file blublub.avsc. I simply extract this blublub.avsc from the Avro files.

How can I work without 'avro.schema.url' and directly use the Avro schema that is already delivered within the Avro files themselves (that is the strength of Avro)? I want to have all columns that are included in the Avro schema, but without listing them in the CREATE statement.

Br,
Dennis
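
For readers wondering what "extracting the .avsc from the Avro files" involves: an Avro object container file starts with a header whose metadata map carries the writer schema under the key avro.schema. The following is a minimal sketch in Python of reading that header with the standard library only. The function names (read_long, read_bytes, read_embedded_schema) are illustrative and not part of Hive or of any Avro library, and the sketch only parses the header, not the data blocks.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    result = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        value = byte[0]
        result |= (value & 0x7F) << shift
        if not value & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    # Undo zigzag encoding: 0, 1, 2, 3 -> 0, -1, 1, -2
    return (result >> 1) ^ -(result & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string (used for both keys and values)."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_embedded_schema(stream):
    """Return the writer schema stored in the file header as a Python object."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the metadata map
            break
        if count < 0:
            # A negative count is followed by the block size in bytes,
            # which this sketch reads and ignores.
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return json.loads(metadata["avro.schema"])
```

Given a binary file object opened on one of the Avro files, read_embedded_schema(f) returns the schema as a dict, and json.dumps of that result yields the content one would save as the .avsc file.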



Re: Hive Avro: Directly use of embedded Avro Scheme

Posted by David <da...@gmail.com>.
Hey Dennis,

Specifying the schema URL is simply a convenience: it lets you define a
single schema instead of maintaining both a SQL schema (CREATE TABLE) and a
separate Avro schema file. This reduces maintenance overhead and prevents
the two from falling out of sync.

Thanks.


Re: Hive Avro: Directly use of embedded Avro Scheme

Posted by Dennis Suhari <De...@ilab.nordlb.de>.
Understood. So, to keep the schema stable, you should have an external reference to an .avsc URL (e.g. a registry), which can evolve. Checking new Avro files against the registry is easy because the schema is embedded in them, and if it has changed you can easily create a new version.
Is this the idea?

Br,
Dennis



Re: Hive Avro: Directly use of embedded Avro Scheme

Posted by David <da...@gmail.com>.
What would your expectation be?  That Hive reads the first file it finds
and uses that schema in the table definition?

What if the table is empty and a user attempts an INSERT?  What should be
the behavior?

The real power of Avro is not so much that the schema can exist
(optionally) in the file itself but that the schema can mutate over time.
In such cases the table can be ALTERED, for example to add a new column,
and the existing schema will still work.
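
To illustrate the point about schemas changing over time, here is a minimal sketch in Python of how a record written under an older schema can still be read under a newer one that adds a column with a default value. It is a simplified illustration of the idea behind Avro schema resolution, not Hive's or Avro's actual implementation, and the function name resolve_record and the fields id, name and country are hypothetical.

```python
def resolve_record(record, reader_schema):
    """Project a record written with an older schema onto a newer reader schema.

    Simplified: fields present in the data are copied, fields missing from the
    data take the reader schema's default, and fields unknown to the reader
    schema are dropped.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(
                f"field {name!r} is missing from the data and has no default"
            )
    return resolved


# A record as it was written before the new column existed (hypothetical data).
old_record = {"id": 1, "name": "a"}

# The evolved schema: a nullable 'country' column was added with a default.
new_schema = {
    "type": "record",
    "name": "blublub",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "country", "type": ["null", "string"], "default": None},
    ],
}

resolved = resolve_record(old_record, new_schema)
```

Here the old record gains the new column with its default, which is why data written before the change remains readable. A new column without a default would raise an error in this sketch.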

Thanks.
