Posted to user@hive.apache.org by Yosi Botzer <yo...@gmail.com> on 2015/04/29 17:49:39 UTC

creating parquet table using avro schema

Hi,

I have Parquet files that are the product of a MapReduce job.

I have used AvroParquetOutputFormat in order to produce them, so I have an
avro schema file describing the structure of the data.

When I want to create an Avro-based table in Hive I can use:
TBLPROPERTIES
('avro.schema.url'='hdfs:///schema/report/dashboard_report.avsc');

So I do not need to specify every field in the create statement.

Is there a way to use the avro schema file to create the parquet table as
well?



Yosi

Re: creating parquet table using avro schema

Posted by Daniel Haviv <da...@veracity-group.com>.
Sorry, I misunderstood.
AFAIK you can't do that.

Daniel


Re: creating parquet table using avro schema

Posted by Daniel Haviv <da...@veracity-group.com>.
You should be able to get the schema out using parquet tools:
http://blog.cloudera.com/blog/2015/03/converting-apache-avro-data-to-parquet-format-in-apache-hadoop/

Daniel
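
For anyone finding this thread later: since the replies above suggest Hive will not pick up the columns for a Parquet table from an .avsc file, one possible workaround is to generate the CREATE TABLE statement from the Avro schema yourself. The following is only a rough sketch under several assumptions: the function names (hive_type, parquet_ddl), the type mapping, and the table name, field names, and location in the example are made up for illustration; it handles only primitives, nullable unions, records, arrays, maps, and enums; and it assumes a Hive version that accepts STORED AS PARQUET (0.13 or later).

```python
import json

# Assumed mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def hive_type(avro_type):
    """Translate one Avro type (as parsed JSON) into a Hive type string."""
    # Union: only the nullable form ["null", T] is supported in this sketch.
    if isinstance(avro_type, list):
        branches = [t for t in avro_type if t != "null"]
        if len(branches) != 1:
            raise ValueError("unsupported union: %r" % (avro_type,))
        return hive_type(branches[0])
    # Complex types are JSON objects with a "type" key.
    if isinstance(avro_type, dict):
        kind = avro_type["type"]
        if kind == "record":
            fields = ",".join(
                "%s:%s" % (f["name"], hive_type(f["type"]))
                for f in avro_type["fields"]
            )
            return "STRUCT<%s>" % fields
        if kind == "array":
            return "ARRAY<%s>" % hive_type(avro_type["items"])
        if kind == "map":
            # Avro map keys are always strings.
            return "MAP<STRING,%s>" % hive_type(avro_type["values"])
        if kind == "enum":
            # Assumption: enums are exposed to Hive as strings.
            return "STRING"
        # Something like {"type": "string"}: unwrap and retry.
        return hive_type(kind)
    if avro_type in AVRO_TO_HIVE:
        return AVRO_TO_HIVE[avro_type]
    raise ValueError("unsupported Avro type: %r" % (avro_type,))


def parquet_ddl(avsc_text, table_name, location):
    """Build a CREATE EXTERNAL TABLE ... STORED AS PARQUET statement."""
    schema = json.loads(avsc_text)
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("top-level Avro schema must be a record")
    columns = ",\n".join(
        "  `%s` %s" % (f["name"], hive_type(f["type"]))
        for f in schema["fields"]
    )
    return (
        "CREATE EXTERNAL TABLE %s (\n%s\n)\nSTORED AS PARQUET\nLOCATION '%s';"
        % (table_name, columns, location)
    )


if __name__ == "__main__":
    # Hypothetical schema; replace with the contents of your own .avsc file.
    example = """
    {"type": "record", "name": "DashboardReport", "fields": [
      {"name": "id", "type": "long"},
      {"name": "label", "type": ["null", "string"]},
      {"name": "tags", "type": {"type": "array", "items": "string"}}
    ]}
    """
    print(parquet_ddl(example, "dashboard_report", "hdfs:///data/report"))
```

The printed statement can then be pasted into the Hive CLI. Review the generated column types before running it, since the mapping above is a guess at sensible defaults rather than something taken from Hive itself.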
