Posted to user@hive.apache.org by Alexander Pivovarov <ap...@gmail.com> on 2015/12/03 06:53:29 UTC

Create table from ORC or Parquet file?

Hi Everyone

Is it possible to create a Hive table from an ORC or Parquet file without
specifying field names and their types? ORC and Parquet files contain field
name and type information inside.

Alex

Re: Create table from ORC or Parquet file?

Posted by Alexander Pivovarov <ap...@gmail.com>.
E.g. in Spark SQL I can create a temporary table from ORC, Parquet or JSON
files without specifying column names and types:

val myDf = sqlContext.read.format("orc").load("s3n://alex/test/mytable_orc")

myDf.printSchema
root
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- rc_state: string (nullable = true)
 |-- rc_county_name: string (nullable = true)

myDf.registerTempTable("mytable")
val res = sqlContext.sql("""
  select rc_state, count(*) cnt
  from mytable
  group by rc_state
  order by rc_state""")

res.show(10)
+--------+---+
|rc_state|cnt|
+--------+---+
|      AK| 37|
|      AL|224|
|      AR|109|
|      AZ| 81|
|      CA|417|
|      CO|145|
|      CT| 71|
|      DC| 15|
|      DE| 27|
|      FL|452|
+--------+---+
only showing top 10 rows

Lots of companies are switching to Spark for ETL, but Hive is still used by
many people, reporting tools and legacy solutions to select data from files
(tables) prepared by Spark.
It would be nice if Hive could create a table based on ORC or Parquet file(s)
without specifying table columns and types. That would make integration with
Spark output easier.
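Until Hive can infer the columns by itself, one workaround is to generate the DDL from the schema the files already carry. The Scala below is a minimal sketch under that assumption: `HiveDdl.createExternalTable` is a hypothetical helper (not part of Hive or Spark) that takes (column name, Hive type) pairs and emits a `CREATE EXTERNAL TABLE ... STORED AS ORC|PARQUET ... LOCATION` statement to run in Hive. The pairs could come from the Spark session above, e.g. something like `myDf.schema.fields.map(f => (f.name, f.dataType.simpleString))`, which for simple types such as `string`, `int` or `bigint` should line up with Hive's type names (treat complex types with care). `STORED AS ORC` needs Hive 0.11+ and `STORED AS PARQUET` needs Hive 0.13+.

```scala
// Hypothetical helper (not part of Hive or Spark): builds a Hive
// CREATE EXTERNAL TABLE statement from (column name, Hive type) pairs
// that have already been read from the ORC/Parquet file's embedded schema.
object HiveDdl {
  private val SupportedFormats = Set("ORC", "PARQUET")

  def createExternalTable(
      table: String,
      columns: Seq[(String, String)],
      storedAs: String,
      location: String): String = {
    val format = storedAs.toUpperCase
    require(columns.nonEmpty, "Hive needs at least one column")
    require(SupportedFormats.contains(format), s"unsupported format: $storedAs")

    // One "  `name` type" line per column; backticks quote the identifier
    // (names that themselves contain a backtick are not escaped here).
    val columnLines = columns
      .map { case (name, hiveType) => s"  `${name}` ${hiveType}" }
      .mkString(",\n")

    // EXTERNAL + LOCATION: Hive reads the existing files in place, and
    // dropping the table does not delete the files Spark wrote.
    Seq(
      s"CREATE EXTERNAL TABLE $table (",
      columnLines,
      ")",
      s"STORED AS $format",
      s"LOCATION '$location'"
    ).mkString("\n")
  }
}
```

For the four string columns printed above, calling it with `"ORC"` and `"s3n://alex/test/mytable_orc"` yields one `` `column` string `` line per column. The columns still have to be spelled out in the DDL; the sketch only saves writing them by hand.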


On Wed, Dec 9, 2015 at 9:50 AM, Owen O'Malley <om...@apache.org> wrote:

> So your use case is that you already have the ORC files and you want a
> table that can read those files without specifying the columns in the
> table? Obviously without the columns being specified Hive wouldn't be able
> to write to that table, so I assume you only care about reading it. Is that
> right?
>
> .. Owen

Re: Create table from ORC or Parquet file?

Posted by Owen O'Malley <om...@apache.org>.
So your use case is that you already have the ORC files and you want a
table that can read those files without specifying the columns in the
table? Obviously without the columns being specified Hive wouldn't be able
to write to that table, so I assume you only care about reading it. Is that
right?

.. Owen

Re: Create table from ORC or Parquet file?

Posted by Divya Gehlot <di...@gmail.com>.
Hi Stephen,
Can you share an example of how you are doing it?
I would really appreciate your help,
as I am also stuck on the same scenario.

Thanks,
Divya
On Dec 8, 2015 11:17 PM, "Stephen Bly" <st...@gmail.com> wrote:

> I am working on a similar problem: creating a Hive table from Parquet
> data and using the embedded schema to determine the columns. I believe you
> will have to create your own SerDe and InputFormat (that's what I'm doing).

Re: Create table from ORC or Parquet file?

Posted by Stephen Bly <st...@gmail.com>.
I am working on a similar problem: creating a Hive table from Parquet data and using the embedded schema to determine the columns. I believe you will have to create your own SerDe and InputFormat (that's what I'm doing).

RE: Create table from ORC or Parquet file?

Posted by Link Qian <fa...@outlook.com>.
Alex,

It's not possible to create an ORC-format table without columns, and the same goes for Parquet, since Parquet's storage representation is JSON-like.
An Avro table, however, does not need its columns specified: instead, you specify an AVSC (Avro schema) file that defines the table columns.

Thanks,
Link Qian
