Posted to user@hive.apache.org by Miljan Markovic <mi...@yoterra.com> on 2013/11/07 12:40:47 UTC

Schema less hive table. Is it possible?

Hello.

I have a set of complex structured objects that I would like to put into
Hive so that I can use its SQL to query data from those objects. Since
they are complex structures, defining a schema for every single field is
a hard job.

Instead, I'm thinking of writing a custom SerDe with a custom
ObjectInspector that uses something like OGNL to get the values of
object fields and convert them to Hive's data types. So a query would
look something like:

    select <ognl_expression1>,<ognl_expression2>... from etc...

If the expressions can be passed verbatim to the ObjectInspector without
Hive checking their validity as column names, the ObjectInspector itself
would know what to do with them. This doesn't need any explicit schema
as far as the SerDe and ObjectInspector are concerned. But can Hive cope
with that? Is this possible to do?


Re: Schema less hive table. Is it possible?

Posted by Nitin Pawar <ni...@gmail.com>.
This is my understanding; I may be wrong, so wait for others to reply as
well and correct me if needed.

In Hive you cannot access data without creating a table, and to create
a table you need at least one column. Does that qualify as a schema?
Yes.

So, in short, a schemaless table is not possible.
At most, you can create a table with a single column of type string.

Then write a custom UDF that parses the string and returns the data you
are looking for.
The problem with this is that you cannot take advantage of columnar
storage file formats and their compression, you will need to read all
the data of each record whenever you query it, and you will always need
a subquery for more granular access.
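
As a rough illustration of the single-string-column approach, here is a
minimal Python sketch of the per-row parsing such a UDF would have to do.
It assumes the column holds JSON text. The function name `extract` and the
dotted-path syntax are hypothetical, chosen only for this example; an
actual Hive UDF would be written in Java against Hive's UDF API.

```python
import json


def extract(json_text, path):
    """Return the value at a dotted path such as 'user.address.city'.

    Returns None (comparable to SQL NULL) when the row is not valid JSON
    or the path does not exist in it.
    """
    try:
        node = json.loads(json_text)
    except (TypeError, ValueError):
        return None  # malformed row
    for key in path.split("."):
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and key.isdigit() and int(key) < len(node):
            node = node[int(key)]  # numeric path segment indexes into a list
        else:
            return None
    return node


# Each element stands in for one row of the single string column.
rows = [
    '{"user": {"name": "ana", "address": {"city": "Belgrade"}}, "tags": ["a", "b"]}',
    '{"user": {"name": "bo"}}',
    'not json',
]
cities = [extract(row, "user.address.city") for row in rows]
```

Note that every call parses the whole string again, which is the
per-record read cost mentioned above.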


On the other hand, creating the table schema is a one-time job, so it is
worth the effort: data validation is then offloaded to Hive.

In the past I have created tables where the data was in JSON format, with
8 nested documents inside a single column.



-- 
Nitin Pawar