Posted to user@hive.apache.org by "Grant Overby (groverby)" <gr...@cisco.com> on 2015/05/04 20:02:17 UTC

Writing Sequence Files

I’m looking for some sample code to write a Hive-compatible sequence file for an external table, along with the matching DDL.

I’m starting with a Java POJO. I can create an ObjectInspector for this class. I’m reasonably sure I can write a SerDe leveraging Java’s Externalizable serialization. I’m coming up a bit short on how to wire this together.

My end goal is to have this file queryable while I’m writing to it. I don’t know if Hive will work this way out of the box. Perhaps I’ll need a modified InputFormat to skip over incomplete rows?

Grant Overby
Software Engineer
Cisco.com <http://www.cisco.com/>
groverby@cisco.com
Mobile: 865 724 4910

Re: Writing Sequence Files

Posted by Owen O'Malley <om...@apache.org>.
On Mon, May 4, 2015 at 11:02 AM, Grant Overby (groverby) <groverby@cisco.com> wrote:

> I’m looking for some sample code to write a Hive-compatible sequence
> file for an external table, along with the matching DDL.
>

In general, the easiest way is to create a table with the layout you'd like
to have and use Hive to write to that table.


> I’m starting with a Java POJO. I can create an ObjectInspector for this
> class. I’m reasonably sure I can write a SerDe leveraging Java’s
> Externalizable serialization. I’m coming up a bit short on how to wire this
> together.
>

OK, to make Hive happy you need to pick a SerDe. The default is
LazySimpleSerDe, so let's assume you'll use that one:

hive> create table people(name string, id int) stored as sequencefile;

The files for this table will look like:
SequenceFile - key: BytesWritable, value: Text
The key is ignored and the value will be the same string that would have been
used in a text file:

Vader^A1
Solo^A2
R2D2^A3

where ^A is control-A.
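
As a rough illustration of the row layout above, here is a minimal Java sketch that turns a POJO into the control-A delimited string. It uses only the standard library. The Person and PeopleRowEncoder classes are hypothetical names made up for this example, and the "\N" text for null values assumes the table keeps Hive's default null format. With Hadoop on the classpath, the resulting string would be wrapped in a Text and appended as the value of each SequenceFile record; that part is not shown here.

```java
import java.util.StringJoiner;

/** Hypothetical POJO matching: create table people(name string, id int). */
final class Person {
    final String name;
    final int id;

    Person(String name, int id) {
        this.name = name;
        this.id = id;
    }
}

/** Hypothetical helper that builds one delimited row per Person. */
final class PeopleRowEncoder {
    // Default LazySimpleSerDe field delimiter: control-A (byte 0x01).
    static final char FIELD_DELIMITER = '\u0001';
    // Assumption: the table uses Hive's default text for NULL values.
    static final String NULL_TEXT = "\\N";

    /** Joins the fields in column order (name, id) with control-A. */
    static String encode(Person p) {
        StringJoiner row = new StringJoiner(String.valueOf(FIELD_DELIMITER));
        row.add(p.name == null ? NULL_TEXT : p.name);
        row.add(Integer.toString(p.id));
        return row.toString();
    }

    public static void main(String[] args) {
        String row = encode(new Person("Vader", 1));
        // Show the delimiter as ^A so it is visible on a terminal.
        System.out.println(row.replace(String.valueOf(FIELD_DELIMITER), "^A"));
    }
}
```

Running main prints Vader^A1, matching the first row in the example above. Field values that themselves contain control-A or newlines are not handled by this sketch.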


> My end goal is to have this file queryable while I’m writing to it. I
> don’t know if Hive will work this way out of the box. Perhaps I’ll need a
> modified InputFormat to skip over incomplete rows?
>

The SequenceFile reader isn't very tolerant of incomplete files. You would
probably want an InputFormat that finds an instance of the sequence file
sync marker and only reads up to that. Of course, if your file is complete,
that will skip the last set of rows, so you'd need to know the difference
between incomplete and complete files.
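
To make the "read only up to the marker" idea concrete, here is a small Java sketch that finds the last full occurrence of a marker in a byte buffer. It is not the Hadoop InputFormat API and does not parse the real SequenceFile layout: SyncMarkerScanner is a hypothetical class, the buffer is in memory, and the marker is passed in as a parameter (in a real file the sync marker value comes from the file header). The assumption is that everything before the last marker consists of whole records.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; operates on an in-memory buffer, not on HDFS. */
final class SyncMarkerScanner {

    /**
     * Returns the offset where the last complete occurrence of marker
     * starts in data, or -1 if the marker does not occur at all.
     */
    static int lastMarkerOffset(byte[] data, byte[] marker) {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        // Scan backwards so the first match found is the last one in the buffer.
        for (int start = data.length - marker.length; start >= 0; start--) {
            boolean match = true;
            for (int i = 0; i < marker.length; i++) {
                if (data[start + i] != marker[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Toy data: two whole "records", then a partially written one.
        byte[] data = "rec1SYNCrec2SYNCpar".getBytes(StandardCharsets.US_ASCII);
        byte[] marker = "SYNC".getBytes(StandardCharsets.US_ASCII);
        // Bytes in [0, offset) are treated as safe to read.
        System.out.println(lastMarkerOffset(data, marker));
    }
}
```

As noted above, a reader built this way would also drop the rows after the last marker in a file that is already complete, so the caller still needs some separate signal for whether the writer has finished.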

You might look at the work we did with the streaming ingest and ORC files.

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

.. Owen

Re: Writing Sequence Files

Posted by Stéphane Verlet <ka...@gmail.com>.
Have you looked at the Hadoop SequenceFile API?
http://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/io/SequenceFile.html

In particular, see SequenceFile.createWriter.

It worked for me.

Here is also a sample from Stack Overflow:
http://stackoverflow.com/a/25484581