Posted to user@hive.apache.org by Arvind Prabhakar <ar...@cloudera.com> on 2010/04/15 21:00:59 UTC

Re: table from sequence file

Hi Sagar,

Looks like your source file has custom writable types in it. If that is the
case, implementing a SerDe that works with that type may not be that
straightforward, although it is doable.

An alternative would be to implement a custom RecordReader that converts the
value of your custom writable to a Struct type, which can then be queried
directly.

Arvind
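
A rough sketch of what such a RecordReader might look like, assuming
hypothetical CustomKeyWritable/CustomValueWritable classes with getField*()
accessors (all names below are made up for illustration). Rather than
emitting a true Struct, this variation flattens the value into a ctrl-A
delimited Text line that the default delimited SerDe can split into columns;
it would be returned from a custom FileInputFormat's getRecordReader().

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

// Wraps the stock sequence file reader and flattens each (hypothetical)
// CustomValueWritable into one delimited line of text.
public class CustomValueRecordReader implements RecordReader<LongWritable, Text> {

  private final SequenceFileRecordReader<CustomKeyWritable, CustomValueWritable> reader;
  private final CustomKeyWritable key;
  private final CustomValueWritable value;

  public CustomValueRecordReader(Configuration conf, FileSplit split) throws IOException {
    reader = new SequenceFileRecordReader<CustomKeyWritable, CustomValueWritable>(conf, split);
    key = reader.createKey();
    value = reader.createValue();
  }

  public boolean next(LongWritable rowId, Text row) throws IOException {
    if (!reader.next(key, value)) {
      return false;
    }
    rowId.set(reader.getPos());
    // \001 (ctrl-A) is Hive's default field delimiter.
    row.set(value.getField1() + "\001" + value.getField2() + "\001" + value.getField3());
    return true;
  }

  public LongWritable createKey() { return new LongWritable(); }

  public Text createValue() { return new Text(); }

  public long getPos() throws IOException { return reader.getPos(); }

  public float getProgress() throws IOException { return reader.getProgress(); }

  public void close() throws IOException { reader.close(); }
}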

On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com> wrote:

> Hi
>
> My data is in the value field of a sequence file.
> The value field has subfields in it. I am trying to create a table using
> these subfields.
> Example:
> <KEY> <VALUE>
> <KEY_FIELD1, KEY_FIELD2> forms the key,
> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value.
> So I am trying to create a table from VALUE_FIELD*:
>
> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>
> I am planning to write a custom SerDe implementation and a custom
> SequenceFileReader.
> Please let me know if I am on the right track.
>
>
> -Sagar

Re: table from sequence file

Posted by Arvind Prabhakar <ar...@cloudera.com>.
I think it would be better to take a look at LazySimpleSerDe to see how it
serializes and deserializes Struct types. Your implementation should work
seamlessly with this SerDe.

More specifically, a simple POJO may not work because of the inherent
marshaling/encoding semantics that must be observed to conform to the
ByteWritable contracts.

Arvind
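
For reference, this is roughly what a struct looks like at the table level
with LazySimpleSerDe. It is only a sketch with made-up table and field
names, shown over a plain text file, since mapping a binary sequence file
onto it still needs the custom reader/SerDe work discussed in this thread:

CREATE TABLE struct_example (
  val STRUCT<field1:BIGINT, field2:STRING, field3:BIGINT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;

SELECT val.field1, val.field2 FROM struct_example;

As far as I understand, LazySimpleSerDe splits the struct's fields on the
collection item delimiter, and the SerDe's ObjectInspector is what ties
those fields back to the names declared in the table.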

On Fri, Apr 16, 2010 at 11:04 AM, Sagar Naik <sn...@attributor.com> wrote:

> Hi Arvind,
> Thanks for explanation.
>
> I am newbie so I am not familiar with terms.
> Struct implementation is POJO or some thing else.
>
> My guess is struct is a simple POJO . If so then simple POJO represented in
> BYTES will be passed to BytesWritable .
> And it should work ?
>
>
>
> -Sagar
>
> On Apr 16, 2010, at 9:58 AM, Arvind Prabhakar wrote:
>
> Sagar,
>
> Unfortunately it is more complicated than that. The idea behind the record
> reader implementation is to actually convert the underlying writable into a
> type that is understood by the SerDe layer. At this time, the SerDe layer
> seems to understand ByteWritable and Text types. So - if you could take your
> custom type and emit a ByteWritable that represents a struct implementation
> of the same, it would work.
>
> Another alternative which would be simple to implement would be to do the
> following:
>
> 1. Modify your custom writable so that it has a toString() method that
> generates a parsable representation of the fields. For example you could use
> the JSON representation in your toString() method.
>
> 2. Create the external table with inputformat
> 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and  outputformat
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the
> entire value type to a single string column.
>
> 3. Use the UDFJson to extract the individual attributes from the JSON
> string that is emitted from the select query.
>
> You can use this output to populate a new table that now has the parsed
> values separated out in the warehouse.
>
> Arvind
>
>
> On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <sn...@attributor.com> wrote:
>
>> Hi Arvind,
>>
>> U guessed it correct.
>>
>> We have custom writables.
>> I saw the TextRecordReader implementation to get an idea on RecordReader.
>>
>> It looks like createRow creates an instance and next(...) populates this
>> instance.
>> The createRow returns an instance of Writable.
>>
>> Is the Writable Instance same as "struct" from u r reply
>>
>> How is this Writable instance mapped to column names ?
>> Is there something in commandline syntax which binds the Writable instance
>> to column names and values ?
>> Or ObjectInspector will do it magically
>>
>> -Sagar
>> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
>>
>> Hi Sagar,
>>
>> Looks like your source file has custom writable types in it. If that is
>> the case, implementing a SerDe that works with that type may not be that
>> straight forward, although doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to Struct type which can then be queried
>> directly.
>>
>> Arvind
>>
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com> wrote:
>>
>>> Hi
>>>
>>> My data is in the value field of a sequence file.
>>> The value field has subfields in it. I am trying to create table using
>>> these subfields.
>>> Example:
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEYFIELD 2>  forms the key
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>>> So i am trying to create a table from VALUE_FIELD*
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2 as
>>> string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>>>
>>> I am planing to a write a custom SerDe implementation and custom
>>> SequenceFileReader
>>> Pl let me knw if I am on the right track.
>>>
>>>
>>> -Sagar
>>
>>
>>
>>
>
>

Re: table from sequence file

Posted by Sagar Naik <sn...@attributor.com>.
Hi Arvind,
Thanks for the explanation.

I am a newbie, so I am not familiar with the terms.
Is the Struct implementation a POJO or something else?

My guess is that the struct is a simple POJO. If so, then the simple POJO represented as bytes will be passed to BytesWritable.
And it should work?



-Sagar

On Apr 16, 2010, at 9:58 AM, Arvind Prabhakar wrote:

> Sagar,
> 
> Unfortunately it is more complicated than that. The idea behind the record reader implementation is to actually convert the underlying writable into a type that is understood by the SerDe layer. At this time, the SerDe layer seems to understand ByteWritable and Text types. So - if you could take your custom type and emit a ByteWritable that represents a struct implementation of the same, it would work.
> 
> Another alternative which would be simple to implement would be to do the following:
> 
> 1. Modify your custom writable so that it has a toString() method that generates a parsable representation of the fields. For example you could use the JSON representation in your toString() method.
> 
> 2. Create the external table with inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and  outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the entire value type to a single string column.
> 
> 3. Use the UDFJson to extract the individual attributes from the JSON string that is emitted from the select query. 
> 
> You can use this output to populate a new table that now has the parsed values separated out in the warehouse.
> 
> Arvind
> 
> 
> On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <sn...@attributor.com> wrote:
> Hi Arvind,
> 
> U guessed it correct.
> 
> We have custom writables.
> I saw the TextRecordReader implementation to get an idea on RecordReader.
> 
> It looks like createRow creates an instance and next(...) populates this instance.
> The createRow returns an instance of Writable.
> 
> Is the Writable Instance same as "struct" from u r reply
> 
> How is this Writable instance mapped to column names ?
> Is there something in commandline syntax which binds the Writable instance to column names and values ?
> Or ObjectInspector will do it magically 
> 
> -Sagar
> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
> 
>> Hi Sagar,
>> 
>> Looks like your source file has custom writable types in it. If that is the case, implementing a SerDe that works with that type may not be that straight forward, although doable. 
>> 
>> An alternative would be to implement a custom RecordReader that converts the value of your custom writable to Struct type which can then be queried directly.
>> 
>> Arvind
>> 
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com> wrote:
>> Hi
>> 
>> My data is in the value field of a sequence file.
>> The value field has subfields in it. I am trying to create table using these subfields.
>> Example:
>> <KEY> <VALUE>
>> <KEY_FIELD1, KEYFIELD 2>  forms the key
>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>> So i am trying to create a table from VALUE_FIELD*
>> 
>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2 as string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>> 
>> I am planing to a write a custom SerDe implementation and custom SequenceFileReader
>> Pl let me knw if I am on the right track.
>> 
>> 
>> -Sagar
>> 
> 
> 


Re: table from sequence file

Posted by Arvind Prabhakar <ar...@cloudera.com>.
Sagar,

Unfortunately it is more complicated than that. The idea behind the record
reader implementation is to actually convert the underlying writable into a
type that is understood by the SerDe layer. At this time, the SerDe layer
seems to understand only ByteWritable and Text types. So, if you could take
your custom type and emit a ByteWritable that represents a struct
implementation of the same, it would work.

Another alternative that would be simpler to implement is the following:

1. Modify your custom writable so that it has a toString() method that
generates a parsable representation of the fields. For example, you could
use a JSON representation in your toString() method.

2. Create the external table with inputformat
'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and outputformat
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the
entire value to a single string column.

3. Use UDFJson to extract the individual attributes from the JSON string
that is emitted from the select query.

You can use this output to populate a new table that has the parsed values
separated out in the warehouse (a rough sketch of these steps follows below).

Arvind
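
A rough sketch of steps 1-3 above, using made-up class, field, table and
path names. The JSON is built by hand only to keep the example
self-contained; a real implementation should use a JSON library since
nothing is escaped here:

// Step 1: a stand-in for the real custom value writable, with a toString()
// that emits JSON. The read/write logic mirrors whatever the real class does.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class CustomValueWritable implements Writable {
  private long valueField1;
  private String valueField2;
  private long valueField3;

  public void write(DataOutput out) throws IOException {
    out.writeLong(valueField1);
    out.writeUTF(valueField2);
    out.writeLong(valueField3);
  }

  public void readFields(DataInput in) throws IOException {
    valueField1 = in.readLong();
    valueField2 = in.readUTF();
    valueField3 = in.readLong();
  }

  @Override
  public String toString() {
    // Note: valueField2 is not escaped - use a real JSON library in practice.
    return "{\"value_field1\":" + valueField1
        + ",\"value_field2\":\"" + valueField2 + "\""
        + ",\"value_field3\":" + valueField3 + "}";
  }
}

-- Step 2: map the whole value to a single string column.
create external table raw_values (json_value string)
stored as
inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/path/to/the/sequence/files';

-- Step 3: UDFJson is exposed in HiveQL as get_json_object.
select get_json_object(json_value, '$.value_field1'),
       get_json_object(json_value, '$.value_field2'),
       get_json_object(json_value, '$.value_field3')
from raw_values;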


On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <sn...@attributor.com> wrote:

> Hi Arvind,
>
> U guessed it correct.
>
> We have custom writables.
> I saw the TextRecordReader implementation to get an idea on RecordReader.
>
> It looks like createRow creates an instance and next(...) populates this
> instance.
> The createRow returns an instance of Writable.
>
> Is the Writable Instance same as "struct" from u r reply
>
> How is this Writable instance mapped to column names ?
> Is there something in commandline syntax which binds the Writable instance
> to column names and values ?
> Or ObjectInspector will do it magically
>
> -Sagar
> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
>
> Hi Sagar,
>
> Looks like your source file has custom writable types in it. If that is the
> case, implementing a SerDe that works with that type may not be that
> straight forward, although doable.
>
> An alternative would be to implement a custom RecordReader that converts
> the value of your custom writable to Struct type which can then be queried
> directly.
>
> Arvind
>
> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com> wrote:
>
>> Hi
>>
>> My data is in the value field of a sequence file.
>> The value field has subfields in it. I am trying to create table using
>> these subfields.
>> Example:
>> <KEY> <VALUE>
>> <KEY_FIELD1, KEYFIELD 2>  forms the key
>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>> So i am trying to create a table from VALUE_FIELD*
>>
>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2 as
>> string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>>
>> I am planing to a write a custom SerDe implementation and custom
>> SequenceFileReader
>> Pl let me knw if I am on the right track.
>>
>>
>> -Sagar
>
>
>
>

Re: table from sequence file

Posted by Sagar Naik <sn...@attributor.com>.
Hi Arvind,

You guessed it correctly.

We have custom writables.
I looked at the TextRecordReader implementation to get an idea of how a RecordReader works.

It looks like createRow creates an instance and next(...) populates this instance.
createRow returns an instance of Writable.

Is this Writable instance the same as the "struct" from your reply?

How is this Writable instance mapped to column names?
Is there something in the command-line syntax that binds the Writable instance to column names and values?
Or will the ObjectInspector do it magically?

-Sagar
On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:

> Hi Sagar,
> 
> Looks like your source file has custom writable types in it. If that is the case, implementing a SerDe that works with that type may not be that straight forward, although doable. 
> 
> An alternative would be to implement a custom RecordReader that converts the value of your custom writable to Struct type which can then be queried directly.
> 
> Arvind
> 
> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com> wrote:
> Hi
> 
> My data is in the value field of a sequence file.
> The value field has subfields in it. I am trying to create table using these subfields.
> Example:
> <KEY> <VALUE>
> <KEY_FIELD1, KEYFIELD 2>  forms the key
> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
> So i am trying to create a table from VALUE_FIELD*
> 
> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2 as string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
> 
> I am planing to a write a custom SerDe implementation and custom SequenceFileReader
> Pl let me knw if I am on the right track.
> 
> 
> -Sagar
> 


Re: table from sequence file

Posted by Arvind Prabhakar <ar...@cloudera.com>.
On Thu, Apr 15, 2010 at 7:00 PM, Edward Capriolo <ed...@gmail.com>wrote:

>
>
> On Thu, Apr 15, 2010 at 7:23 PM, Arvind Prabhakar <ar...@cloudera.com>wrote:
>
>> On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <ed...@gmail.com>wrote:
>>
>>>
>>>
>>> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <ar...@cloudera.com>wrote:
>>>
>>>> Hi Sagar,
>>>>
>>>> Looks like your source file has custom writable types in it. If that is
>>>> the case, implementing a SerDe that works with that type may not be that
>>>> straight forward, although doable.
>>>>
>>>> An alternative would be to implement a custom RecordReader that converts
>>>> the value of your custom writable to Struct type which can then be queried
>>>> directly.
>>>>
>>>> Arvind
>>>>
>>>>
>>>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com>wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> My data is in the value field of a sequence file.
>>>>> The value field has subfields in it. I am trying to create table using
>>>>> these subfields.
>>>>> Example:
>>>>> <KEY> <VALUE>
>>>>> <KEY_FIELD1, KEYFIELD 2>  forms the key
>>>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>>>>> So i am trying to create a table from VALUE_FIELD*
>>>>>
>>>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2
>>>>> as string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>>>>>
>>>>> I am planing to a write a custom SerDe implementation and custom
>>>>> SequenceFileReader
>>>>> Pl let me knw if I am on the right track.
>>>>>
>>>>>
>>>>> -Sagar
>>>>
>>>>
>>>>
>>> I am actually having lots of trouble with this.
>>> I have a sequence file that opens fine with
>>> /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text
>>> /home/edward/Downloads/seq/seq
>>>
>>> create external table keyonly( ver string , theid int, thedate string )
>>> row format delimited fields terminated by ','
>>> STORED AS
>>> inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
>>> outputformat
>>> 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
>>>
>>> location '/home/edward/Downloads/seq';
>>>
>>>
>>>
>>> Also tried
>>> inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
>>> or stored as SEQUENCEFILE
>>>
>>> I always get this...
>>>
>>> 2010-04-15 13:10:43,849 ERROR CliDriver
>>> (SessionState.java:printError(255)) - Failed with exception
>>> java.io.IOException:java.io.EOFException
>>> java.io.IOException: java.io.EOFException
>>>     at
>>> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>>>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>>>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>>>     at
>>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>>>     at
>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>>     at
>>> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>>>     at
>>> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at junit.framework.TestCase.runTest(TestCase.java:154)
>>>     at junit.framework.TestCase.runBare(TestCase.java:127)
>>>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>>>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>>>     at junit.framework.TestResult.run(TestResult.java:109)
>>>     at junit.framework.TestCase.run(TestCase.java:118)
>>>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>>>     at junit.framework.TestSuite.run(TestSuite.java:203)
>>>     at
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>>>     at
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>>>     at
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
>>> Caused by: java.io.EOFException
>>>     at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>>>     at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>>>     at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>>>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>>>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>>>     at
>>> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>>>     at
>>> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>>>     at
>>> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>>>     at
>>> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>>>     at
>>> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>>>     at
>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>>     at
>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>>     at
>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>>     at
>>> org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>>     at
>>> org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
>>>     at
>>> org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
>>>     at
>>> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
>>>     at
>>> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
>>>     ... 21 more
>>>
>>> Does anyone have a clue on what I am doing wrong??
>>>
>>>
>> The SequenceFileAsTextInputFormat converts the sequence record values to
>> string using the toString() invocation. Assuming that your data has a custom
>> writable that has multiple fields in it, I don't think it is possible for
>> you to map the individual bits to different columns.
>>
>> Can you try doing the following:
>>
>> create external table dummy( fullvalue string)
>> stored as inputformat
>> 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
>> outputformat'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>>
>> location '/home/edward/Downloads/seq';
>>
>> and then doing a select * from dummy.
>>
>> Arvind
>>
>
>
> [edward@ec hive]$ head -1 /home/edward/Downloads/seq/seq | od -a
> 0000000   S   E   Q ack  em   o   r   g   .   a   p   a   c   h   e   .
> 0000020   h   a   d   o   o   p   .   i   o   .   T   e   x   t  em   o
> 0000040   r   g   .   a   p   a   c   h   e   .   h   a   d   o   o   p
> 0000060   .   i   o   .   T   e   x   t soh soh   '   o   r   g   .   a
> 0000100   p   a   c   h   e   .   h   a   d   o   o   p   .   i   o   .
> 0000120   c   o   m   p   r   e   s   s   .   G   z   i   p   C   o   d
> 0000140   e   c nul nul nul nul   =   4  ff   Y   F   s   V  so   4   "
> 0000160   R   +   X enq dle   T del del del del   =   4  ff   Y   F   s
> 0000200   V  so   4   "   R   +   X enq dle   T soh etb  us  vt  bs nul
>
>
> 2010-04-15 18:45:24,954 ERROR CliDriver (SessionState.java:printError(255))
> - Failed with exception java.io.IOException:java.io.EOFException
>
> java.io.IOException: java.io.EOFException
>     at
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>     at
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>     at
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at junit.framework.TestCase.runTest(TestCase.java:154)
>     at junit.framework.TestCase.runBare(TestCase.java:127)
>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>     at junit.framework.TestResult.run(TestResult.java:109)
>     at junit.framework.TestCase.run(TestCase.java:118)
>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>     at junit.framework.TestSuite.run(TestSuite.java:203)
>     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
> Caused by: java.io.EOFException
>     at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>     at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>     at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>     at
> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>     at
> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>     at
> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>     at
> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>     at
> org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>     at
> org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
>     at
> org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
>     at
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
>     at
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
>     ... 21 more
>
>
The compression being used here - gzip - does not allow the input files to
be split. That could be the reason why you are seeing this exception. Can
you try using a different compression scheme such as bzip2, or perhaps not
compressing the files at all?

Arvind
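
If re-writing the data is an option, a quick way to try that suggestion
might look like the sketch below. It assumes the key and value classes
really are Text, as the SEQ header quoted above suggests, and the input and
output paths are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;

// Copies a gzip-compressed sequence file into a new block-compressed bzip2 one.
public class RecompressSeqFile {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path in = new Path(args[0]);   // existing sequence file
    Path out = new Path(args[1]);  // re-compressed copy

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, out,
        Text.class, Text.class, SequenceFile.CompressionType.BLOCK,
        new BZip2Codec());

    Text key = new Text();
    Text value = new Text();
    while (reader.next(key, value)) {
      writer.append(key, value);
    }
    writer.close();
    reader.close();
  }
}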

Re: table from sequence file

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Apr 15, 2010 at 7:23 PM, Arvind Prabhakar <ar...@cloudera.com>wrote:

> On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <ed...@gmail.com>wrote:
>
>>
>>
>> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <ar...@cloudera.com>wrote:
>>
>>> Hi Sagar,
>>>
>>> Looks like your source file has custom writable types in it. If that is
>>> the case, implementing a SerDe that works with that type may not be that
>>> straight forward, although doable.
>>>
>>> An alternative would be to implement a custom RecordReader that converts
>>> the value of your custom writable to Struct type which can then be queried
>>> directly.
>>>
>>> Arvind
>>>
>>>
>>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com>wrote:
>>>
>>>> Hi
>>>>
>>>> My data is in the value field of a sequence file.
>>>> The value field has subfields in it. I am trying to create table using
>>>> these subfields.
>>>> Example:
>>>> <KEY> <VALUE>
>>>> <KEY_FIELD1, KEYFIELD 2>  forms the key
>>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>>>> So i am trying to create a table from VALUE_FIELD*
>>>>
>>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2
>>>> as string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>>>>
>>>> I am planing to a write a custom SerDe implementation and custom
>>>> SequenceFileReader
>>>> Pl let me knw if I am on the right track.
>>>>
>>>>
>>>> -Sagar
>>>
>>>
>>>
>> I am actually having lots of trouble with this.
>> I have a sequence file that opens fine with
>> /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text
>> /home/edward/Downloads/seq/seq
>>
>> create external table keyonly( ver string , theid int, thedate string )
>> row format delimited fields terminated by ','
>> STORED AS
>> inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
>> outputformat
>> 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
>>
>> location '/home/edward/Downloads/seq';
>>
>>
>>
>> Also tried
>> inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
>> or stored as SEQUENCEFILE
>>
>> I always get this...
>>
>> 2010-04-15 13:10:43,849 ERROR CliDriver
>> (SessionState.java:printError(255)) - Failed with exception
>> java.io.IOException:java.io.EOFException
>> java.io.IOException: java.io.EOFException
>>     at
>> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>>     at
>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>     at
>> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>>     at
>> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at junit.framework.TestCase.runTest(TestCase.java:154)
>>     at junit.framework.TestCase.runBare(TestCase.java:127)
>>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>>     at junit.framework.TestResult.run(TestResult.java:109)
>>     at junit.framework.TestCase.run(TestCase.java:118)
>>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>>     at junit.framework.TestSuite.run(TestSuite.java:203)
>>     at
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>>     at
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>>     at
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
>> Caused by: java.io.EOFException
>>     at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>>     at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>>     at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>>     at
>> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>>     at
>> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>>     at
>> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>>     at
>> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>     at
>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>     at
>> org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>>     at
>> org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
>>     at
>> org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
>>     at
>> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
>>     at
>> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
>>     ... 21 more
>>
>> Does anyone have a clue on what I am doing wrong??
>>
>>
> The SequenceFileAsTextInputFormat converts the sequence record values to
> string using the toString() invocation. Assuming that your data has a custom
> writable that has multiple fields in it, I don't think it is possible for
> you to map the individual bits to different columns.
>
> Can you try doing the following:
>
> create external table dummy( fullvalue string)
> stored as inputformat
> 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
> outputformat'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>
> location '/home/edward/Downloads/seq';
>
> and then doing a select * from dummy.
>
> Arvind
>


[edward@ec hive]$ head -1 /home/edward/Downloads/seq/seq | od -a
0000000   S   E   Q ack  em   o   r   g   .   a   p   a   c   h   e   .
0000020   h   a   d   o   o   p   .   i   o   .   T   e   x   t  em   o
0000040   r   g   .   a   p   a   c   h   e   .   h   a   d   o   o   p
0000060   .   i   o   .   T   e   x   t soh soh   '   o   r   g   .   a
0000100   p   a   c   h   e   .   h   a   d   o   o   p   .   i   o   .
0000120   c   o   m   p   r   e   s   s   .   G   z   i   p   C   o   d
0000140   e   c nul nul nul nul   =   4  ff   Y   F   s   V  so   4   "
0000160   R   +   X enq dle   T del del del del   =   4  ff   Y   F   s
0000200   V  so   4   "   R   +   X enq dle   T soh etb  us  vt  bs nul


2010-04-15 18:45:24,954 ERROR CliDriver (SessionState.java:printError(255))
- Failed with exception java.io.IOException:java.io.EOFException
java.io.IOException: java.io.EOFException
    at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
    at
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
    at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
    at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
Caused by: java.io.EOFException
    at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
    at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
    at
org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
    at
org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
    at
org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
    at
org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at
org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at
org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
    at
org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
    at
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
    at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
    ... 21 more

Re: table from sequence file

Posted by Arvind Prabhakar <ar...@cloudera.com>.
On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <ed...@gmail.com>wrote:

>
>
> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <ar...@cloudera.com>wrote:
>
>> Hi Sagar,
>>
>> Looks like your source file has custom writable types in it. If that is
>> the case, implementing a SerDe that works with that type may not be that
>> straight forward, although doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to Struct type which can then be queried
>> directly.
>>
>> Arvind
>>
>>
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com> wrote:
>>
>>> Hi
>>>
>>> My data is in the value field of a sequence file.
>>> The value field has subfields in it. I am trying to create table using
>>> these subfields.
>>> Example:
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEYFIELD 2>  forms the key
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>>> So i am trying to create a table from VALUE_FIELD*
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2 as
>>> string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>>>
>>> I am planing to a write a custom SerDe implementation and custom
>>> SequenceFileReader
>>> Pl let me knw if I am on the right track.
>>>
>>>
>>> -Sagar
>>
>>
>>
> I am actually having lots of trouble with this.
> I have a sequence file that opens fine with
> /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text
> /home/edward/Downloads/seq/seq
>
> create external table keyonly( ver string , theid int, thedate string )
> row format delimited fields terminated by ','
> STORED AS
> inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
> outputformat
> 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
>
> location '/home/edward/Downloads/seq';
>
>
>
> Also tried
> inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
> or stored as SEQUENCEFILE
>
> I always get this...
>
> 2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255))
> - Failed with exception java.io.IOException:java.io.EOFException
> java.io.IOException: java.io.EOFException
>     at
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>     at
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>     at
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at junit.framework.TestCase.runTest(TestCase.java:154)
>     at junit.framework.TestCase.runBare(TestCase.java:127)
>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>     at junit.framework.TestResult.run(TestResult.java:109)
>     at junit.framework.TestCase.run(TestCase.java:118)
>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>     at junit.framework.TestSuite.run(TestSuite.java:203)
>     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>     at
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
> Caused by: java.io.EOFException
>     at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>     at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
>     at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
>     at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
>     at
> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
>     at
> org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
>     at
> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
>     at
> org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>     at
> org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
>     at
> org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
>     at
> org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
>     at
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
>     at
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
>     ... 21 more
>
> Does anyone have a clue on what I am doing wrong??
>
>
The SequenceFileAsTextInputFormat converts the sequence record values to
strings using their toString() method. Assuming that your data has a custom
writable with multiple fields in it, I don't think it is possible for you to
map the individual fields to different columns this way.

Can you try doing the following:

create external table dummy (fullvalue string)
stored as
inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/home/edward/Downloads/seq';

and then doing a select * from dummy?

Arvind
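
If that select works, and the custom writable's toString() is later changed
to emit JSON as suggested earlier in this thread, the individual attributes
could then be pulled out of that single column with something along these
lines (the field name is just a placeholder):

select get_json_object(fullvalue, '$.some_field') from dummy;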

Re: table from sequence file

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <ar...@cloudera.com>wrote:

> Hi Sagar,
>
> Looks like your source file has custom writable types in it. If that is the
> case, implementing a SerDe that works with that type may not be that
> straight forward, although doable.
>
> An alternative would be to implement a custom RecordReader that converts
> the value of your custom writable to Struct type which can then be queried
> directly.
>
> Arvind
>
>
> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <sn...@attributor.com> wrote:
>
>> Hi
>>
>> My data is in the value field of a sequence file.
>> The value field has subfields in it. I am trying to create table using
>> these subfields.
>> Example:
>> <KEY> <VALUE>
>> <KEY_FIELD1, KEYFIELD 2>  forms the key
>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3>.
>> So i am trying to create a table from VALUE_FIELD*
>>
>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 as BIGINT, VALUE_FIELD2 as
>> string, VALUE_FIELD3 as BIGINT ) STORED AS SEQUENCEFILE;
>>
>> I am planing to a write a custom SerDe implementation and custom
>> SequenceFileReader
>> Pl let me knw if I am on the right track.
>>
>>
>> -Sagar
>
>
>
I am actually having lots of trouble with this.
I have a sequence file that opens fine with
/home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text
/home/edward/Downloads/seq/seq

create external table keyonly( ver string , theid int, thedate string )
row format delimited fields terminated by ','
STORED AS
inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
outputformat
'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'

location '/home/edward/Downloads/seq';



I also tried
inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
or stored as SEQUENCEFILE.

I always get this...

2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255))
- Failed with exception java.io.IOException:java.io.EOFException
java.io.IOException: java.io.EOFException
    at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
    at
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
    at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
    at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
Caused by: java.io.EOFException
    at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
    at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
    at
org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
    at
org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
    at
org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
    at
org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at
org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at
org.apache.hadoop.mapred.SequenceFileAsTextRecordReader.<init>(SequenceFileAsTextRecordReader.java:44)
    at
org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.getRecordReader(SequenceFileAsTextInputFormat.java:43)
    at
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:296)
    at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:311)
    ... 21 more

Does anyone have a clue about what I am doing wrong?