Posted to user@hive.apache.org by Jamie Cockrill <ja...@gmail.com> on 2010/10/01 15:56:25 UTC

Re: Tables using custom SerDe doesn't return any data when queried

Dear all,

I managed to fix this by starting from scratch and re-creating the
table and loading data into it. There must have been something odd
about the way I created my original table.

thanks

Jamie

On 30 September 2010 10:20, Jamie Cockrill <ja...@gmail.com> wrote:
> Dear hive-users,
>
> I've written a custom SerDe to handle some log files in a custom
> format. As I'd like to (eventually) use the JDBC driver, I want to
> retain the column types in the output. Part of the reason for writing
> my own is that we're using OpenCSV (http://opencsv.sourceforge.net/)
> to produce the files in the first place, so it'd be good to use it
> again to parse them when querying in Hive.
>
> I implemented the SerDe using MetadataTypedColumnsetSerDe as a
> starting point. However, whenever I run a query, no data is returned,
> regardless of how much data I load into the table. The load itself
> proceeds fine. I am using the version of Hive from Cloudera's CDH3
> distribution (based on 0.5.0).
>
> My create table statement is:
>
> CREATE TABLE my_test_table (col_name_1 STRING, col_name_2 INT, ...
> etc) COMMENT 'Some comment' PARTITIONED BY (part_col_1 STRING,
> part_col_2 STRING)
> ROW FORMAT SERDE "com.my.package.named.MyNewSerDe" STORED AS TEXTFILE;
>
> I have switched on the debug logging and put a bunch of debug
> statements in my code and I've found that when I do a simple query
> (like "select * from my_test_table limit 10;") so that it runs
> locally, it does find the class. Indeed it calls the initialize method
> and calls the getObjectInspector method a number of times.
> Subsequently though, it calls initialize on LazySimpleSerDe three
> times. The first two times it has dummy column names (_col0) and the
> correct column types in the correct order. The last time it contains
> no column names or types at all.
>
> Presumably I'm missing something fairly simple from somewhere (a class
> extension missing, wrong class returned by getSerializedClass() or
> perhaps constructing the ObjectInspector incorrectly?) but for the
> life of me I can't spot it. The underlying files are just CSV files
> produced with the OpenCSV library mentioned above.
>
> I'd be very grateful for any suggestions.
>
> Thanks,
>
> Jamie
>
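
For readers following the thread, the typed-column idea in the original question can be sketched in plain Java. This is only an illustration under stated assumptions, not the poster's SerDe and not Hive's API: the class TypedCsvRowSketch and its methods are hypothetical names, the hand-rolled splitter stands in for OpenCSV, and only the "string" and "int" types are handled. It shows a line of CSV being turned into a row of typed objects, which is roughly what a SerDe's deserialize step would hand back alongside a matching ObjectInspector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Hive or OpenCSV.
public class TypedCsvRowSketch {

    // Splits one CSV line into fields, honouring double-quoted fields
    // and "" as an escaped quote. A stand-in for a real CSV parser.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Converts a raw field to the declared column type.
    // Unparseable ints become null rather than failing the whole row.
    static Object convert(String raw, String type) {
        if (raw == null) {
            return null;
        }
        if ("int".equalsIgnoreCase(type)) {
            try {
                return Integer.valueOf(raw.trim());
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return raw; // treat everything else as a string
    }

    // Builds one row with exactly columnTypes.length entries;
    // missing trailing fields are filled with null.
    static List<Object> toTypedRow(String line, String[] columnTypes) {
        List<String> fields = splitCsvLine(line);
        List<Object> row = new ArrayList<>(columnTypes.length);
        for (int i = 0; i < columnTypes.length; i++) {
            String raw = i < fields.size() ? fields.get(i) : null;
            row.add(convert(raw, columnTypes[i]));
        }
        return row;
    }

    public static void main(String[] args) {
        String[] types = {"string", "int"};
        System.out.println(toTypedRow("\"hello, world\",42", types));
    }
}
```

The row always has one entry per declared column, so its shape stays in step with whatever column names and types the table definition supplies, even when a line is short or a value fails to parse.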