Posted to dev@hive.apache.org by "Bennie Schut (JIRA)" <ji...@apache.org> on 2012/07/27 15:53:33 UTC
[jira] [Created] (HIVE-3308) Mixing avro and snappy gives null values
Bennie Schut created HIVE-3308:
----------------------------------
Summary: Mixing avro and snappy gives null values
Key: HIVE-3308
URL: https://issues.apache.org/jira/browse/HIVE-3308
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Bennie Schut
By default, Hive uses LazySimpleSerDe for output.
When I enable compression and run "select count(*) from avrotable", the output is a file with the .avro extension. It then displays null values, because the file is in reality not an Avro file but a compressed file created by LazySimpleSerDe, so it should be a .snappy file.
This causes any job to show null values (the exception is "select * from avrotable", since that is not truly a job).
If you use any serde other than Avro, you can temporarily fix this with "set hive.output.file.extension=.snappy" and it will work correctly again, but this won't work with Avro since it overwrites hive.output.file.extension during initialization.
When you dump the query result into a table with "create table bla as", you can rename the .avro file to .snappy and "select from bla" will also magically work again.
Input and output serdes don't always match, so when I use Avro as an input format it should not set hive.output.file.extension.
Once it's set, all queries will use it and fail, making the connection useless to reuse.
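A rough sketch of a session that should reproduce the behaviour described above. The report does not say which settings were used to enable compression, so the two compression properties below are an assumption (they are the usual Hive/Hadoop ones for Snappy-compressed output); "avrotable" and "src" are the table names used in this report and its test.

```sql
-- Assumption: compression enabled via these properties (not stated in the report)
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Reading from the Avro-backed table; the result is reported to display NULL
SELECT count(*) FROM avrotable;

-- Later queries in the same session are reported to be affected as well
SELECT count(*) FROM src;

-- Workaround from the description; reported to help with non-Avro serdes only,
-- since the Avro serde overwrites this property again when it initializes
SET hive.output.file.extension=.snappy;
```

The point of the sketch is the ordering: once the Avro table has been read in a session, the output extension stays at .avro for every following query in that session.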
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3308) Mixing avro and snappy gives null values
Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bennie Schut updated HIVE-3308:
-------------------------------
Attachment: HIVE-3308.patch1.txt
Added a test to show the problem.
Result of the test will show:
#### A masked pattern was here ####
POSTHOOK: query: select count(*) from src
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
#### A masked pattern was here ####
NULL
But should show something like:
#### A masked pattern was here ####
POSTHOOK: query: select count(*) from src
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
#### A masked pattern was here ####
500