Posted to user@hive.apache.org by Julien Phalip <jp...@gmail.com> on 2022/05/16 03:09:19 UTC

Issue with the "hive.io.file.readcolumn.names" property

Hi,

I've noticed some odd behavior with the "hive.io.file.readcolumn.names"
configuration property.

Imagine a simple table "mytable" with two fields: "text" and "number".

- If you run the query "SELECT * FROM mytable", then
"hive.io.file.readcolumn.names" has the value "text,number". Makes sense
so far.
- If you run the query "SELECT text FROM mytable", then
"hive.io.file.readcolumn.names" has the value "text". Still makes sense.

However, if you add a predicate (WHERE clause), then the behavior of that
property seems strange to me:

- If you run the query "SELECT * FROM mytable WHERE number = 999", then
"hive.io.file.readcolumn.names" has the value "text". The "number" column
is missing from the property, even though "*" selects it.
- If you run the query "SELECT number FROM mytable WHERE number = 999",
then "hive.io.file.readcolumn.names" has the value "" (empty string).
The "number" column is still missing from the property.

In other words, it looks like any column that is part of a predicate is
omitted from the "hive.io.file.readcolumn.names" property, even when that
column also appears in the SELECT list. Do you know why that is?
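
To make the pattern concrete, here is a minimal Java sketch that restates
the four observations above as data and reports which predicate columns are
absent from the property value. It is only a model of what I observed, not
Hive code: the class and method names are hypothetical, and the property
values are hard-coded from the queries listed above rather than read from a
real job configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hive: models the observed property values.
public class ReadColumnNamesCheck {

    // Split the comma-separated property value; "" or null yields an empty list.
    public static List<String> parseColumns(String value) {
        List<String> columns = new ArrayList<>();
        if (value == null) {
            return columns;
        }
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                columns.add(trimmed);
            }
        }
        return columns;
    }

    // Return the predicate columns that do not appear in the property value.
    public static List<String> missingPredicateColumns(
            String propertyValue, List<String> predicateColumns) {
        List<String> missing = new ArrayList<>(predicateColumns);
        missing.removeAll(parseColumns(propertyValue));
        return missing;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        List<String> number = Arrays.asList("number");

        // Property values copied from the four queries described above.
        System.out.println(missingPredicateColumns("text,number", none));
        System.out.println(missingPredicateColumns("text", none));
        System.out.println(missingPredicateColumns("text", number));
        System.out.println(missingPredicateColumns("", number));
    }
}
```

For the two queries with a WHERE clause, the last two calls report "number"
as missing; for the two queries without one, nothing is missing.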

I'm writing a custom StorageHandler, so I need to know exactly which
columns the user is requesting. Is there a way to consistently retrieve
all the requested columns, either from the configuration or from within
the InputFormat class, even when there is a WHERE clause?
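
For illustration, this is roughly the result I am after, sketched in plain
Java. It assumes the predicate's column names can be obtained somehow,
which is exactly the part I don't know how to do; here they are simply
passed in as a parameter. The class and method names are hypothetical and
nothing below calls a Hive API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive.
public class RequestedColumns {

    // Merge the projected columns (the comma-separated property value) with
    // the predicate columns, keeping first-seen order and dropping duplicates.
    public static List<String> union(
            String readColumnNames, List<String> predicateColumns) {
        Set<String> all = new LinkedHashSet<>();
        if (readColumnNames != null) {
            for (String name : readColumnNames.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    all.add(trimmed);
                }
            }
        }
        // Assumption: the caller already knows which columns the WHERE clause uses.
        all.addAll(predicateColumns);
        return new ArrayList<>(all);
    }

    public static void main(String[] args) {
        // SELECT * FROM mytable WHERE number = 999
        System.out.println(union("text", Arrays.asList("number")));  // [text, number]
        // SELECT number FROM mytable WHERE number = 999
        System.out.println(union("", Arrays.asList("number")));      // [number]
    }
}
```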

Thanks,

Julien