Posted to dev@hive.apache.org by "Navis (JIRA)" <ji...@apache.org> on 2012/07/17 02:03:34 UTC

[jira] [Updated] (HIVE-3198) Table properties of non-native table are not transferred to RecordReader

     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3198:
------------------------

    Description: 
I'm working on a custom StorageHandler implementation. I use configureTableJobProperties to pass properties on to a serde and InputFormat, but it looks to me like the properties aren't present inside the InputFormat.

I found the following code which looks like it's supposed to propagate JobProperties:
{code}
public class HiveInputFormat<K extends WritableComparable, V extends Writable>
...
  public RecordReader getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {

    HiveInputSplit hsplit = (HiveInputSplit) split;
...
    boolean nonNative = false;
    PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
    if ((part != null) && (part.getTableDesc() != null)) {
      Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
      nonNative = part.getTableDesc().isNonNative();
    }
{code}

In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't get called. I see that for this table:
{code}
create external table test3 () STORED BY 'foo' location '/data/bar';
{code}
The InputSplit path is the *file* (i.e. "/data/bar/part-00000"), but pathToPartitionInfo has an entry for the *dir* (i.e. "/data/bar").
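The mismatch can be reproduced without a cluster. The following is a simplified model, not Hive code: plain Strings stand in for Hadoop's Path and Hive's PartitionDesc, and the class name ExactLookupDemo is hypothetical. It shows that an exact Map.get keyed by the table directory finds nothing for a file inside that directory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the lookup in HiveInputFormat.getRecordReader.
// pathToPartitionInfo is keyed by the table/partition *directory*, but the split
// path handed back by the underlying InputFormat is a *file* inside it.
// Strings stand in for Hadoop's Path and Hive's PartitionDesc.
class ExactLookupDemo {
    // Mirrors pathToPartitionInfo.get(hsplit.getPath().toString())
    static String exactLookup(Map<String, String> pathToPartitionInfo, String splitPath) {
        return pathToPartitionInfo.get(splitPath);
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("/data/bar", "partition desc for test3");

        // The directory key matches...
        System.out.println(exactLookup(pathToPartitionInfo, "/data/bar"));
        // ...but the file inside it does not, so part == null and
        // copyTableJobPropertiesToConf would never be called.
        System.out.println(exactLookup(pathToPartitionInfo, "/data/bar/part-00000"));
    }
}
```

Running main prints the entry registered for /data/bar and then null for /data/bar/part-00000, which corresponds to the part==null case seen in the debugger.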

I attached a patch that fixes the problem for me. It makes things explicit by passing the directory name along inside the HiveInputSplit, so we don't have to figure out which files belong to which partition.
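The idea of carrying the directory inside the split can be sketched as follows. This is a hypothetical illustration, not the contents of the attached inputformat.patch: DirCarryingSplit and its fields are invented names, and Strings again stand in for Path and PartitionDesc. The directory is recorded when the split is created, so the later lookup uses the key that pathToPartitionInfo actually contains.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a HiveInputSplit that also remembers the
// table/partition directory it was generated from. Names and fields are
// illustrative only and are not taken from the attached patch.
final class DirCarryingSplit {
    final String filePath;     // e.g. "/data/bar/part-00000", as produced by the InputFormat
    final String partitionDir; // e.g. "/data/bar", recorded when the split was created

    DirCarryingSplit(String filePath, String partitionDir) {
        this.filePath = filePath;
        this.partitionDir = partitionDir;
    }

    // Look up partition info by the recorded directory instead of the file path,
    // so no file-to-partition mapping has to be reconstructed afterwards.
    String lookupPartition(Map<String, String> pathToPartitionInfo) {
        return pathToPartitionInfo.get(partitionDir);
    }
}

class DirCarryingSplitDemo {
    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("/data/bar", "partition desc for test3");

        DirCarryingSplit split = new DirCarryingSplit("/data/bar/part-00000", "/data/bar");
        System.out.println(split.lookupPartition(pathToPartitionInfo));
    }
}
```

The trade-off of this approach is that every code path that creates splits has to record the directory, whereas a lookup-side fix leaves split creation untouched.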



       Assignee: Navis
        Summary: Table properties of non-native table are not transferred to RecordReader  (was: StorageHandler properties not passed to InputFormat (?))

For non-native tables, Hive delegates the creation of input splits and record readers to HiveInputFormat. However, most Hadoop input formats replace the directory (the location of the table/partition) with the concrete file names inside it, so a simple map lookup in pathToPartitionInfo does not find the appropriate partition desc.

This can be fixed simply by searching for the partition recursively, which CombineHiveInputFormat already does (as commented below). However, it seemed too hard to make a proper test case for this, so I'll just upload the code patch.
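A recursive lookup of this kind can be sketched as walking up the split path's ancestors until one of them is a key in pathToPartitionInfo. This is a minimal sketch assuming '/'-separated path strings; the class and method names are hypothetical, and the real Hive code works on Hadoop Path objects (including scheme and authority), which this model ignores.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a recursive (parent-walking) partition lookup.
// Assumes '/'-separated path strings; Strings stand in for Path/PartitionDesc.
final class RecursivePartitionLookup {
    static String findPartition(Map<String, String> pathToPartitionInfo, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            String desc = pathToPartitionInfo.get(current);
            if (desc != null) {
                return desc; // match on the path itself or one of its ancestors
            }
            current = parentOf(current);
        }
        return null; // no enclosing table/partition directory is registered
    }

    // Returns the parent of a '/'-separated path, or null once the root has been passed.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        if (slash <= 0) {
            // "/data" -> "/", while "/" or a relative name has no parent
            return (slash == 0 && path.length() > 1) ? "/" : null;
        }
        return path.substring(0, slash);
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("/data/bar", "partition desc for test3");

        System.out.println(findPartition(pathToPartitionInfo, "/data/bar/part-00000"));
        System.out.println(findPartition(pathToPartitionInfo, "/data/other/part-00000"));
    }
}
```

Here /data/bar/part-00000 resolves to the entry registered for /data/bar, while a path outside any registered directory still yields null.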
                
> Table properties of non-native table are not transferred to RecordReader
> ------------------------------------------------------------------------
>
>                 Key: HIVE-3198
>                 URL: https://issues.apache.org/jira/browse/HIVE-3198
>             Project: Hive
>          Issue Type: Bug
>         Environment: trunk r1352973
>            Reporter: Brian Bloniarz
>            Assignee: Navis
>         Attachments: TestStorageHandler.java, inputformat.patch

--
This message is automatically generated by JIRA.