Posted to dev@hive.apache.org by "Brian Bloniarz (JIRA)" <ji...@apache.org> on 2012/06/25 21:13:44 UTC

[jira] [Created] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Brian Bloniarz created HIVE-3198:
------------------------------------

             Summary: StorageHandler properties not passed to InputFormat (?)
                 Key: HIVE-3198
                 URL: https://issues.apache.org/jira/browse/HIVE-3198
             Project: Hive
          Issue Type: Bug
         Environment: trunk r1352973
            Reporter: Brian Bloniarz


I'm working on a custom StorageHandler implementation. I use configureTableJobProperties to pass properties onto a serde & InputFormat, but it looks to me like the properties aren't present inside the InputFormat.

I found the following code which looks like it's supposed to propagate JobProperties:
{code}
public class HiveInputFormat<K extends WritableComparable, V extends Writable>
...
  public RecordReader getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {

    HiveInputSplit hsplit = (HiveInputSplit) split;
...
    boolean nonNative = false;
    PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
    if ((part != null) && (part.getTableDesc() != null)) {
      Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
      nonNative = part.getTableDesc().isNonNative();
    }
{code}

In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't get called. I see that for this table:
{code}
create external table test3 () STORED BY 'foo' location '/data/bar';
{code}
The InputSplit path is the *file* (i.e. "/data/bar/part-00000") but pathToPartitionInfo has an entry for the *dir* (i.e. "/data/bar").

I attached a patch which fixes the problem for me; it makes things explicit by passing along the directory name inside the HiveInputSplit; this means we don't have to figure out which files are a part of which partition.
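
To make the mismatch concrete, here is a minimal standalone sketch. It is not Hive code: a plain Map<String, String> stands in for pathToPartitionInfo (the real map holds PartitionDesc values), and the class name LookupMismatch and its helper methods are made up for illustration. The paths are the ones from the example above.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupMismatch {
    // Stand-in for pathToPartitionInfo, keyed by the table *directory*.
    // The String value is a placeholder for a PartitionDesc.
    static Map<String, String> buildMap() {
        Map<String, String> m = new HashMap<>();
        m.put("/data/bar", "partition desc for test3");
        return m;
    }

    // Hypothetical helper: strips the last path component.
    static String parentOf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = buildMap();
        String splitPath = "/data/bar/part-00000"; // what the split reports

        // Exact-match lookup on the file path misses the directory entry.
        System.out.println(pathToPartitionInfo.get(splitPath));           // prints "null"
        // Looking up the parent directory finds it.
        System.out.println(pathToPartitionInfo.get(parentOf(splitPath))); // prints "partition desc for test3"
    }
}
```

With the exact-match lookup returning null, the guarded block that calls copyTableJobPropertiesToConf is skipped, which matches what the debugger shows.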

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Posted by "Brian Bloniarz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bloniarz updated HIVE-3198:
---------------------------------

    Attachment: inputformat.patch
    

        

[jira] [Commented] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Posted by "Navis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413631#comment-13413631 ] 

Navis commented on HIVE-3198:
-----------------------------

I wish I could provide a patch for this issue, but I can't reproduce the situation you described.
                

        

[jira] [Updated] (HIVE-3198) Table properties of non-native table are not transferred to RecordReader

Posted by "Navis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3198:
------------------------

    Affects Version/s: 0.10.0
               Status: Patch Available  (was: Open)

https://reviews.facebook.net/D4173
                

        

[jira] [Commented] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Posted by "Brian Bloniarz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413302#comment-13413302 ] 

Brian Bloniarz commented on HIVE-3198:
--------------------------------------

Hi Navis, sorry it took me so long to get back to you.

Your suggested fix also works & makes the problem go away. Thanks for helping, let me know if there's anything else I can do w.r.t. getting this fixed.
                

        

[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3198:
-----------------------------

    Status: Open  (was: Patch Available)

There is no patch for this
                

        

[jira] [Updated] (HIVE-3198) Table properties of non-native table are not transferred to RecordReader

Posted by "Navis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3198:
------------------------

    Description: 
I'm working on a custom StorageHandler implementation. I use configureTableJobProperties to pass properties onto a serde & InputFormat, but it looks to me like the properties aren't present inside the InputFormat.

I found the following code which looks like it's supposed to propagate JobProperties:
{code}
public class HiveInputFormat<K extends WritableComparable, V extends Writable>
...
  public RecordReader getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {

    HiveInputSplit hsplit = (HiveInputSplit) split;
...
    boolean nonNative = false;
    PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
    if ((part != null) && (part.getTableDesc() != null)) {
      Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
      nonNative = part.getTableDesc().isNonNative();
    }
{code}

In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't get called. I see that for this table:
{code}
create external table test3 () STORED BY 'foo' location '/data/bar';
{code}
The InputSplit path is the *file* (i.e. "/data/bar/part-00000") but pathToPartitionInfo has an entry for the *dir* (i.e. "/data/bar").

I attached a patch which fixes the problem for me; it makes things explicit by passing along the directory name inside the HiveInputSplit; this means we don't have to figure out which files are a part of which partition.


       Assignee: Navis
        Summary: Table properties of non-native table are not transferred to RecordReader  (was: StorageHandler properties not passed to InputFormat (?))

For non-native tables, Hive relies on HiveInputFormat to create input splits and record readers. But most input formats in Hadoop replace directories (the location of the table/partition) with the concrete file names inside them, so a simple map lookup in pathToPartitionInfo fails to find the appropriate partition desc.

It can be fixed simply by searching for the partition recursively, which CombineHiveInputFormat already does, as commented below. But it seemed too hard to make a proper test case for this, so I'll just upload the code patch.
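
The idea of searching by ancestor directory can be sketched as follows. This is a standalone illustration under simplifying assumptions, not the actual implementation in HiveFileFormatUtils: paths are plain strings with '/' separators, the map values are generic, and the names AncestorLookup and findByAncestor are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class AncestorLookup {
    /**
     * Hypothetical helper: returns the value registered for the path itself
     * or for its nearest ancestor directory, or null if none matches.
     * The root "/" itself is not checked in this simplified version.
     */
    static <V> V findByAncestor(Map<String, V> byDir, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            V hit = byDir.get(current);
            if (hit != null) {
                return hit;
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // reached the top-level component, nothing left to strip
            }
            current = current.substring(0, slash); // move up one directory
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> byDir = new HashMap<>();
        byDir.put("/data/bar", "partition desc for /data/bar");

        // The file path resolves through its parent directory.
        System.out.println(findByAncestor(byDir, "/data/bar/part-00000"));
        // An unrelated path still yields null.
        System.out.println(findByAncestor(byDir, "/other/part-00000"));
    }
}
```

Compared with a single map lookup, this costs one extra lookup per directory level, which is small next to the cost of opening a record reader.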
                

        

[jira] [Commented] (HIVE-3198) Table properties of non-native table are not transferred to RecordReader

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415814#comment-13415814 ] 

Edward Capriolo commented on HIVE-3198:
---------------------------------------

For the Cassandra handler I coded the properties into the InputSplit to deal with them not being passed on to MR jobs.
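
As a rough sketch of that workaround (not the Cassandra handler's actual code), a split can carry its own properties by serializing them alongside its other fields. The class below is hypothetical and depends only on java.io; its write and readFields methods mirror the shape of Hadoop's Writable interface, and it assumes property values are never null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

public class PropsCarryingSplit {
    private final Map<String, String> props = new TreeMap<>();

    public void setProperty(String key, String value) { props.put(key, value); }
    public String getProperty(String key) { return props.get(key); }

    // Same shape as Writable.write(DataOutput): count, then key/value pairs.
    public void write(DataOutput out) throws IOException {
        out.writeInt(props.size());
        for (Map.Entry<String, String> e : props.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue()); // assumes non-null values
        }
    }

    // Same shape as Writable.readFields(DataInput).
    public void readFields(DataInput in) throws IOException {
        props.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            props.put(key, in.readUTF());
        }
    }

    // Simulates shipping the split to a task: serialize, then deserialize.
    static PropsCarryingSplit roundTrip(PropsCarryingSplit split) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            split.write(new DataOutputStream(buf));
            PropsCarryingSplit copy = new PropsCarryingSplit();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PropsCarryingSplit split = new PropsCarryingSplit();
        split.setProperty("TESTPROP", "hello world");
        System.out.println("TESTPROP: " + roundTrip(split).getProperty("TESTPROP"));
    }
}
```

The trade-off is that every split repeats the same properties, but the record reader no longer depends on the job configuration being populated correctly.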
                

        

[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Posted by "Brian Bloniarz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bloniarz updated HIVE-3198:
---------------------------------

    Attachment: TestStorageHandler.java

Here's a StorageHandler implementation which should help reproduce the bug. When I run it like this:
{code}
$ mkdir /tmp/test; touch /tmp/test/part-00000
hive> add jar test.jar;
hive> create external table test (a string) STORED BY 'TestStorageHandler' location '/tmp/test';
hive> select * from test;
{code}
I see "TESTPROP: hello world", which means that the properties are being set up correctly. But if you do:
{code}
hive> select a from test;
{code}
I see "TESTPROP: null", meaning that properties from configureInputJobProperties() don't get passed to the getRecordReader() call.
                

        

[jira] [Commented] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Posted by "Navis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401068#comment-13401068 ] 

Navis commented on HIVE-3198:
-----------------------------

Try replacing this line:
{code}
PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
{code}
with:
{code}
PartitionDesc part = HiveFileFormatUtils.getPartitionDescFromPathRecursively(pathToPartitionInfo,
                  hsplit.getPath(), IOPrepareCache.get().getPartitionDescMap());
{code}

It seems to be a bug, IMHO.
                

        

[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

Posted by "Brian Bloniarz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bloniarz updated HIVE-3198:
---------------------------------

    Status: Patch Available  (was: Open)
    