You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Francis Liu (Created) (JIRA)" <ji...@apache.org> on 2012/02/01 20:00:59 UTC

[jira] [Created] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
---------------------------------------------------------------------------------------------------------------------------

                 Key: HIVE-2773
                 URL: https://issues.apache.org/jira/browse/HIVE-2773
             Project: Hive
          Issue Type: Improvement
            Reporter: Francis Liu


HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).

My proposal is to replace configureTableJobProperties() with two methods:

configureInputJobProperties()
configureOutputJobProperties()

Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Ashutosh Chauhan (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan reassigned HIVE-2773:
--------------------------------------

    Assignee: Francis Liu
    
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224942#comment-13224942 ] 

Phabricator commented on HIVE-2773:
-----------------------------------

ashutoshc has requested changes to the revision "HIVE-2773 [jira] HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output".

  Francis,
  Normally, we first apply patches to trunk and then potentially backport them to a branch. Can, you please first generate a patch for trunk?
  Overall, patch looks good. I left couple of comments.  Also, lint is complaining for few lines being too long. Can you take care of those one too?

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java:698 Should this be NoSuchMethodError ?
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java:708 Should this be NoSuchMethodError ?

REVISION DETAIL
  https://reviews.facebook.net/D2007

BRANCH
  arcpatch-D1815

                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Francis Liu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214835#comment-13214835 ] 

Francis Liu commented on HIVE-2773:
-----------------------------------

Since it's an interface, things will still break not unless they extended DefaultStorageHandler. In any case will ask user@hive.
                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Francis Liu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francis Liu updated HIVE-2773:
------------------------------

    Release Note: HiveStorageHandler.configureTableJobProperties() has been replaced with configureInputJobProperties() and configureOutputJobProperties().
          Status: Patch Available  (was: Open)
    
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Ashutosh Chauhan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-2773:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Francis!
                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.D2415.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214018#comment-13214018 ] 

Ashutosh Chauhan commented on HIVE-2773:
----------------------------------------

This will effectively break backwards compatibility since you are removing a method from an interface. You can get around this by keeping original method as well as introducing new ones. Then, at the time of invocation of these methods, you can check if newer methods exists, if they do then invoke them else invoke old method. Then, we can deprecate old method. But, before you go down that path and introduce code complexity I suggest you send an email on user@hive to check with people if they are ok with your current approach as is and will be able to upgrade their storage handlers, in which case we can go ahead with this current approach.
                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Francis Liu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214962#comment-13214962 ] 

Francis Liu commented on HIVE-2773:
-----------------------------------

If I add new methods, classes implementing the interface will have to implement the new methods anyway, why not just take the extra step of having them do a very little bit of refactoring (see my changes to HBaseStorageHandler)? Of course if most users are extending DefaultStorageHandler then doing as you say would be a smoother way to go about it. 
                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236772#comment-13236772 ] 

Hudson commented on HIVE-2773:
------------------------------

Integrated in Hive-trunk-h0.21 #1328 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1328/])
    HIVE-2773: HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output (Francis Liu via Ashutosh Chauhan) (Revision 1304167)

     Result = ABORTED
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1304167
Files : 
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java

                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.D2415.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236024#comment-13236024 ] 

Phabricator commented on HIVE-2773:
-----------------------------------

ashutoshc has accepted the revision "HIVE-2773 [jira] HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output".

  +1 will commit if tests pass.

REVISION DETAIL
  https://reviews.facebook.net/D2415

BRANCH
  configure2

                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.D2415.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2773:
------------------------------

    Attachment: HIVE-2773.D2007.1.patch

toffer requested code review of "HIVE-2773 [jira] HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output".
Reviewers: JIRA, cwsteinbach, ashutoshc

  This is a patch which supports backward compatibility for the old storageHandlers which implements the deprecated method. Patch is built against 0.8-r2

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D2007

AFFECTED FILES
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/4281/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.

                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2773:
------------------------------

    Attachment: HIVE-2773.D2415.1.patch

toffer requested code review of "HIVE-2773 [jira] HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output".
Reviewers: JIRA

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D2415

AFFECTED FILES
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/5409/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.

                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.D2415.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214979#comment-13214979 ] 

Ashutosh Chauhan commented on HIVE-2773:
----------------------------------------

bq. If I add new methods, classes implementing the interface will have to implement the new methods anyway,

Adding or removing methods have different impacts from point of view of backward compatibility. 
If you add new methods in an interface then implementations using the older interface will continue to work even after they upgrade to new version of hive. That has no cost on people who have implemented the old interface. Only if they recompile it, then they have to implement new methods.  
On the other hand, if you remove methods, then their old implementation won't work anymore after upgrade. They have to recompile and this implies work on their part.

This is the difference which gets factored in backward compatibility. So, as long as you are adding new methods in interface, I consider it to be backward compatible. But, if you are removing methods, then its not. Makes sense?   
                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2773:
------------------------------

    Attachment: HIVE-2773.D1815.1.patch

toffer requested code review of "HIVE-2773 [jira] HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output".
Reviewers: JIRA, cwsteinbach

  HIVE-2773 HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

TEST PLAN
  Empty

REVISION DETAIL
  https://reviews.facebook.net/D1815

AFFECTED FILES
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/3867/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.

                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Francis Liu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francis Liu updated HIVE-2773:
------------------------------

    Attachment: HIVE-2773.patch
    
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output

Posted by "Alexander Lorenz-Alten (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214854#comment-13214854 ] 

Alexander Lorenz-Alten commented on HIVE-2773:
----------------------------------------------

Since Hive need the new methods to work with HCatalog the interface should be extended. Later it can be dropped or marked as obsolete. For a flumeNG hive sink the new methods are a good improvement. 
                
> HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2773
>                 URL: https://issues.apache.org/jira/browse/HIVE-2773
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hcatalog, storage_handler
>         Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch
>
>
> HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237).
> My proposal is to replace configureTableJobProperties() with two methods:
> configureInputJobProperties()
> configureOutputJobProperties()
> Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira