Posted to dev@pig.apache.org by "Ian Holsman (JIRA)" <ji...@apache.org> on 2010/02/08 09:03:29 UTC

[jira] Created: (PIG-1229) allow pig to write output into a JDBC db

allow pig to write output into a JDBC db
----------------------------------------

                 Key: PIG-1229
                 URL: https://issues.apache.org/jira/browse/PIG-1229
             Project: Pig
          Issue Type: New Feature
          Components: impl
            Reporter: Ian Holsman
            Priority: Minor
         Attachments: DbStorage.java

UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892378#action_12892378 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

Since the fix for PIG-1424 doesn't look straightforward, and I don't think anyone is working on it, I suggest we unblock this useful piggybank functionality from Pig's issues. We can take the approach suggested in the first patch: pass the JDBC URL string as a constructor argument instead of as the store location.
Ankur, do you have cycles to generate a patch that we can commit now, so that it makes it into 0.8?
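Here is a minimal sketch of the constructor-argument approach. It assumes the store function is configured with a JDBC driver class, a JDBC URL and a parameterized insert statement, and that the STORE location string is then ignored. JdbcStoreConfig and its methods are hypothetical names used only for illustration; they are not the piggybank DBStorage API. The sample insert statement is the one that appears in a stack trace in this thread.

```java
import java.util.Objects;

/**
 * Hypothetical holder for store configuration passed as constructor
 * arguments instead of being derived from the STORE location.
 */
final class JdbcStoreConfig {
    private final String driverClass;
    private final String jdbcUrl;
    private final String insertQuery;

    JdbcStoreConfig(String driverClass, String jdbcUrl, String insertQuery) {
        this.driverClass = Objects.requireNonNull(driverClass, "driverClass");
        this.jdbcUrl = Objects.requireNonNull(jdbcUrl, "jdbcUrl");
        this.insertQuery = Objects.requireNonNull(insertQuery, "insertQuery");
        // Reject obviously wrong values early, e.g. an HDFS path passed by mistake.
        if (!jdbcUrl.startsWith("jdbc:")) {
            throw new IllegalArgumentException("not a JDBC URL: " + jdbcUrl);
        }
    }

    String driverClass() { return driverClass; }
    String jdbcUrl() { return jdbcUrl; }
    String insertQuery() { return insertQuery; }

    /**
     * Number of '?' placeholders, i.e. how many fields each tuple must supply.
     * Naive: a '?' inside a quoted SQL literal would also be counted.
     */
    int placeholderCount() {
        int n = 0;
        for (int i = 0; i < insertQuery.length(); i++) {
            if (insertQuery.charAt(i) == '?') {
                n++;
            }
        }
        return n;
    }
}
```

Validating the URL in the constructor keeps bad configuration failures on the client, before any task starts.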

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852243#action_12852243 ] 

Ankur commented on PIG-1229:
----------------------------

Ashutosh,
   Thanks for the review comments. Accepting the store location via setStoreLocation() definitely makes sense. However, I am not sure about checking database reachability in checkOutputSpecs(),
since that may be called on the client side as well, and the DB machine may not be reachable from the client machine. Isn't the OutputCommitter's setupTask() a better place to do DB availability checks?
This sounds like a reasonable ask before a commit. I will incorporate it and submit a new patch.
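Here is a sketch of a task-side availability check. It assumes the check is called from a hook that runs on the task node, such as an OutputCommitter's setupTask(), rather than from checkOutputSpecs() on the client. DbAvailability and isReachable are hypothetical names, and the sketch uses only standard java.sql calls. Note that DriverManager.setLoginTimeout() affects the whole JVM.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical task-side DB availability check. */
final class DbAvailability {
    private DbAvailability() {}

    /**
     * Returns true if a connection can be opened and validated within
     * timeoutSeconds. Returns false on any SQLException: no suitable driver,
     * connection refused, bad credentials, and so on.
     */
    static boolean isReachable(String jdbcUrl, String user, String password, int timeoutSeconds) {
        // JVM-wide setting: bounds how long getConnection() waits to log in.
        DriverManager.setLoginTimeout(timeoutSeconds);
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password)) {
            // isValid() runs a driver-defined liveness check on the open connection.
            return conn.isValid(timeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }
}
```

A task could call this at setup and fail fast with a clear message, instead of failing later on the first insert.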

> Doing DataType.find() ....
I assume this is what you have in mind:
    1. Get the DB schema information for the table we are writing to.
    2. Use the checkSchema() API to validate it against the Pig-supplied schema, and cache it.
    3. Use the cached information in the putNext() method.

This is more of a performance enhancement and looks like more work, so I would prefer to track it as a separate JIRA for DBStorage.
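Here is a sketch of those three steps. It assumes the table's column types are read once through standard JDBC metadata and reused for every record. DatabaseMetaData.getColumns returns rows ordered by ORDINAL_POSITION within a table. ColumnTypeCache, checkArity and coerce are hypothetical names, and the type mapping covers only a few java.sql.Types values for illustration.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-table cache of column SQL types (java.sql.Types codes). */
final class ColumnTypeCache {
    private final List<Integer> sqlTypes;

    ColumnTypeCache(List<Integer> sqlTypes) {
        this.sqlTypes = Collections.unmodifiableList(new ArrayList<>(sqlTypes));
    }

    /**
     * Step 1: read column types in ordinal order. The name must match how the
     * database stores it; some databases keep unquoted names in upper case.
     */
    static ColumnTypeCache fromDatabase(Connection conn, String table) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        List<Integer> types = new ArrayList<>();
        try (ResultSet rs = md.getColumns(null, null, table, null)) {
            while (rs.next()) {
                types.add(rs.getInt("DATA_TYPE"));
            }
        }
        return new ColumnTypeCache(types);
    }

    /** Step 2: fail fast if the Pig schema's field count does not match the table. */
    void checkArity(int pigFieldCount) {
        if (pigFieldCount != sqlTypes.size()) {
            throw new IllegalArgumentException("Pig schema has " + pigFieldCount
                    + " fields but the table has " + sqlTypes.size() + " columns");
        }
    }

    /** Step 3: convert a field value using the cached type (e.g. inside putNext()). */
    Object coerce(int index, Object value) {
        if (value == null) {
            return null;
        }
        switch (sqlTypes.get(index)) {
            case Types.INTEGER: return ((Number) value).intValue();
            case Types.BIGINT:  return ((Number) value).longValue();
            case Types.FLOAT:   // JDBC FLOAT is a double-precision type
            case Types.DOUBLE:  return ((Number) value).doubleValue();
            case Types.CHAR:
            case Types.VARCHAR: return value.toString();
            default:            return value; // let the driver handle other types
        }
    }
}
```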

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1229:
----------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

Changes look good. The core test failures look unrelated, since there are no changes in Pig's main src/ tree, only in contrib. Thanks, Ian, for your initial work, and thanks, Ankur, for your persistence in getting this committed.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895171#action_12895171 ] 

Aaron Kimball commented on PIG-1229:
------------------------------------

I'm pretty confused by what you mean here. HSQLDB is fully SQL-92 compliant and provides ACID transactional semantics. If you execute a {{CREATE TABLE}} statement in a {{Statement}} or {{PreparedStatement}} created in a given {{Connection}} and then call {{Connection.commit()}}, this commit statement will either throw a {{SQLException}} indicating failure, or return silently, indicating that the results have been made durable and are visible to all subsequent transactions of concurrent clients.

This version of HSQLDB has been available for several years at this point. It is quite stable. If sleeping for a random timeout interval fixes your issue, then you have most likely misconfigured something. You might want to double-check; have you called {{Connection.setAutoCommit()}}? If this is configured to false, do you call {{commit()}} after making an update?
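Here is a sketch of the explicit-commit pattern described above, using only standard java.sql calls. DurableDdl and executeAndCommit are hypothetical names. Whether DDL takes part in a transaction varies by database, and some engines commit DDL implicitly. Treat this as an illustration of the setAutoCommit(false)/commit() sequence, not a guarantee for every driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: run one statement and commit it explicitly. */
final class DurableDdl {
    private DurableDdl() {}

    /**
     * Executes sql with auto-commit disabled, then commits. If commit()
     * returns normally, the change is durable and visible to later
     * transactions. If anything fails, the transaction is rolled back and the
     * original exception is rethrown.
     */
    static void executeAndCommit(String jdbcUrl, String user, String password, String sql)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password)) {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate(sql);
                conn.commit();
            } catch (SQLException e) {
                try {
                    conn.rollback();
                } catch (SQLException rollbackFailure) {
                    e.addSuppressed(rollbackFailure);
                }
                throw e;
            }
        }
    }
}
```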

Note that if you are using separate processes to connect to HSQLDB, then you should start a single {{Server}} instance that should connect to the underlying database resource with {{file:}} or {{mem:}} to operate on a file-backed or memory-backed database, but the child processes should then connect to the server using {{jdbc:hsqldb:hsql://<server>:<port>/<dbname>}} so they actually serialize through the server. Concurrent clients in separate processes should not access the same database via {{jdbc:hsqldb:file://}} resources.
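Here is a small sketch of that rule. It assumes task processes receive the JDBC URL as a string. The guard rejects URLs that would open the database files directly (file:) or create a private per-JVM memory database (mem:), and accepts only the server form jdbc:hsqldb:hsql:// quoted above. HsqldbUrlGuard is a hypothetical name, and other HSQLDB server protocols are deliberately not covered.

```java
import java.util.Locale;

/** Hypothetical check for URLs handed to processes outside the DB-hosting JVM. */
final class HsqldbUrlGuard {
    private HsqldbUrlGuard() {}

    /** True only for the hsql:// server URL form. */
    static boolean isServerUrl(String url) {
        return url.toLowerCase(Locale.ROOT).startsWith("jdbc:hsqldb:hsql://");
    }

    /**
     * Child processes must go through the Server so their access is serialized.
     * Opening the same file: database from several processes is unsafe, and a
     * mem: URL would give each process its own empty database.
     */
    static void requireServerUrlForChildProcesses(String url) {
        if (!isServerUrl(url)) {
            throw new IllegalArgumentException(
                    "child processes should connect via jdbc:hsqldb:hsql://<server>:<port>/<dbname>, got: " + url);
        }
    }
}
```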

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment:     (was: jira-1229-final.test-fix.patch)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895330#action_12895330 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

Tested and it worked. Committed. Thanks Aaron and Ankur for help in fixing the issue.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Status: In Progress  (was: Patch Available)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833998#action_12833998 ] 

Aaron Kimball commented on PIG-1229:
------------------------------------

Looks much better; thanks for adding the test case, too. Including hsqldb.jar in your patch didn't work, by the way. I think you'll need to attach that jar to the issue separately.


> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831337#action_12831337 ] 

Ankur commented on PIG-1229:
----------------------------

Aaron, thanks for the suggestions.
I'll have an updated patch coming soon.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>         Attachments: DbStorage.java
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ian Holsman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Holsman updated PIG-1229:
-----------------------------

    Attachment:     (was: DbStorage.java)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>         Attachments: jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: pig-1229.2.patch

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Status: Patch Available  (was: In Progress)

Regenerated the patch as per Ashutosh's suggestion.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833949#action_12833949 ] 

Hadoop QA commented on PIG-1229:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435875/jira-1229.patch
  against trunk revision 909921.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/211/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/211/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/211/console

This message is automatically generated.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851665#action_12851665 ] 

Hadoop QA commented on PIG-1229:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12440249/jira-1229-v2.patch
  against trunk revision 928950.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/260/console

This message is automatically generated.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: jira-1229.patch

Updated code with added test case using HSQLDB (binary part of the patch).

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>         Attachments: jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Status: In Progress  (was: Patch Available)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841003#action_12841003 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

Ankur,

With the recent Load/Store interface changes, the patch no longer compiles. Can you regenerate it? While you are at it, can you also change ivy.xml so that hsqldb.jar is pulled over the internet instead of being bundled with the Pig distribution?

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861177#action_12861177 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

Ankur,

The stack trace above is out of sync with trunk. Can you upload a patch with the alternative approach you are trying? I think it might be possible to get it working.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Sandesh Devaraju (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910331#action_12910331 ] 

Sandesh Devaraju commented on PIG-1229:
---------------------------------------

I upgraded to 0.7 and tried the updated patch. However, I don't see any entries in the database.
Upon further investigation, I noticed that in my particular case, the batch size was 100 and the number of output records that ended up at every reducer was below this threshold.
I added a debug statement to the OutputCommitter's commitTask method and found that the count was 0.
Any ideas why this might be happening?
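One way rows can go missing when every reducer receives fewer records than the batch size is a writer that executes a batch only once it is full and never flushes the remainder at commit. That is a hedged guess at the symptom, not a diagnosis of the patch. The sketch shows the general pattern of flushing the partial batch at commit time. BatchingWriter is a hypothetical class, and its sink stands in for adding rows with PreparedStatement.addBatch() and calling executeBatch().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batching writer that never drops a partial final batch. */
final class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // e.g. addBatch() per row + executeBatch()
    private final List<T> pending = new ArrayList<>();
    private long written;

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers a row and hands off a full batch as soon as one is available. */
    void write(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Must run at task commit so rows below the batch threshold are not lost. */
    void commit() {
        flush();
    }

    /** Total rows handed to the sink so far. */
    long written() {
        return written;
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        sink.accept(batch);
        written += batch.size();
        pending.clear();
    }
}
```

Keeping the buffer and the commit-time flush on the same object avoids relying on a count kept in a separate instance.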

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment:     (was: hsqldb.jar)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894963#action_12894963 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

I am still getting the same exception 
{code}
java.io.IOException: JDBC Error
        at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:291)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:124)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:85)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:488)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:610)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.sql.SQLException: Table not found in statement [insert into ttt (id, name, ratio) values (?,?,?)]
        at org.hsqldb.jdbc.Util.throwError(Unknown Source)
        at org.hsqldb.jdbc.jdbcPreparedStatement.<init>(Unknown Source)
        at org.hsqldb.jdbc.jdbcConnection.prepareStatement(Unknown Source)
        at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:288)
        ... 6 more
{code}

From reading a few internet forums, it seems there are subtle differences between HSQLDB's "stand-alone" mode and its "server" mode. Maybe starting the HSQLDB instance in server mode would alleviate the problem.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894975#action_12894975 ] 

Aaron Kimball commented on PIG-1229:
------------------------------------

Haven't looked at how you're using hsqldb in this patch, but I've got a lot of experience using HSQLDB for testing.

If you're running one or more tests in a single process that require an HSQLDB-backed database, you do not need to create a new instance of Server. You can just set your JDBC connect string to {{jdbc:hsqldb:mem:foodbname}} and get a {{Connection}} instance to a memory-backed, single-process database called {{foodbname}}. This database will exist for the lifetime of the Java process. You can have multiple {{Connection}} instances open to this database (concurrently or serially), and it will behave as you would expect a database to. The advantage of not using a server is that no port needs to be bound, so you can run multiple tests concurrently without worrying about collisions. Similarly, there's no need to use the {{jdbc:hsqldb:file}} protocol unless you want to restore the contents of the database in a subsequent process. With {{jdbc:hsqldb:mem}}, you won't have a stray file to clean up when your Java process ends.

Of course, if you're testing with {{MiniMRCluster}} or something, you'll want to start a Server so that the external mapper processes can connect to the same database via {{jdbc:hsqldb:hsql://server:port/dbname}}. 
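Here is a sketch of the choice described above. It assumes the test knows whether its tasks run in the same JVM or in separate processes (as under MiniMRCluster). HsqldbTestUrls is a hypothetical helper, and the URL forms are the ones quoted in this comment.

```java
/** Hypothetical helper for choosing an HSQLDB connect string in tests. */
final class HsqldbTestUrls {
    private HsqldbTestUrls() {}

    /** Memory-backed DB living as long as this JVM; no port is bound. */
    static String inProcess(String dbName) {
        return "jdbc:hsqldb:mem:" + dbName;
    }

    /** URL for processes that must reach a Server started by the test. */
    static String viaServer(String host, int port, String dbName) {
        return "jdbc:hsqldb:hsql://" + host + ":" + port + "/" + dbName;
    }

    /** Single-process tests use mem:, multi-process tests go through the Server. */
    static String forTest(boolean tasksInSeparateProcesses, String host, int port, String dbName) {
        return tasksInSeparateProcesses ? viaServer(host, port, dbName) : inProcess(dbName);
    }
}
```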



> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841391#action_12841391 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

Sure. By the way, I am not sure whether the HSQLDB license (http://hsqldb.org/web/hsqlLicense.html) is compatible with the Apache license. However, I think we will be fine if we pull it through Ivy. Am I correct?

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Fix Version/s: 0.6.0
           Status: Patch Available  (was: Open)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Status: Patch Available  (was: In Progress)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855835#action_12855835 ] 

Ankur commented on PIG-1229:
----------------------------

* Sigh *
The problem is that Hadoop's Path implementation does not parse JDBC URLs correctly, so turning relToAbsPathForStoreLocation() into a no-op does NOT help.
The URISyntaxException is now propagated to the point where the job's output path is set. Here is the new trace from the test execution failure with the suggested workaround:

org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:332)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
        at org.apache.pig.PigServer.execute(PigServer.java:828)
        at org.apache.pig.PigServer.access$100(PigServer.java:105)
        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
        at org.apache.pig.piggybank.test.storage.TestDBStorage.testWriteToDB(Unknown Source)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:624)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:246)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
        at org.apache.hadoop.fs.Path.initialize(Path.java:140)
        at org.apache.hadoop.fs.Path.<init>(Path.java:126)
        at org.apache.hadoop.fs.Path.<init>(Path.java:45)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:459)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
        at java.net.URI.checkPath(URI.java:1787)
        at java.net.URI.<init>(URI.java:735)
        at org.apache.hadoop.fs.Path.initialize(Path.java:137)
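The innermost cause can be reproduced with plain java.net.URI, without Hadoop. The sketch below assumes, as an approximation of how Path splits a string, that the text before the first ':' (when it precedes any '/') is the scheme and the rest is the path; JdbcUrlAsPath is a hypothetical name and authority ("//host") handling is omitted. Because the remainder "hsqldb:file:/tmp/..." does not start with '/', the five-argument URI constructor rejects it with the same "Relative path in absolute URI" reason seen in the trace.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch: converts a store location to a URI roughly the way a Path would. */
final class JdbcUrlAsPath {
    private JdbcUrlAsPath() {}

    static URI toUri(String location) {
        String scheme = null;
        String path = location;
        int colon = location.indexOf(':');
        int slash = location.indexOf('/');
        // Treat "xyz:" as a scheme only if the colon comes before any slash.
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = location.substring(0, colon);
            path = location.substring(colon + 1);
        }
        try {
            // With a non-null scheme, this constructor requires the path to be empty
            // or to start with '/'; "hsqldb:file:/tmp/..." does not, so it throws.
            return new URI(scheme, null, path, null, null);
        } catch (URISyntaxException e) {
            // Same wrapping as in the trace: IllegalArgumentException(URISyntaxException)
            throw new IllegalArgumentException(e);
        }
    }
}
```

An ordinary location such as "hdfs:/tmp/out" or "/tmp/out" converts without error, which is consistent with the failure coming from how the JDBC URL is interpreted as a path rather than from the store function itself.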


  

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847483#action_12847483 ] 

Olga Natkovich commented on PIG-1229:
-------------------------------------

Is this issue going to be resolved by Monday, or should we move it to the Pig 0.8.0 release?

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: jira-1229-v2.patch

Here is the updated patch, which compiles against the Pig 0.7 branch and implements the new load/store APIs.

Note: I haven't used Hadoop's DBOutputFormat because that code has not yet been moved to o.a.h.mapreduce.lib, and hence there are compatibility issues.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841049#action_12841049 ] 

Ankur commented on PIG-1229:
----------------------------

Sure, I'll do that. Give me a couple of days.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840963#action_12840963 ] 

Olga Natkovich commented on PIG-1229:
-------------------------------------

Ashutosh, please review this and see if we can pull the jar from Ivy.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: jira-1229-final.patch

Hope this one finally goes in.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869692#action_12869692 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

Cool. I created PIG-1424 to track the Pig issue.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Assigned: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur reassigned PIG-1229:
--------------------------

    Assignee: Ankur

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>         Attachments: DbStorage.java
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892999#action_12892999 ] 

Hadoop QA commented on PIG-1229:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12450586/jira-1229-final.patch
  against trunk revision 979781.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/console

This message is automatically generated.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847909#action_12847909 ] 

Ankur commented on PIG-1229:
----------------------------

@Ashutosh Chauhan
I read the HSQLDB license and it looked OK to me, but I am not a lawyer :-). Besides, Apache Cocoon uses it. I think we should be OK pulling it through Ivy.

I'll make the ivy and load-store related changes and submit a new patch on Monday.

Sorry for the delay.
 

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: jira-1229-final.test-fix.patch

Aaron,
Autocommit() was not the issue. The problem was the use of a "jdbc:hsqldb:file:" URL in the STORE function. Replacing it with "jdbc:hsqldb:hsql://localhost/dbname" solved the issue. Attaching the updated patch with the test case modification.

Really appreciate your help here. Thanks a lot :-)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831052#action_12831052 ] 

Aaron Kimball commented on PIG-1229:
------------------------------------

Ian, 

This class looks reasonable to me. You'll probably need to format this as a patch to get it accepted into the project though.

Is there a test plan for this code and/or unit tests?

Some database-specific things I've noticed: 
* You create a PreparedStatement, call its executeUpdate() method several times, and then call close() on the statement. This assumes you're in auto-commit mode; I think you should configure the commit mode explicitly when creating the connection. Also, you'll probably get much better performance if you use addBatch() / executeBatch() for your batch size rather than individual executeUpdate() calls. You should then call connection.commit() and ps.clearParameters() rather than closing the prepared statement and compiling a new one.
* If user and pass are null, I think you may need to use DriverManager.getConnection(jdbcUrl) instead of DriverManager.getConnection(jdbcUrl, null, null). Worth a unit test.
* See org.apache.hadoop.mapreduce.lib.db.DBOutputFormat in the MapReduce project for some similar code to take inspiration from. 
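The first two suggestions could look roughly like the sketch below. BatchedJdbcWriter is a hypothetical class, not the DBStorage code from the patch, and it uses only standard java.sql calls: auto-commit is switched off explicitly, rows are queued with addBatch(), and every batchSize rows the batch is executed and committed while the same PreparedStatement is reused. Error handling such as rolling back a failed batch is deliberately left out.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical batching writer illustrating the review comments; not the patch's code. */
final class BatchedJdbcWriter implements AutoCloseable {
    private final Connection conn;
    private final PreparedStatement ps; // compiled once, reused for every batch
    private final int batchSize;
    private int pending = 0;            // rows added since the last executeBatch()

    BatchedJdbcWriter(Connection conn, String insertSql, int batchSize) throws SQLException {
        this.conn = conn;
        this.batchSize = batchSize;
        conn.setAutoCommit(false);      // set the commit mode explicitly
        this.ps = conn.prepareStatement(insertSql);
    }

    /** Uses the URL-only overload when no credentials are configured. */
    static Connection open(String url, String user, String pass) throws SQLException {
        if (user == null && pass == null) {
            return DriverManager.getConnection(url);
        }
        return DriverManager.getConnection(url, user, pass);
    }

    void write(List<?> row) throws SQLException {
        for (int i = 0; i < row.size(); i++) {
            ps.setObject(i + 1, row.get(i)); // JDBC parameter indexes are 1-based
        }
        ps.addBatch();
        ps.clearParameters();                // reset bound values before the next row
        if (++pending >= batchSize) {
            flush();
        }
    }

    /** Executes and commits whatever is queued; the statement stays open. */
    void flush() throws SQLException {
        if (pending == 0) {
            return;
        }
        ps.executeBatch();
        conn.commit();
        pending = 0;
    }

    @Override
    public void close() throws SQLException {
        try {
            flush();    // push the final, possibly partial, batch
        } finally {
            ps.close(); // the caller still owns and closes the Connection
        }
    }
}
```

A caller would obtain the connection with open(url, null, null) when no user or password is configured, use the writer in a try-with-resources block so the last partial batch is flushed, and close the connection itself.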


> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Priority: Minor
>         Attachments: DbStorage.java
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment:     (was: jira-1229-final.test-fix.patch)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854740#action_12854740 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

You can get rid of this stack trace by overriding relToAbsPathForStoreLocation() of StoreFunc, which DBStorage extends, and turning it into a no-op. Since a DB location is always absolute, there is no need for the default behavior in StoreFunc.

As for DataType.find(), I found that even PigStorage does the same, so this patch is no worse than PigStorage in that respect.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Sandesh Devaraju (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910629#action_12910629 ] 

Sandesh Devaraju commented on PIG-1229:
---------------------------------------

I narrowed down the problem to org.apache.hadoop.mapred.Task.java lines 411-418.

{code:title=org.apache.hadoop.mapred.Task.java|linenumbers=true|firstline=411}
if (useNewApi) {
  LOG.debug("using new api for output committer");
  outputFormat =
        ReflectionUtils.newInstance(taskContext.getOutputFormatClass(), job);
  committer = outputFormat.getOutputCommitter(taskContext);
} else {
  committer = conf.getOutputCommitter();
}
{code}

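But the DBStorage UDF assumes that the OutputFormat is in a closure. With the new API, however, the task builds its own OutputFormat from the configured class via ReflectionUtils.newInstance(), so it never sees the instance that DBStorage set up.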
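To illustrate the point above without Hadoop, the JDK-only sketch below mimics what the new-API branch does: it creates an object from its class through a no-argument constructor. All names are hypothetical and ReflectionUtils itself is not used. An inner class that reads state from its enclosing object (a closure) has no no-argument constructor at all, and a static class created this way starts with none of the state the caller configured.

```java
import java.lang.reflect.Constructor;

/**
 * Hypothetical stand-ins: the enclosing class plays the storage UDF and the nested
 * classes play its OutputFormat. instantiate() mimics framework-style creation.
 */
final class ReflectiveInstantiationDemo {
    private final String jdbcUrl;

    ReflectiveInstantiationDemo(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    /** Inner (non-static) class: reads state from the enclosing instance, like a closure. */
    class CapturingFormat {
        String url() {
            return jdbcUrl;
        }
    }

    /** Static nested class: has a no-arg constructor but carries no captured state. */
    static class StandaloneFormat {
        String jdbcUrl; // stays null unless filled in from some other source
    }

    /** Creates an instance from its class using only a no-argument constructor. */
    static Object instantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under this reading, whatever settings the OutputFormat needs would have to reach it through the job configuration rather than through the DBStorage object; that is an inference from the Task code quoted above, not something confirmed in this thread.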

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: hsqldb.jar

Attaching hsqldb.jar separately, since including it in the patch does not work.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment:     (was: jira-1229.patch)

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ian Holsman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Holsman updated PIG-1229:
-----------------------------

    Attachment: DbStorage.java

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Priority: Minor
>         Attachments: DbStorage.java
>
>
> UDF to store data into a DB


[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1229:
--------------------------------

    Fix Version/s:     (was: 0.7.0)
                   0.8.0

Moving to Pig 0.8.0 release since we are branching today.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856761#action_12856761 ] 

Ankur commented on PIG-1229:
----------------------------

Any updates?

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857154#action_12857154 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

Following the thread at http://www.mail-archive.com/pig-user@hadoop.apache.org/msg02257.html, I am wondering whether it would be safe and possible to ensure that any job using this storage has speculative execution turned off. Otherwise, with speculative execution turned on, there are too many scenarios we would have to handle. What do you think?

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB


[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910441#action_12910441 ] 

Ankur commented on PIG-1229:
----------------------------

In the putNext() method, count is reset to 0 every time the number of tuples added to the batch exceeds 'batchSize'. The batch is then executed and its parameters are cleared. There is currently an ExecException in the putNext() method that is being ignored. Can you try adding some debugging System.out calls and checking the stdout/stderr of your reducers to see if that is the problem?

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853843#action_12853843 ] 

Ankur commented on PIG-1229:
----------------------------

So accepting the JDBC URL in setStoreLocation() exposes a flaw in Hadoop's Path class, and it causes the test case to fail with the following exception:

java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
        at org.apache.hadoop.fs.Path.initialize(Path.java:140)
        at org.apache.hadoop.fs.Path.<init>(Path.java:126)
        at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:238)
        at org.apache.pig.StoreFunc.relToAbsPathForStoreLocation(StoreFunc.java:60)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3587)
...
...
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
        at java.net.URI.checkPath(URI.java:1787)
        at java.net.URI.<init>(URI.java:735)
        at org.apache.hadoop.fs.Path.initialize(Path.java:137)

Looking at the code of Path.java, it seems to extract the scheme based on the first occurrence of ':'. This causes the authority and path to be extracted incorrectly, resulting in the above exception being thrown by java.net.URI.
However if I try to initialize URI directly with the URL string, no exception is thrown.
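To illustrate, here is a simplified, hypothetical re-creation that uses only java.net.URI. It splits the string at the first ':' the way the comment above describes Path doing (an approximation, not Hadoop's actual code) and rebuilds it with the five-argument URI constructor, which rejects a path that does not start with '/' when a scheme is present. Parsing the same string with new URI(String) succeeds because the JDBC URL is treated as an opaque URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical, simplified re-creation of the parsing problem (not Hadoop's code).
class JdbcUrlParsing {

    // Approximates Path-style parsing: everything before the first ':' is taken as
    // the scheme, the remainder as the path, and the URI is rebuilt from those parts.
    static URI pathStyle(String s) throws URISyntaxException {
        int colon = s.indexOf(':');
        if (colon < 0) {
            return new URI(null, null, s, null, null); // no scheme at all
        }
        String scheme = s.substring(0, colon); // "jdbc"
        String path = s.substring(colon + 1);  // "hsqldb:file:/tmp/..." (no leading '/')
        // With a scheme present, this constructor rejects a path that does not
        // start with '/': "Relative path in absolute URI".
        return new URI(scheme, null, path, null, null);
    }

    // Parsing the whole string at once yields an opaque URI with scheme "jdbc",
    // so no exception is thrown.
    static URI direct(String s) throws URISyntaxException {
        return new URI(s);
    }
}
```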

As for the DB reachability check, I think it is OK to check availability at runtime and fail if the DB is not available. We do this in prepareToWrite().
For performance enhancement, I think we can track that via a separate issue.
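A minimal sketch of a fail-fast check of the kind described for prepareToWrite(), assuming the JDBC driver is registered or discoverable on the task's classpath. DbConnectionCheck and openOrFail are hypothetical names; the idea is only to turn a connection failure into an IOException with a clear message as soon as the writer is prepared, instead of failing later on the first insert.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: open the connection up front and fail clearly if that is impossible.
class DbConnectionCheck {

    static Connection openOrFail(String jdbcUrl, String user, String password) throws IOException {
        try {
            return DriverManager.getConnection(jdbcUrl, user, password);
        } catch (SQLException e) {
            // Keep the original SQLException as the cause for the task logs
            throw new IOException("Cannot connect to database at " + jdbcUrl, e);
        }
    }
}
```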

This patch has taken quite a while now, and I wouldn't want to delay it further by depending on a Hadoop fix.

So if a reviewer does not find any blocking issues, my suggestion is to go ahead with the commit. 

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851455#action_12851455 ] 

Olga Natkovich commented on PIG-1229:
-------------------------------------

Since we already branched, this feature will not go into the 0.7.0 branch but will instead be committed to trunk and released as part of the 0.8.0 release. I think this patch should work just fine against trunk, since we have not deviated much.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: jira-1229-v3.patch

Here you go ...

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229-v3.patch
>
>
> UDF to store data into a DB



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1229:
----------------------------------

    Attachment: pig-1229.patch

Ankur,

Sorry for getting back late on this. I fiddled with your latest patch and was able to make some progress on it. I was able to get rid of those Path problems (it looks like Pig itself is not dealing with it correctly in one place). I think the patch that I attached should work, but I am not able to get the test case to pass because of an hsqldb problem that I have not been able to resolve. I keep getting this error from it:
{noformat}
Caused by: java.sql.SQLException: The database is already in use by another process: org.hsqldb.persist.NIOLockFile@4abea04e[file =/private/tmp/batchtest.lck, exists=true, locked=false, valid=false, fl =null]: java.lang.Exception: checkHeartbeat(): lock file [/private/tmp/batchtest.lck] is presumably locked by another process.
        at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
        at org.hsqldb.jdbc.jdbcConnection.<init>(Unknown Source)
        at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
        at org.hsqldb.jdbcDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:274)

{noformat}
Anyway, here are the changes I made:
1.
{code}
Index:src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
===================================================================
-                conf.set("pig.streaming.log.dir", 
-                            new Path(outputPath, LOG_DIR).toString());
+//                conf.set("pig.streaming.log.dir", 
+//                            new Path(outputPath, LOG_DIR).toString());
                 conf.set("pig.streaming.task.output.dir", outputPath);
             }
{code}
This looks like a problem in Pig. Here Pig incorrectly assumes that it can put the logs generated by a stream command in the output location, which is wrong if the output location is something like a DB. Since this needs changes in the main Pig code, I suggest opening a new JIRA for it and tracking it there.

2. Then in DBStorage.java
{code}
@Override
public void setStoreLocation(String location, Job job) throws IOException {
  // Front end: keep the JDBC connection string (the store location) in the job configuration
  job.getConfiguration().set("pig.db.conn.string", location);
}

@Override
public RecordWriter<NullWritable, NullWritable> getRecordWriter(
    TaskAttemptContext context) throws IOException, InterruptedException {
  // Back end: read the connection string saved by setStoreLocation()
  jdbcURL = context.getConfiguration().get("pig.db.conn.string");
  return null;
}
{code} 
You need to save the DB connection string in the job in setStoreLocation() and then retrieve it in the backend in getRecordWriter(). 

3. In DBStorage.java
{code}
@Override
public void cleanupOnFailure(String location, Job job) throws IOException {
  // Overrides the FileSystem-based default implementation; only logs for now
  log.error("Job has failed.");
}
{code}
You necessarily need to override this method of StoreFunc, since the default implementation assumes that the output location is on a FileSystem. Currently, I left it as a no-op, but it can be improved to do rollbacks, release DB connections, etc. 


> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.patch
>
>
> UDF to store data into a DB



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: jira-1229-final.test-fix.patch

Attaching the patch with fixes to the test case.
1. Starting the HsqlDB server manually - dbServer.start().
2. Supplying user name and password when initializing DBStorage.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1229:
--------------------------------

    Fix Version/s:     (was: 0.6.0)
                   0.7.0

Updated the fix version, since this is not a blocker for 0.6.0, which has been out for a while.

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-1229:
-----------------------

    Attachment: jira-1229-final.test-fix.patch

Here is my understanding of what happens:

1. The main thread in the JVM executing the test initializes MiniDFSCluster, MiniMRCluster and the HSQLDB server, all in different threads.
2. The test's setUp() method is then executed to create table 'ttt', to which data will be written by DBStorage() in the test.
3. Pig statements are then executed that spawn an M/R job as a separate process, which tries to get a connection to the database and create a PreparedStatement for table 'ttt'. This sometimes fails because the DB thread does NOT get a chance to fully persist the table information, and the exception is thrown from the map tasks as noted by Ashutosh.

The fix for this is to add a 5-second sleep in the setUp() method to give the DB a chance to persist the table information. This alleviates the problem, and the test passes over repeated runs. 

Note that the ideal fix would have been a busy wait for table creation to complete, but I don't see a method in HSQLDB to do that. 
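One possible way to replace the fixed sleep, sketched under the assumption that the table becoming visible through standard JDBC metadata (DatabaseMetaData.getTables) is a good enough readiness signal. This is not an HSQLDB-specific API, it has not been tried against hsqldb 1.8, and whether visibility on the test's own connection is sufficient for the separate M/R process is untested. WaitFor, waitUntil and tableExists are hypothetical helper names.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;
import java.util.concurrent.Callable;

// Hypothetical helpers for waiting on a condition instead of sleeping a fixed time.
class WaitFor {

    // Polls the condition every intervalMs until it returns true or timeoutMs elapses.
    // Returns whether the condition became true in time.
    static boolean waitUntil(Callable<Boolean> condition, long timeoutMs, long intervalMs)
            throws Exception {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (Boolean.TRUE.equals(condition.call())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }

    // Reports whether a table is visible through standard JDBC metadata.
    // Unquoted identifiers are typically stored upper-case (e.g. ttt -> TTT).
    static boolean tableExists(Connection conn, String tableName) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        try (ResultSet rs = meta.getTables(null, null,
                tableName.toUpperCase(Locale.ROOT), new String[] {"TABLE"})) {
            return rs.next();
        }
    }
}
```

In setUp(), this could be used as WaitFor.waitUntil(() -> WaitFor.tableExists(conn, "ttt"), 5000, 100), where conn is the connection that created the table, failing the test if it returns false.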

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869552#action_12869552 ] 

Ankur commented on PIG-1229:
----------------------------

Hi Ashutosh,
Thanks for helping out here. The error that you see, "...The database is already in use by another process", is due to locking issues in hsqldb 1.8.0.7. Upgrading to 1.8.0.10 alleviates the problem, and the test passes successfully. A few changes that I made:

1. Added a placeholder RecordWriter, because PigOutputFormat calls close() on it, which throws a NullPointerException if we return null from our output format.
2. It looks like you missed the ivy.xml and build.xml changes to pull the correct hsqldb jar.
 

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852190#action_12852190 ] 

Ashutosh Chauhan commented on PIG-1229:
---------------------------------------

A few suggestions:

Reading from the test case, store statements currently look like:
{code}
 b = store a into 'dummy' using org.apache.pig.piggybank.storage.DBStorage('org.hsqldb.jdbcDriver','jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100','insert into a...');
{code}
Here 'dummy' is totally ignored. While this works, from a user-experience point of view the following might be better:

{code}
 b = store a into 'jdbc:hsqldb:file:/tmp/batchtest' using org.apache.pig.piggybank.storage.DBStorage('org.hsqldb.jdbcDriver','hsqldb.default_table_type=cached;hsqldb.cache_rows=100','insert into a');
{code}
That is, have the DB URL as the store location and the second parameter of the store func as the DB params. You can use setStoreLocation() to store the URL. Apart from a more intuitive store statement, this will also allow you to check whether the DB is reachable at compile time itself, instead of at runtime. You can do that via checkOutputSpecs(). 

Doing DataType.findType() on every element of every tuple will be expensive. I am wondering if you can get hold of the schema in your store func and use it to map Pig types to SQL types.
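A minimal sketch of this suggestion: resolve the SQL type of each column once from the schema and reuse the resulting array for every tuple. PigType is a hypothetical stand-in for Pig's DataType constants, and the Pig-to-SQL mapping shown is one reasonable choice, not necessarily the one DBStorage uses.

```java
import java.sql.Types;
import java.util.List;

// Hypothetical sketch: compute per-column SQL types once, not per value.
class SchemaTypeMapper {

    // Stand-in for Pig's scalar field types (not Pig's DataType API).
    enum PigType { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, BYTEARRAY }

    // Maps one Pig field type to a java.sql.Types code.
    static int toSqlType(PigType t) {
        switch (t) {
            case INTEGER:   return Types.INTEGER;
            case LONG:      return Types.BIGINT;
            case FLOAT:     return Types.REAL;
            case DOUBLE:    return Types.DOUBLE;
            case CHARARRAY: return Types.VARCHAR;
            case BYTEARRAY: return Types.VARBINARY;
            default: throw new IllegalArgumentException("Unsupported type: " + t);
        }
    }

    // Called once when the schema is known; the result is reused for every tuple.
    static int[] sqlTypesFor(List<PigType> schema) {
        int[] types = new int[schema.size()];
        for (int i = 0; i < types.length; i++) {
            types[i] = toSqlType(schema.get(i));
        }
        return types;
    }
}
```

Per tuple, the writer could then call stmt.setObject(i + 1, value, sqlTypes[i]), or stmt.setNull(i + 1, sqlTypes[i]) for null fields, without inspecting each value's type.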

All of these suggestions may come in as later patches. So if you want to get this committed and track these separately, I think that will also work, as this patch is functionally complete. 

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895182#action_12895182 ] 

Aaron Kimball commented on PIG-1229:
------------------------------------

Glad you got it working!

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857253#action_12857253 ] 

Ankur commented on PIG-1229:
----------------------------

So I read the complete thread, and here are my thoughts:

- Speculative execution issue: With the recent changes moving Load/Store to Hadoop's I/O formats, DBStorage has been modified to commit the data to the DB in the OutputCommitter's commitTask() method. Hadoop itself guarantees that this method will be called only for the first successful attempt, so it shouldn't matter whether or not speculative execution is on.
BUT this does NOT solve the problem where certain tasks finished successfully but the JOB itself failed, in which case the data from the successful attempts should be rolled back.

- Writing to a temporary table: Even this does not handle the above case, since some of the tasks would already have moved their data to the actual table.

- Bulk loading: This is the most suitable option in my opinion if the data is large. However, for small to medium data sizes (like aggregate summaries), I found the DBStorage UDF to be most helpful.
It just eliminates one more layer of processing from the application. In fact, this is precisely why it was written.

So in a nutshell, using a single mapper/reducer with this patch should be fine regardless of whether speculative execution is off or on. In the case of multiple mappers/reducers writing to the DB, it should be the application's responsibility to clean up data ONLY IN CASE of job failure.
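A minimal sketch of the task-scoped commit described above, assuming each task attempt holds its own JDBC Connection. Rows are written with auto-commit off, committed from the equivalent of commitTask(), and rolled back from the equivalent of abortTask(), so a losing speculative attempt leaves nothing behind provided its abort path runs. TaskScopedTransaction is a hypothetical class, not Hadoop's OutputCommitter API, and, as noted above, it cannot undo rows from tasks that had already committed before the job failed.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical sketch: one JDBC transaction per task attempt.
class TaskScopedTransaction implements AutoCloseable {
    private final Connection conn;

    TaskScopedTransaction(Connection conn) throws SQLException {
        this.conn = conn;
        conn.setAutoCommit(false); // rows stay invisible to others until commitTask()
    }

    // Connection the task uses for its inserts.
    Connection connection() {
        return conn;
    }

    // Analogous to OutputCommitter.commitTask(): make this attempt's rows durable.
    void commitTask() throws SQLException {
        conn.commit();
    }

    // Analogous to OutputCommitter.abortTask(): discard this attempt's rows.
    void abortTask() throws SQLException {
        conn.rollback();
    }

    // Releases the connection whatever the outcome.
    @Override
    public void close() throws SQLException {
        conn.close();
    }
}
```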

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB
