Posted to mapreduce-issues@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2009/06/30 23:16:47 UTC

[jira] Created: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Sqoop will fail with OutOfMemory on large tables using mysql
------------------------------------------------------------

                 Key: MAPREDUCE-685
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-685
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/sqoop
            Reporter: Aaron Kimball
            Assignee: Aaron Kimball
         Attachments: MAPREDUCE-685.patch

The default MySQL JDBC client behavior is to buffer the entire ResultSet in the client before allowing the user to use the ResultSet object. On large SELECTs, this can cause OutOfMemory exceptions, even when the client intends to close the ResultSet after reading only a few rows. The MySQL ConnManager should configure its connection to use row-at-a-time delivery of results to the client.
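
For illustration, a minimal sketch of what row-at-a-time delivery looks like on the JDBC side (the connection details and table name are hypothetical, and this is not necessarily the exact code in the attached patch; the recipe itself -- a forward-only, read-only statement with fetch size {{Integer.MIN_VALUE}} -- is the documented way to enable streaming in MySQL Connector/J):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingResultExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical connection details, for illustration only.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:mysql://localhost/db", "user", "passwd");
         // Connector/J streams rows one at a time only when the statement
         // is forward-only and read-only and the fetch size is
         // Integer.MIN_VALUE; otherwise it buffers the whole ResultSet.
         Statement stmt = conn.createStatement(
             ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
      stmt.setFetchSize(Integer.MIN_VALUE);
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
        while (rs.next()) {
          // Rows arrive as they are read, so client memory use stays
          // constant even for very large tables.
          System.out.println(rs.getString(1));
        }
      }
    }
  }
}
{code}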



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727224#action_12727224 ] 

Hadoop QA commented on MAPREDUCE-685:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12412298/MAPREDUCE-685.patch.2
  against trunk revision 790971.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/console

This message is automatically generated.



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728542#action_12728542 ] 

Aaron Kimball commented on MAPREDUCE-685:
-----------------------------------------

The test failures against the new patch are unrelated to this change.



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730104#action_12730104 ] 

Hudson commented on MAPREDUCE-685:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk #20 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/])
    



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728071#action_12728071 ] 

Tom White commented on MAPREDUCE-685:
-------------------------------------

The latest patch no longer applies. Can you please regenerate it, Aaron?



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728494#action_12728494 ] 

Hadoop QA commented on MAPREDUCE-685:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12412766/MAPREDUCE-685.3.patch
  against trunk revision 791909.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/console

This message is automatically generated.



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742426#action_12742426 ] 

Aaron Kimball commented on MAPREDUCE-685:
-----------------------------------------

Because it's not actually the same fix ;) PostgreSQL wants you to do {{statement.setFetchSize(something_reasonable)}}, e.g., 40.

MySQL wants you to do {{statement.setFetchSize(INT_MIN)}}. The only cursor modes MySQL supports are fully buffered (fetch size = 0) and fully row-wise cursors (fetch size = INT_MIN).
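
A minimal side-by-side sketch of the two conventions ({{pgConn}} and {{myConn}} stand for already-open connections; the requirement that pgJDBC cursor mode run with autocommit off comes from the Postgres JDBC documentation):

{code}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeSketch {
  // PostgreSQL: any positive fetch size enables cursor-based streaming;
  // pgJDBC additionally requires autocommit to be off for cursor mode.
  static Statement pgStreaming(Connection pgConn) throws SQLException {
    pgConn.setAutoCommit(false);
    Statement stmt = pgConn.createStatement();
    stmt.setFetchSize(40); // rows per round trip; 0 restores full buffering
    return stmt;
  }

  // MySQL: the only streaming mode is row-at-a-time, requested with the
  // magic value Integer.MIN_VALUE on a forward-only, read-only statement.
  static Statement mysqlStreaming(Connection myConn) throws SQLException {
    Statement stmt = myConn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
    return stmt;
  }
}
{code}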

That said, I have just finished a PostgreSQL patch and will post it here this week :) I'm just waiting for some existing patches to get committed first so that it applies cleanly.



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Attachment: MAPREDUCE-685.patch.2



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725854#action_12725854 ] 

Todd Lipcon commented on MAPREDUCE-685:
---------------------------------------

A couple of notes:
- The SQL_BIG_RESULT hint I mentioned (offline) was meant for the query that actually returns lots of rows. If you're doing LIMIT 1, you don't need it.
- Why check against a null stmt in execute()? Isn't it assumed that passing null here would throw an NPE?
- Also, why return null here instead of letting the SQLException fall through?




[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725812#action_12725812 ] 

Aaron Kimball commented on MAPREDUCE-685:
-----------------------------------------

Added a patch that fixes this issue. It also includes some other performance enhancements (both sketched after this list):

* MySQL now uses "LIMIT 1" when making SELECTs against tables for metadata-reading purposes.
* Transactions are no longer opened with TRANSACTION_SERIALIZABLE, since serializable isolation is unnecessary for metadata reads.
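
To illustrate both changes, a sketch of a LIMIT 1 metadata probe using a relaxed isolation level (the method and table names, and the choice of READ_COMMITTED, are illustrative, not code from the patch):

{code}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class MetadataProbeSketch {
  // Select a single row so the server never materializes a large result
  // just to let the client inspect column names and types.
  static void printColumnTypes(Connection conn, String table) throws Exception {
    // A weaker isolation level than SERIALIZABLE is enough for a
    // read-only metadata probe (READ_COMMITTED is an illustrative choice).
    conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    try (Statement stmt = conn.createStatement();
         // The table name is assumed trusted here; real code should not
         // concatenate untrusted input into SQL.
         ResultSet rs = stmt.executeQuery(
             "SELECT * FROM " + table + " LIMIT 1")) {
      ResultSetMetaData md = rs.getMetaData();
      for (int i = 1; i <= md.getColumnCount(); i++) {
        System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
      }
    }
  }
}
{code}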

No new tests are included, since Hadoop's test infrastructure doesn't mesh well with MySQL. I tested locally by building a 1.7 GB table in MySQL and reading it into a local HDFS instance. This failed before applying the patch and succeeds afterwards.



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Attachment: MAPREDUCE-685.3.patch



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-685:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Aaron!



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Martin Dittus (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742345#action_12742345 ] 

Martin Dittus commented on MAPREDUCE-685:
-----------------------------------------

We just found that PostgreSQL shows the same behaviour. What do you think of making this a generic fix instead? It seems Postgres has the same mechanism to enable streaming of ResultSets:

http://jdbc.postgresql.org/documentation/83/query.html -- "Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour)."



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726175#action_12726175 ] 

Aaron Kimball commented on MAPREDUCE-685:
-----------------------------------------

Removed SQL_BIG_RESULT. Also, good call re: the null check; there's no reason not to let exceptions pass straight through. I've modified the API for execute() accordingly.
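
For the record, a hypothetical {{execute()}} with the revised contract (illustrative shape only; the actual method in the patch may differ):

{code}
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ExecuteSketch {
  // No null check: passing a null statement throws an NPE at the call
  // site, which correctly signals a programming error. No try/catch:
  // a SQLException propagates to the caller instead of being swallowed
  // and turned into a null return value.
  static ResultSet execute(PreparedStatement stmt) throws SQLException {
    return stmt.executeQuery();
  }
}
{code}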



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728258#action_12728258 ] 

Aaron Kimball commented on MAPREDUCE-685:
-----------------------------------------

Attaching rebased patch.



[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727294#action_12727294 ] 

Aaron Kimball commented on MAPREDUCE-685:
-----------------------------------------

These test errors are unrelated to this patch.



[jira] Updated: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-685:
------------------------------------

    Attachment: MAPREDUCE-685.patch
