AS t" to get meta information is too expensive for big tables
Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783192#action_12783192 ]
Aaron Kimball commented on MAPREDUCE-1224:
------------------------------------------
@Jeff Sqoop is already using the ResultSetMetaData associated with the query, rather than trying to read the DatabaseMetaData directly. Especially when we eventually support arbitrary user-supplied queries, this will be necessary. It can also be tricky to set all the parameters for a DatabaseMetaData correctly in a generic way. But to get at ResultSetMetaData (which definitely includes the proper typing information), a query must be submitted.
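A minimal sketch of the {{ResultSetMetaData}} approach described above, for illustration only. The class and method names ({{ColumnTypeSketch}}, {{describe}}, {{describeQuery}}) are hypothetical and are not taken from Sqoop's {{SqlManager}}; only the standard {{java.sql}} calls are real. It collects column names and types in a single pass over the metadata.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of Sqoop.
class ColumnTypeSketch {

    /** Maps each column name to its java.sql.Types code, preserving column order. */
    static Map<String, Integer> describe(ResultSetMetaData meta) throws SQLException {
        Map<String, Integer> columns = new LinkedHashMap<>();
        // JDBC column indexes are 1-based.
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            columns.put(meta.getColumnName(i), meta.getColumnType(i));
        }
        return columns;
    }

    /** Submits the query and reads names and types from its result set metadata. */
    static Map<String, Integer> describeQuery(Connection conn, String query) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            // No rows are read here; only the metadata attached to the result is used.
            return describe(rs.getMetaData());
        }
    }
}
```

Because both names and types come from the same {{ResultSetMetaData}} object, the query only needs to be submitted once per table.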
@Spencer This is a good catch and improvement! What database are you testing against? This patch passes unit tests against HSQLDB, PostgreSQL, and Oracle, so +1 from me.
For PostgreSQL and MySQL, Sqoop uses {{Statement.setFetchSize()}} to request a row-buffered (rather than table-buffered) result, so the query returns quickly. Unfortunately, {{setFetchSize()}} is, like everything else in JDBC, poorly specified, so there isn't a good way to do this generically. This patch is a good way to ensure that the query returns quickly even if the database does not respect a row-buffered connection.
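To illustrate why the fetch-size hint is hard to apply generically, here is a rough sketch. The class {{FetchSizeSketch}}, its methods, and the default of 1000 are assumptions made for this example and are not Sqoop code. It relies on two documented driver behaviors: MySQL Connector/J streams rows one at a time only when the fetch size is {{Integer.MIN_VALUE}} on a forward-only, read-only statement, while the PostgreSQL driver uses a cursor only with a positive fetch size and autocommit disabled.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; not part of Sqoop.
class FetchSizeSketch {

    /** Picks a fetch-size hint from the JDBC URL prefix (illustrative heuristic). */
    static int fetchSizeHint(String jdbcUrl) {
        if (jdbcUrl.startsWith("jdbc:mysql:")) {
            // Connector/J treats this special value as "stream row by row".
            return Integer.MIN_VALUE;
        }
        // Arbitrary positive default, assumed for this sketch. PostgreSQL honors
        // it only when autocommit is off on the connection.
        return 1000;
    }

    /** Applies the hint; drivers are free to ignore it. */
    static void applyHint(Statement stmt, String jdbcUrl) throws SQLException {
        stmt.setFetchSize(fetchSizeHint(jdbcUrl));
    }
}
```

Since the hint is driver-specific and may be ignored, a metadata query that is cheap by construction remains the safer fallback for a generic JDBC driver.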
> Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
> ----------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1224
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/sqoop
> Affects Versions: 0.20.1
> Environment: all platforms, generic jdbc driver
> Reporter: Spencer Ho
> Attachments: MAPREDUCE-1224.patch, SqlManager.java
>
>
> The SqlManager uses the query "SELECT t.* from <table> AS t" to get the table spec, which is too expensive for big tables, and the query is run twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive for Sqoop to be useful.