Posted to common-dev@hadoop.apache.org by "Scott Carey (JIRA)" <ji...@apache.org> on 2009/06/04 04:24:08 UTC

[jira] Commented: (HADOOP-5967) Sqoop should only use a single map task

    [ https://issues.apache.org/jira/browse/HADOOP-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716131#action_12716131 ] 

Scott Carey commented on HADOOP-5967:
-------------------------------------

Some databases (Postgres, at least) optimize concurrent sequential scans of the same table by having the queries 'tag along' with a single shared scan, which avoids the O(N^2) issue.  But LIMIT ... OFFSET is not guaranteed to return distinct, consistent partitions anyway, unless each query has an ORDER BY clause and all of them run in the same transaction.
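
As a rough illustration of the ORDER BY point, here is a minimal sketch that reads a table in LIMIT ... OFFSET chunks with a fixed ordering. It is not the DBInputFormat code: it uses an in-memory SQLite database as a stand-in, and the table `t`, its columns, and the helper `read_partitions` are all hypothetical. A single connection with no concurrent writers stands in for the "same transaction" condition.

```python
import sqlite3

def read_partitions(conn, table, order_col, num_splits):
    """Read `table` in num_splits LIMIT/OFFSET chunks, ordered by order_col.

    Hypothetical helper: table and column names are interpolated directly,
    so they must come from trusted input.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    chunk = -(-total // num_splits)  # ceiling division
    parts = []
    for i in range(num_splits):
        # ORDER BY gives every chunk the same row ordering, so the
        # chunks neither overlap nor skip rows.
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY {order_col} LIMIT ? OFFSET ?",
            (chunk, i * chunk),
        ).fetchall()
        parts.append(rows)
    return parts

# Demo data (made up for this sketch): ten rows with ids 0..9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10)])

parts = read_partitions(conn, "t", "id", 3)
seen = [row[0] for part in parts for row in part]
```

With the ordering in place, `seen` covers each id exactly once. Without ORDER BY the database is free to return rows in a different order for each query, so the same check could fail on another engine even if it happens to pass on this one.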

> Sqoop should only use a single map task
> ---------------------------------------
>
>                 Key: HADOOP-5967
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5967
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: single-mapper.patch
>
>
> The current DBInputFormat implementation uses SELECT ... LIMIT ... OFFSET statements to read from a database table. This results in several queries all accessing the same table at the same time. Most database implementations will use a full table scan for each such query, starting at row 1 and scanning down until the OFFSET is reached before emitting data to the client. The upshot is that we see O(n^2) performance in the size of the table when using a large number of mappers, whereas a single mapper would read through the table in O(n) time in the number of rows.
> This patch sets the number of map tasks to 1 in the MapReduce job sqoop launches.
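
The cost claim in the description above can be checked with a rough model. This is a sketch under stated assumptions, not a measurement. It assumes each of k mappers reads about n/k rows, and that the database scans from row 1 to reach each OFFSET, as the description says. The function `rows_scanned` is hypothetical. Under this model the total comes to roughly n(k+1)/2 rows, which is quadratic in n only when the number of mappers grows in proportion to the table size.

```python
def rows_scanned(n, k):
    """Total rows touched when k mappers split an n-row table.

    Model assumption: each query scans from row 1, skips `offset`
    rows, then emits up to `chunk` rows.
    """
    chunk = -(-n // k)  # ceiling division: rows per mapper
    total = 0
    for i in range(k):
        offset = i * chunk
        emitted = max(0, min(chunk, n - offset))
        total += min(offset, n) + emitted
    return total

single = rows_scanned(1000, 1)    # one mapper reads the table once
ten = rows_scanned(1000, 10)      # ten mappers
hundred = rows_scanned(1000, 100) # one hundred mappers
```

A database that shares one sequential scan among concurrent queries would not fit this model, since the skipped rows would not be re-read for each query.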
