Posted to common-issues@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2009/11/10 22:37:28 UTC

[jira] Updated: (HADOOP-4331) DBOutputFormat: add batch size support for JDBC and receive DBWritable object in value not in key

     [ https://issues.apache.org/jira/browse/HADOOP-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated HADOOP-4331:
----------------------------------

    Attachment: HADOOP-4331.patch

I would like to request that this issue be reopened. For exports from HDFS into a database, the reducer is not always necessary. In a map-only job, the map tasks can write directly to the database, which avoids the cost of running a shuffle/reduce step. However, some map tasks can be very large (e.g., when reading gzipped files) and may expand to one million or more records per task.

In this case, the user should be able to specify that a potential lack of atomicity is acceptable, i.e., that a task's records may be written to the database in intermediate batches rather than all at once. (In most use cases, database users already prevent redundant rows via primary keys or other uniqueness constraints in the database itself.)

Attaching a new patch synced to mapreduce trunk. By default, it disables intermediate spills to the database (spill size = 0), but you can set the spill size to a nonzero number of records instead.
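
To illustrate the spill behavior described above, here is a minimal sketch. It is not the code in the attached patch: the class name SpillingBatchWriter, its constructor, and the sink callback are all hypothetical, and the sink merely stands in for whatever a JDBC-backed writer would do at each spill (presumably something like PreparedStatement.executeBatch() followed by Connection.commit()). The only behavior taken from the description is that a spill size of 0 means no intermediate spills, and a positive spill size means a flush every that many records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (NOT the code from HADOOP-4331.patch): buffers records
 * and hands them to a sink in batches of at most spillSize records.
 */
class SpillingBatchWriter<T> implements AutoCloseable {
    private final int spillSize;           // 0 = never spill before close()
    private final Consumer<List<T>> sink;  // assumption: stands in for the database write
    private final List<T> pending = new ArrayList<>();

    SpillingBatchWriter(int spillSize, Consumer<List<T>> sink) {
        if (spillSize < 0) {
            throw new IllegalArgumentException("spillSize must be >= 0");
        }
        this.spillSize = spillSize;
        this.sink = sink;
    }

    /** Buffers one record; spills once the buffer reaches spillSize (if > 0). */
    void write(T record) {
        pending.add(record);
        if (spillSize > 0 && pending.size() >= spillSize) {
            flush();
        }
    }

    /** Sends the buffered records to the sink as one batch, then clears the buffer. */
    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink owns its batch
        pending.clear();
    }

    /** Flushes whatever is left; with spillSize == 0 this is the only flush. */
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (SpillingBatchWriter<String> writer =
                 new SpillingBatchWriter<>(3, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 7; i++) {
                writer.write("row-" + i);
            }
        }
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

With a spill size of 0, the same seven records would reach the sink as a single batch at close(), which matches the default described above. With a positive spill size, earlier batches may already have been written if the task fails partway through, which is the lack of atomicity the user would be opting into.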

> DBOutputFormat: add batch size support for JDBC and receive  DBWritable object in value not in key
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4331
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4331
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Alexander Schwid
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: HADOOP-4331.patch, patch.txt
>
>
> Package: mapred.lib.db
> Added batch size support for JDBC in DBOutputFormat.
> DBOutputFormat now receives the DBWritable object in the value, not in the key.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.