Posted to user@sqoop.apache.org by be...@gmail.com on 2011/08/09 19:05:29 UTC

Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even when the table is partitioned

Moving the discussion to the Apache Sqoop mailing list. Please continue it here.

Regards
Bejoy K S

-----Original Message-----
From: bejoyks@gmail.com
Date: Tue, 9 Aug 2011 16:54:44 
To: <sq...@cloudera.org>
Reply-To: bejoyks@gmail.com
Subject: Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even when the table is partitioned

Yes, Sqoop imports and exports are purely parallel, map-only processes; no reduce operation is required in such scenarios. You are not doing any sort of aggregation while performing an import or export, so a reducer hardly comes into play.
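For instance, a plain import like this runs nothing but parallel SELECTs (the flags are standard Sqoop options, but the connect string, table, and split column below are made-up placeholders):

sqoop import --connect jdbc:oracle:thin:@dbhost:1521:ORCL --username scott -P \
    --table SALES --split-by SALE_ID -m 4 --target-dir /user/sonal/sales

Each of the 4 mappers takes a disjoint range of SALE_ID values, fires its own SELECT, and writes its share straight to HDFS; there is no shuffle and no reduce phase.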
As for Sqoop with a reduce job, I don't have a clue. Are you looking for some specific implementation? If so, please share more details.

Regards
Bejoy K S

-----Original Message-----
From: Sonal <im...@gmail.com>
Date: Tue, 9 Aug 2011 07:52:55 
To: Sqoop Users<sq...@cloudera.org>
Reply-To: sqoop-user@cloudera.org
Subject: [sqoop-user] Re: Sqoop export not having reduce jobs, even when the table is partitioned

Hi,

Thanks for the reply.
So Sqoop is just parallel processing, even if you have a primary
key, a unique index, or partitions on the table?

Is there any case in which Sqoop can make use of a reduce job?
Is there any way we can set the batch size/fetch size in Sqoop?

Thanks & Regards,
Sonal Kumar


On Aug 9, 7:44 pm, bejo...@gmail.com wrote:
> Hi Sonal
>         AFAIK, Sqoop import and export jobs kick off map tasks alone; both are map-only jobs.
>  In an import, the data set is divided evenly across the mappers, and each mapper is responsible for firing its own SQL query and fetching its share of the data to HDFS. No reduce operation is required here because it is just parallel processing (parallel fetching of data) under the hood. The same applies to a Sqoop export: parallel inserts happen under the hood. For plain parallel processing, map tasks alone are enough; no reduce step is needed.
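>  To illustrate (the bounds below are made up for the example, not actual Sqoop output), four mappers importing a table split on an ID column would each run a bounded query such as:
>
>    SELECT t.* FROM SALES t WHERE (SALE_ID >= 0)      AND (SALE_ID < 250000)
>    SELECT t.* FROM SALES t WHERE (SALE_ID >= 250000) AND (SALE_ID < 500000)
>    ...
>    SELECT t.* FROM SALES t WHERE (SALE_ID >= 750000) AND (SALE_ID <= 1000000)
>
>  The ranges are disjoint, so the mappers never overlap and there is nothing to merge afterwards.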
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Sonal <im...@gmail.com>
> Date: Tue, 9 Aug 2011 04:02:10
> To: Sqoop Users<sq...@cloudera.org>
>
> Reply-To: sqoop-u...@cloudera.org
> Subject: [sqoop-user] Sqoop export not having reduce jobs, even when the table is partitioned
>
> Hi,
>
> I am trying to load the data into the DB using Sqoop export with the following
> command:
> sqoop export --connect jdbc:oracle:thin:@adc2190481.us.oracle.com:45773:dooptry \
>     --username sh --password sh \
>     --export-dir $ORACLE_HOME/work/SALES_input --table SALES_OLH_RANGE -m 4
>
> It is able to insert the data, but the job runs map tasks only:
> 11/08/09 03:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> 11/08/09 03:57:42 INFO tool.CodeGenTool: Beginning code generation
> 11/08/09 03:57:42 INFO manager.OracleManager: Time zone has been set to GMT
> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:42 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
> 11/08/09 03:57:42 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.20.2+737-core.jar
> Note: /net/adc2190481/scratch/sonkumar/view_storage/sonkumar_hadooptry/work/./SALES_OLH_RANGE.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 11/08/09 03:57:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/SALES_OLH_RANGE.jar
> 11/08/09 03:57:43 INFO mapreduce.ExportJobBase: Beginning export of SALES_OLH_RANGE
> 11/08/09 03:57:44 INFO manager.OracleManager: Time zone has been set to GMT
> 11/08/09 03:57:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:44 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
> 11/08/09 03:57:44 INFO mapred.JobClient: Running job: job_local_0001
> 11/08/09 03:57:45 INFO mapred.JobClient:  map 0% reduce 0%
> 11/08/09 03:57:50 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:51 INFO mapred.JobClient:  map 24% reduce 0%
> 11/08/09 03:57:53 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:54 INFO mapred.JobClient:  map 41% reduce 0%
> 11/08/09 03:57:56 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:57 INFO mapred.JobClient:  map 58% reduce 0%
> 11/08/09 03:57:59 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:00 INFO mapred.JobClient:  map 75% reduce 0%
> 11/08/09 03:58:02 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:02 INFO mapred.JobClient:  map 92% reduce 0%
> 11/08/09 03:58:03 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
> 11/08/09 03:58:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> 11/08/09 03:58:03 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:03 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
> 11/08/09 03:58:03 WARN mapred.FileOutputCommitter: Output path is null in cleanup
> 11/08/09 03:58:04 INFO mapred.JobClient:  map 100% reduce 0%
> 11/08/09 03:58:04 INFO mapred.JobClient: Job complete: job_local_0001
> 11/08/09 03:58:04 INFO mapred.JobClient: Counters: 6
> 11/08/09 03:58:04 INFO mapred.JobClient:   FileSystemCounters
> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_READ=41209592
> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=309754
> 11/08/09 03:58:04 INFO mapred.JobClient:   Map-Reduce Framework
> 11/08/09 03:58:04 INFO mapred.JobClient:     Map input records=918843
> 11/08/09 03:58:04 INFO mapred.JobClient:     Spilled Records=0
> 11/08/09 03:58:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
> 11/08/09 03:58:04 INFO mapred.JobClient:     Map output records=918843
> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 20.3677 seconds (0 bytes/sec)
> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Exported 918843 records.
>
> Why are the reduce tasks not coming up? Do I have to pass some other
> option as well?
>
> A quick reply would be appreciated.
>
> Thanks & Regards,
> Sonal Kumar
>

-- 
NOTE: The mailing list sqoop-user@cloudera.org is deprecated in favor of Apache Sqoop mailing list sqoop-user@incubator.apache.org. Please subscribe to it by sending an email to incubator-sqoop-user-subscribe@apache.org.

Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even when the table is partitioned

Posted by Arvind Prabhakar <ar...@apache.org>.
Thanks Bejoy.

We have considered adding reduce jobs to Sqoop to further partition
the output files. See [SQOOP-137] for more details.

[SQOOP-137] https://issues.cloudera.org/browse/SQOOP-137
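
For anyone curious how that might look, here is a rough, generic Hadoop sketch of the idea (this is NOT Sqoop code and not the actual SQOOP-137 patch; the class names and the choice of the first CSV field as the partition key are illustrative assumptions): a follow-on job that re-partitions Sqoop-imported text lines by their first column, producing one output file per reducer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PartitionSqoopOutput {

  // Map: key each imported line by its first comma-separated field.
  public static class KeyByFirstField extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object offset, Text line, Context ctx)
        throws java.io.IOException, InterruptedException {
      ctx.write(new Text(line.toString().split(",", 2)[0]), line);
    }
  }

  // Reduce: pass lines through unchanged; each reducer emits one output file.
  public static class PassThrough extends Reducer<Text, Text, NullWritable, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> lines, Context ctx)
        throws java.io.IOException, InterruptedException {
      for (Text line : lines) ctx.write(NullWritable.get(), line);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "partition-sqoop-output");
    job.setJarByClass(PartitionSqoopOutput.class);
    job.setMapperClass(KeyByFirstField.class);
    job.setReducerClass(PassThrough.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(4);  // four reducers -> four partitioned files
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because Hadoop's default HashPartitioner routes equal keys to the same reducer, setNumReduceTasks(4) yields four key-partitioned output files.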

Thanks,
Arvind

On Tue, Aug 9, 2011 at 10:05 AM,  <be...@gmail.com> wrote:
> Moving the discussion to the Apache Sqoop mailing list. Please continue it here.