You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com> on 2014/08/07 00:09:35 UTC

Hive table count from Sqoop output

Hi All,

I am planning to import around 10-15 tables from Oracle to Hive using Sqoop. After the successful import of all the tables I need to capture the counts of all tables and mail them.
The straight forward way is to do select count(*) on each table after it gets loaded.
But since I have good number of tables and each count query runs map-reduce to give the table count.... I am looking for an alternative way to get the count of table... say, capture count in a variable that sqoop ouputs.

Here is the sample sqoop output
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     Launched map tasks=1
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     Aggregate execution time of reducers(ms)=0
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:   FileSystemCounters
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     MAPRFS_BYTES_READ=87
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     MAPRFS_BYTES_WRITTEN=340491064
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=66273
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:   Map-Reduce Framework
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     Map input records=921343
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=522985472
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     Spilled Records=0
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     CPU_MILLISECONDS=42560
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=1808834560
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     Map output records=921343
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient:     GC time elapsed (ms)=594
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 720.6803 seconds (0 bytes/sec)
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapreduce.ImportJobBase: Retrieved 921343 records.

Is there a way to capture the count from last line (mapreduce.ImportJobBase: Retrieved 921343 records) or from map out records line(Map output records=921343) into a variable...

Please help me in this regard.

[http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]

Vikash Talanki
Engineer - Software
vtalanki@cisco.com
Phone: +1 (408)838 4078

Cisco Systems Limited
SJ-J 3
255 W Tasman Dr
San Jose
CA - 95134
United States
Cisco.com<http://www.cisco.com/>





[Think before you print.]Think before you print.

This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html




Re: Hive table count from Sqoop output

Posted by Gwen Shapira <gs...@cloudera.com>.
step 1 - write sqoop output to log:

sqoop .... > sqoop.log 2>&1

step 2 - grep

grep -oP 'Retrieved \K\d*' sqoop.log