Posted to user@hive.apache.org by John Omernik <jo...@omernik.com> on 2012/08/31 14:13:51 UTC

Force number of records per map task

This is going to sound very odd, but I am hoping to use a transform script
in such a way that I pass a file path to the transform script, which then
reads the file and produces a bunch of rows in Hive.  In this case the data
is pcaps.  I have a location accessible to all nodes, and I want my
transform script to read in a file location and then emit, for example,
the IP addresses that were seen in the packet capture (using a script I've
already written).  Can I load my file locations into a table in Hive (one
file per row), read that table into a transform script, and have only one
map task per source row?  I don't want my script to parse several files in
one task, since that may make for poor parallelization, but I am having
trouble forcing such a small record count per map task.

Thoughts?
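
For illustration only, here is a minimal sketch of the kind of transform
script described above. It assumes Hive streams one file path per input
line (tab-separated if the row has more columns) and expects tab-separated
rows back, which is the default for TRANSFORM. The names transform and
extract_ips are hypothetical; extract_ips stands in for the pcap parser
that already exists and is not shown in this thread. The sketch does not
address how many rows each map task receives.

```python
import sys


def extract_ips(pcap_path):
    # Hypothetical placeholder: the existing pcap parser would go here and
    # return an iterable of IP address strings found in the capture file.
    raise NotImplementedError("plug in the existing pcap parser here")


def transform(lines, extractor, out):
    """Read one file path per input line and write 'path<TAB>ip' rows."""
    for line in lines:
        # Hive separates columns with tabs; the path is assumed to be the
        # first column of each incoming row.
        path = line.rstrip("\n").split("\t")[0]
        if not path:
            continue  # skip blank input lines
        for ip in extractor(path):
            out.write(f"{path}\t{ip}\n")
```

Run as a Hive transform script, the entry point would be
transform(sys.stdin, extract_ips, sys.stdout), invoked from a query along
the lines of SELECT TRANSFORM(path) USING 'python script.py' AS (path, ip),
where the column and script names are again assumptions.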

RE: Force number of records per map task

Posted by "Elango, Vikram" <Vi...@SYNTELINC.COM>.
Thanks buddy !!

 

Thanks and regards,
Vikram Elango
The Home Depot, 
Nortel no: 0441-3806 

Mobile: +91-8939662345

 

