Posted to dev@hive.apache.org by Anmol Ahuja <da...@gmail.com> on 2023/05/08 08:19:44 UTC

Controlling number of mappers or queueing some

Hi,

We maintain a Hive connector library and are running into a problem
with it. Given N input splits, N mappers are spawned, but the write
requests these mappers issue in aggregate exceed the peak allowed
write throughput. Is it possible to dynamically merge some input
splits into fewer mappers so that throughput stays under that peak?

I could perhaps use CombineFileInputFormat to control the size of the
splits, but I don't know in advance how much data needs to be
written. Ideally we'd be able to delegate the throttling logic to the
library: it would observe the throughput achieved with a given number
of mappers, and then either kill some mappers and move their input
splits over to others, or queue some inputs to run later. Is this
possible?
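
To make the idea concrete, here is a rough sketch of the kind of split
merging I have in mind. It is only an illustration: SplitGrouper,
maxMappers and group are hypothetical names, not part of Hive or
Hadoop, and the peak and per-mapper write rates are assumed to be
measured somewhere else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Hive or Hadoop API.
class SplitGrouper {

    // Largest mapper count whose combined write rate stays at or under the peak.
    // Returns at least 1, so a single mapper that is faster than the peak
    // would still need its own rate limiting.
    static int maxMappers(double peakWritesPerSec, double writesPerSecPerMapper) {
        if (peakWritesPerSec <= 0 || writesPerSecPerMapper <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        return Math.max(1, (int) Math.floor(peakWritesPerSec / writesPerSecPerMapper));
    }

    // Distributes the splits round-robin over at most maxGroups groups,
    // one group per mapper. Split sizes are ignored in this sketch.
    static <T> List<List<T>> group(List<T> splits, int maxGroups) {
        if (maxGroups < 1) {
            throw new IllegalArgumentException("maxGroups must be >= 1");
        }
        int n = Math.min(maxGroups, splits.size());
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            groups.get(i % n).add(splits.get(i));
        }
        return groups;
    }
}
```

With an assumed peak of 1000 writes/s and 300 writes/s observed per
mapper, maxMappers gives 3, and group would then pack the N splits
into 3 lists. What I don't know is whether there is a hook that would
let the library do this kind of regrouping while the job is running.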

Thanks,
Anmol