Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/01/15 20:48:19 UTC

[jira] [Created] (PIG-3669) Wrong Usage of Scalar can bring down Namenode

Rohini Palaniswamy created PIG-3669:
---------------------------------------

Summary: Wrong Usage of Scalar can bring down Namenode
Key: PIG-3669
URL: https://issues.apache.org/jira/browse/PIG-3669
Project: Pig
Issue Type: Bug
Reporter: Rohini Palaniswamy

People often get confused and use . instead of :: (PIG-2134) to reference fields after a join. Pig treats the expression as a scalar, tries to read the entire join output on the assumption that it contains only one record, and then fails with "scalar has more than one row in the output". ReadScalars uses InterStorage to read the scalar data. InterStorage uses InterInputFormat, which extends FileInputFormat; its getSplits() does a listStatus on all the files in the dir to construct the splits, and InterRecordReader is then used on the splits to read the data. When the data is really huge and spread over a lot of files, this can cause the namenode to run out of memory and crash. The risk is higher with the recent optimization in hadoop 0.23/2.x, where FileInputFormat uses listLocatedStatus instead of listStatus + getBlockLocations to reduce the number of calls to the Namenode.

In the particular case we encountered, the join output had 6.5K files. The job that did ReadScalars had 8K+ tasks (with speculative execution), and all of them doing listStatus for the 6.5K files caused the NN queue to fill up with huge responses (block locations make the responses even bigger). It also saturated the network, so responses built up in the NN without being sent, finally leading it to crash with OOM.
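
To give a feel for the scale, here is a rough back-of-the-envelope sketch in Python. The task and file counts are the ones reported above; the per-entry response size (ASSUMED_BYTES_PER_ENTRY) and the helper name listing_load are hypothetical and only illustrate the arithmetic, they were not measured on the cluster.

```python
def listing_load(num_tasks, num_files, bytes_per_entry):
    """Return (file entries returned, approximate response bytes) summed over all tasks.

    Each task lists every file in the directory once, so the namenode
    returns num_files entries per task.
    """
    entries = num_tasks * num_files
    return entries, entries * bytes_per_entry


# Figures from the report: 8K+ tasks, 6.5K files in the join output.
TASKS = 8_000
FILES = 6_500

# Assumption (not measured): about 300 bytes per file entry once block
# locations are included in the response.
ASSUMED_BYTES_PER_ENTRY = 300

entries, total_bytes = listing_load(TASKS, FILES, ASSUMED_BYTES_PER_ENTRY)
print(f"{entries:,} file entries, ~{total_bytes / 1e9:.1f} GB of responses")
# prints "52,000,000 file entries, ~15.6 GB of responses"
```

Even if the real per-entry size differs from the assumed value, the number of entries returned grows as tasks times files, which is why a modest-looking directory can produce responses that pile up on the namenode.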

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)