Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/01/15 20:54:29 UTC

[jira] [Commented] (PIG-3669) Wrong Usage of Scalar can bring down Namenode

    [ https://issues.apache.org/jira/browse/PIG-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872520#comment-13872520 ] 

Rohini Palaniswamy commented on PIG-3669:
-----------------------------------------

It would be good to have a different syntax for scalars, but I am not sure how that can be done without breaking backward compatibility.
 
The other solution I could think of to mitigate the problem is to validate all the ReadScalars UDFs in the plan by executing them in the OutputCommitter's setupJob(). setupJob() runs once for the job before the map and reduce tasks are launched; it runs as a separate task in Hadoop 1.x and as part of the AM in 2.x. If there is more than one record, the job fails in the setup phase itself, avoiding the listStatus and load on every map and reduce task. If it was indeed a scalar, we pay for an extra load in the setup phase, but the cost is minimal since there should be at most one file, or one file with data and the rest empty.
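
Roughly something like the sketch below (class and helper names are made up for illustration, not actual Pig code; the real check would execute the ReadScalars UDFs via InterStorage rather than the cheap file-count shortcut used here):

{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

public class ScalarValidatingCommitter extends FileOutputCommitter {

    public ScalarValidatingCommitter(Path outputPath, TaskAttemptContext context)
            throws IOException {
        super(outputPath, context);
    }

    @Override
    public void setupJob(JobContext context) throws IOException {
        super.setupJob(context);
        // setupJob() runs once per job before any map/reduce task launches
        // (a separate setup task on 1.x, inside the AM on 2.x), so a bogus
        // "scalar" fails here instead of in thousands of tasks.
        for (Path scalarInput : scalarInputsOf(context)) {
            if (!looksLikeScalar(scalarInput, context)) {
                throw new IOException(
                        "scalar has more than one row in the output: " + scalarInput);
            }
        }
    }

    // Made-up helper: in Pig this would be derived from the ReadScalars
    // UDFs found in the plan.
    private List<Path> scalarInputsOf(JobContext context) {
        return Collections.emptyList();
    }

    // Cheap necessary condition only: a true scalar has at most one non-empty
    // part file. The real validation would read the records with InterStorage
    // and fail on the second record, which also catches one file with many rows.
    private boolean looksLikeScalar(Path dir, JobContext context) throws IOException {
        FileSystem fs = dir.getFileSystem(context.getConfiguration());
        int nonEmpty = 0;
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.getLen() > 0 && ++nonEmpty > 1) {
                return false;
            }
        }
        return true;
    }
}
{code}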

Any other good ideas are welcome.

> Wrong Usage of Scalar can bring down Namenode
> ---------------------------------------------
>
>                 Key: PIG-3669
>                 URL: https://issues.apache.org/jira/browse/PIG-3669
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>
> People often get confused and end up using . after a join instead of :: (PIG-2134) to reference fields of the join, e.g. A.f1 instead of A::f1. Pig takes the . reference to be a scalar and tries to read the whole join data assuming there is only one record, then fails with "scalar has more than one row in the output". ReadScalars uses InterStorage to read the scalar data, and InterStorage uses InterInputFormat, which extends FileInputFormat: getSplits() does a listStatus on all the files in the dir to construct the splits, and InterRecordReader then reads the data from the splits. When the data is really huge with a lot of files, this can make the namenode go out of memory and crash, especially with the recent optimization in hadoop 0.23/2.x that uses listLocatedStatus instead of listStatus + getBlockLocations in FileInputFormat to reduce the number of calls to the Namenode.
>  In the particular case we encountered, the join output had 6.5K files. The job that did the ReadScalars had 8K+ tasks (with speculative execution), and all of them doing a listStatus for the 6.5K files caused the NN call queue to fill up with huge responses (block locations make the responses even bigger). It also saturated the network, causing responses to build up in the NN without being sent, finally leading it to crash with OOM.
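
To make the mechanics in the description concrete, here is a minimal standalone sketch (not Pig's actual code) of the listing that every task effectively performs when FileInputFormat.getSplits() runs for a misused scalar:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitListingSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // e.g. the join output directory holding the 6.5K part files
        Path scalarDir = new Path(args[0]);
        FileSystem fs = scalarDir.getFileSystem(conf);

        // One call to the Namenode returning a status for every file in the
        // dir (and, with listLocatedStatus on 0.23/2.x, block locations too).
        FileStatus[] statuses = fs.listStatus(scalarDir);
        System.out.println("files listed: " + statuses.length);

        // FileInputFormat.getSplits() performs this listing to build splits,
        // and with a misused scalar it runs in every one of the 8K+ tasks:
        // 8000 tasks x 6500 files = 52M statuses served in a short window.
    }
}
{code}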



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)