Posted to dev@pig.apache.org by "Steve Ogden (JIRA)" <ji...@apache.org> on 2014/02/06 15:20:10 UTC

[jira] [Commented] (PIG-3720) Nested concats of binary conditionals take 1/2 hour to parse

    [ https://issues.apache.org/jira/browse/PIG-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893371#comment-13893371 ] 

Steve Ogden commented on PIG-3720:
----------------------------------

Yes, this fixes the problem. Thanks!

Steve Ogden
Lead Data Warehouse Developer
Thomson Reuters
Office: 651-848-4721
Cell: 651-206-4856

steve.ogden@thomsonreuters.com



> Nested concats of binary conditionals take 1/2 hour to parse
> ------------------------------------------------------------
>
>                 Key: PIG-3720
>                 URL: https://issues.apache.org/jira/browse/PIG-3720
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0
>            Reporter: Steve Ogden
>            Priority: Minor
>
> This statement takes over 1/2 hour to parse. The slowdown seems to be related to the conditionals; when I remove them and run just the nested concats, it parses fast:
> fact_tsgsrtd_dim_hash = foreach tsgsrtd generate checksum,
>         UPPER(
>                 CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics),
>                 CONCAT(request_start_dttm,
>                 CONCAT(request_end_dttm,
>                 CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>                 CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>                 CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>                 CONCAT((frequency == '\\N' ? 'UNKNOWN' : frequency),
>                 CONCAT((points == '\\N' ? '0' : points),
>                 CONCAT((multiplier == '\\N' ? '0' : multiplier),
>                 CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>                 CONCAT((pe == '\\N' ? '0' : pe),
>                 (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
>                 ))))))))))));
> I noticed that if I split it, doing half the conditionals in one relation and then feeding that result into a second relation that does the other half, it parses in less than a minute:
> fact_tsgsrtd_cat1 = foreach tsgsrtd generate checksum, points, multiplier, qos, pe, event_type,
>                 CONCAT(CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics),'.000000000'),
>                 CONCAT(request_start_dttm,
>                 CONCAT(request_end_dttm,
>                 CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>                 CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>                 CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>                 (frequency == '\\N' ? 'UNKNOWN' : frequency)
>                 )))))) as cat1;
> fact_tsgsrtd_dim_hash = foreach fact_tsgsrtd_cat1 generate checksum,
>         UPPER(
>                 CONCAT(cat1,
>                 CONCAT((points == '\\N' ? '0' : points),
>                 CONCAT((multiplier == '\\N' ? '0' : multiplier),
>                 CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>                 CONCAT(CONCAT((pe == '\\N' ? '0' : pe), '.0000'),
>                 (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
>                 )))))) as ts_dim_hash;
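
For anyone who wants to check how parse time changes with nesting depth, here is a rough Python sketch that generates statements shaped like the one above. It only builds the Pig text; it does not run Pig or measure anything. The field names (f0, f1, ...), the output alias "out", and the helper names are all made up for illustration and are not from the original script.

```python
def null_default(field, default):
    """Return a Pig binary conditional that swaps the '\\N' marker for a default."""
    # '\\\\N' in Python source emits the two-backslash form used in the Pig script.
    return "({0} == '\\\\N' ? '{1}' : {0})".format(field, default)


def nested_concat(terms):
    """Fold a list of Pig expressions into right-nested two-argument CONCAT calls."""
    if not terms:
        raise ValueError("need at least one term")
    expr = terms[-1]
    for term in reversed(terms[:-1]):
        expr = "CONCAT({}, {})".format(term, expr)
    return expr


def build_statement(num_fields, relation="tsgsrtd"):
    """Build one foreach statement with num_fields conditionals (hypothetical fields)."""
    fields = ["f{}".format(i) for i in range(num_fields)]
    expr = nested_concat([null_default(f, "UNKNOWN") for f in fields])
    return "out = foreach {} generate UPPER({});".format(relation, expr)


if __name__ == "__main__":
    # 12 terms gives 11 nested CONCATs, the same depth as the slow statement.
    print(build_statement(12))
```

Generating the statement at a few different sizes (say 4, 8, 12 terms) and timing the parse of each should show whether the cost grows with depth, assuming the generated shape triggers the same behavior as the original.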



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)