Posted to dev@pig.apache.org by "Niels Basjes (JIRA)" <ji...@apache.org> on 2015/03/19 16:49:38 UTC

[jira] [Created] (PIG-4471) ASSERT Bag is not empty and/or is within a specified size range.

Niels Basjes created PIG-4471:
---------------------------------

             Summary: ASSERT Bag is not empty and/or is within a specified size range.
                 Key: PIG-4471
                 URL: https://issues.apache.org/jira/browse/PIG-4471
             Project: Pig
          Issue Type: New Feature
          Components: internal-udfs, parser
            Reporter: Niels Basjes


The ASSERT keyword was introduced in PIG-3367.
The current implementation checks, for each record in the bag, whether the value of a column is valid, and fails the job if it is not.

In several experiments we found that an ASSERT on an empty bag (0 tuples) always succeeds, because there are no records to check. We need a way to ensure that a bag has actually been loaded correctly.
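
For reference, a minimal sketch of the existing per-record form, assuming the ASSERT ... BY syntax from the Pig documentation. The file name 'data' and the condition on a0 are only illustrative. The condition is evaluated once per tuple, so if 'data' contains no tuples it is never evaluated and the job cannot fail on it.
{code}
-- Existing per-record ASSERT (PIG-3367): the job fails if any tuple violates the condition.
A = LOAD 'data' AS (a0:int,a1:int,a2:int);
ASSERT A BY a0 > 0, 'a0 should be greater than 0';
-- If 'data' is empty, no tuple is checked and the assertion passes.
{code}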

*Proposed enhancements:* 
# Allow the ASSERT statement to check that a bag is not empty.
{code}
A = LOAD 'data' AS (a0:int,a1:int,a2:int);
ASSERT A NOT EMPTY, 'The A bag may not be empty';
{code}
# Allow the ASSERT statement to check whether a bag has more than (or fewer than) a specific number of tuples.
{code}
A = LOAD 'data' AS (a0:int,a1:int,a2:int);
ASSERT SIZE A > 100, 'The A bag is not big enough';
ASSERT SIZE A < 1000, 'The A bag is too big';
{code}
#- For my purposes an approximate implementation is acceptable, i.e. if I require at least 5M tuples, the check may still report 'is valid' for a bag with 4.9M tuples.

NOTE: The syntax shown above is only meant to illustrate what I want to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)