Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2008/09/16 23:53:50 UTC

[Pig Wiki] Update of "FAQ" by DavidPhillips

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by DavidPhillips:
http://wiki.apache.org/pig/FAQ

The comment on the change is:
this page is old, the new one is PigFaq

------------------------------------------------------------------------------
- Pig FAQ
+ deleted
  
- 1. I'm using PigStorage to parse my input files. Can I make it use control characters as delimiters?
- 
- A. Yes. Examples: PigStorage('\u0001') for Ctrl+A or '\u007C' for this character: |
- 
- 2. Can I do a numerical comparison while filtering?
- 
- A. Yes, you can choose between numerical and string comparison. For numerical comparison use the operators ==, !=, <, etc.; for string comparison use eq, neq, etc.
- 
- 3. How do I make my jobs run on multiple machines?
- 
- A. Use the PARALLEL clause. For example: C = JOIN A by url, B by url PARALLEL 50;
- 
- 4. Does Pig support NULLs?
- 
- A. Pig currently has no support for NULL values, but it is on the roadmap.
- 
- 5. Does Pig support regular expressions?
- 
- A. Pig supports regular expression matching via the matches keyword. It uses java.util.regex matching, which means your pattern has to match the entire string (i.e., if your string is "hi fred" and you want to find "fred", you have to give a pattern of ".*fred", not "fred").
- 
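The whole-string rule described above can be seen directly in java.util.regex. The following is a minimal sketch in plain Java, not Pig code; the class and method names (MatchesDemo, fullMatch, contains) are made up for illustration, and it assumes only that Pig's matches behaves like Pattern.matches, as the answer states.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pig.
class MatchesDemo {
    // Whole-string match: the behavior the FAQ answer describes for matches.
    static boolean fullMatch(String pattern, String input) {
        return Pattern.matches(pattern, input);
    }

    // Substring search, shown only for contrast with the whole-string match.
    static boolean contains(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("fred", "hi fred"));    // false
        System.out.println(fullMatch(".*fred", "hi fred"));  // true
        System.out.println(contains("fred", "hi fred"));     // true
    }
}
```

If a pattern that "obviously" appears in the data matches nothing, a missing leading or trailing ".*" is a likely cause.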
- 6. How do I prevent failure if some records don't have the needed number of columns?
- 
- You can filter away those records by including the following in your Pig program:
- 
- 
- A = load 'foo' using PigStorage('\t');
- B = FILTER A BY ARITY(*) >= 5;
- .....
- 
- 
- This code drops all records that have fewer than 5 columns.
- 
- 7. Is there any difference between == and eq for numeric comparisons?
- 
- For equality, there is no difference as long as both operands are integers. However, 11.0 and 11 are equal under == (numerical comparison) but not under eq (string comparison).
- 
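The distinction between the two comparisons can be illustrated by analogy in plain Java. This is a sketch, not Pig's implementation: the class and method names (EqualityDemo, numericEquals, stringEquals) are hypothetical, and treating == as "parse both sides as numbers" is an assumption made only for illustration.

```java
// Hypothetical demo class; an analogy for == versus eq, not Pig internals.
class EqualityDemo {
    // Numerical comparison: parse both values as numbers, then compare (analogous to ==).
    static boolean numericEquals(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    // String comparison: compare the text itself (analogous to eq).
    static boolean stringEquals(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(numericEquals("11.0", "11")); // true
        System.out.println(stringEquals("11.0", "11"));  // false
        System.out.println(stringEquals("11", "11"));    // true
    }
}
```

For integer-only text such as "11" and "11", both comparisons agree, which is why the difference only shows up once a value like 11.0 appears.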
- 8. Is there an easy way for me to figure out how many rows exist in a dataset from its alias?
- 
- You can run the following set of commands:
- 
- 
- a = load 'bla' ... ;
- 
- b = group a all;
- 
- c = foreach b generate COUNT(a.$0);
- 
- 
- This is equivalent to select count(*) in SQL.
- 
- 9. Does Pig allow grouping on expressions?
- 
- Currently, Pig only allows grouping on data fields, not on expressions. Grouping on expressions is on our roadmap. Stay tuned!
-