Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2008/05/01 00:42:02 UTC

[Pig Wiki] Update of "FAQ" by AmirYoussefi

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by AmirYoussefi:
http://wiki.apache.org/pig/FAQ

New page:
---+!! PigFAQ

---++++ 1. I'm using PigStorage to parse my input files. Can I make it use control characters as delimiters?

A. Yes. For example, use PigStorage('\u0001') for Ctrl+A, or PigStorage('\u007C') for the pipe character (|).
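
A minimal sketch of loading a Ctrl+A-delimited file. The path 'input.txt' and the two projected columns are placeholders chosen for illustration:

<verbatim>
-- 'input.txt' is a hypothetical path; fields are separated by Ctrl+A (\u0001)
A = LOAD 'input.txt' USING PigStorage('\u0001');
-- keep the first two columns by position
B = FOREACH A GENERATE $0, $1;
</verbatim>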


---++++ 2. Can I do a numerical comparison while filtering?

A. Yes, you can choose between numerical and string comparison. For numerical comparison, use operators such as ==, != and <; for string comparison, use eq, neq, etc.
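
As a rough illustration, assume a tab-delimited file 'data' (a placeholder name) whose first column holds text and whose second and third columns hold numbers:

<verbatim>
A = LOAD 'data' USING PigStorage('\t');
-- numerical comparison between the second and third columns
B = FILTER A BY $1 < $2;
-- string comparison on the first column ('apple' is an arbitrary sample value)
C = FILTER A BY $0 eq 'apple';
</verbatim>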

---++++ 3. How do I make my jobs run on multiple machines?

A. Use the PARALLEL clause, which sets the number of reduce tasks for the operator it is attached to. For example: =C = JOIN A by url, B by url PARALLEL 50=

---++++ 4. Does Pig support NULLs?

A. Pig currently has no support for NULL values, but it is on the roadmap.

---++++ 5. Does Pig support regular expressions?

A. Pig supports regular expression matching via the =matches= keyword. It uses java.util.regex matching, which means your pattern has to match the entire string (i.e., if your string is "hi fred" and you want to find "fred", you have to give a pattern of ".*fred", not "fred").
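
A short sketch of the whole-string rule, assuming a placeholder input 'names' with one value per row in the first column:

<verbatim>
A = LOAD 'names' USING PigStorage('\t');
-- keeps "hi fred", because the pattern covers the entire string
B = FILTER A BY $0 matches '.*fred';
-- keeps only rows whose value is exactly "fred"; "hi fred" is dropped
C = FILTER A BY $0 matches 'fred';
</verbatim>

To find "fred" anywhere in the value, a pattern such as '.*fred.*' should work under the same rule.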

---++++ 6. How do I prevent failures when some records don't have the needed number of columns?

You can filter away those records by including the following in your Pig program:

<verbatim>
A = load 'foo' using PigStorage('\t');
B = FILTER A BY ARITY(*) >= 5;
.....
</verbatim>

This code drops all records that have fewer than 5 columns.

---++++ 7. Is there any difference between == and eq for numeric comparisons?

For integers, there is no difference. However, 11.0 and 11 are equal under == (numeric comparison) but not under eq (string comparison).
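
A sketch of the difference, assuming a placeholder tab-delimited input 'data' that contains a row with 11.0 in the first column and 11 in the second:

<verbatim>
A = LOAD 'data' USING PigStorage('\t');
-- numeric comparison: the row (11.0, 11) passes
B = FILTER A BY $0 == $1;
-- string comparison: the row (11.0, 11) is dropped, since "11.0" and "11" differ as text
C = FILTER A BY $0 eq $1;
</verbatim>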

---++++ 8. Is there an easy way for me to figure out how many rows exist in a dataset from its alias?

You can run the following set of commands:

<verbatim>
a = load 'bla' ... ;
b = group a all;
c = foreach b generate COUNT(a.$0);
</verbatim>

This is equivalent to select count(*) in SQL.

---++++ 9. Does Pig allow grouping on expressions?

Currently, Pig only allows grouping on data fields, not on expressions. Grouping on expressions is on our roadmap. Stay tuned!
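
Until then, one possible workaround is to compute the expression as a field first and then group on that field. The sketch below is not taken from the Pig documentation. It assumes a placeholder input 'data' and assumes that arithmetic is accepted inside GENERATE in your Pig version:

<verbatim>
A = LOAD 'data' USING PigStorage('\t');
-- materialize the expression as the first field (assumption: + is supported here)
B = FOREACH A GENERATE $0 + $1, $2;
-- group on the precomputed field rather than on the expression itself
C = GROUP B BY $0;
</verbatim>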