Posted to user@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2011/02/17 17:54:59 UTC

Has anyone run into problems with filter simply not working?

This is weird, because in my case it seems to be nondeterministic. I have a
text file, thing.txt, that contains just these lines:

http://www.guardian.co.uk/
asjlkdajlkdad
askjldajlksdjlkasjdlkajslkdjalds
asdjaskdjlasjdlkad
http://www.guardian.co.uk/adsasd
http://www.guardian.co.uk/sadasd
http://www.guardian.co.uk/asdad

I am running this code:
A = LOAD 'thing.txt' AS (c7:chararray);
B = filter A by (c7 matches '.*guardian\\.co\\.uk.*');
dump B;
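
As a sanity check outside Pig, here is a minimal Java sketch that applies the
same pattern to the sample lines above. It assumes that Pig's matches operator
follows java.util.regex whole-string semantics; the class and method names
(MatchesCheck, keepMatching) are hypothetical and exist only for this
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class MatchesCheck {
    // The regex as the engine sees it once the Pig string literal
    // '.*guardian\\.co\\.uk.*' has been unescaped (assumption: Pig passes
    // it to java.util.regex unchanged).
    static final Pattern GUARDIAN = Pattern.compile(".*guardian\\.co\\.uk.*");

    // The contents of thing.txt, one entry per line.
    static final List<String> SAMPLE = Arrays.asList(
        "http://www.guardian.co.uk/",
        "asjlkdajlkdad",
        "askjldajlksdjlkasjdlkajslkdjalds",
        "asdjaskdjlasjdlkad",
        "http://www.guardian.co.uk/adsasd",
        "http://www.guardian.co.uk/sadasd",
        "http://www.guardian.co.uk/asdad");

    // Keep only the lines that the pattern matches in full.
    static List<String> keepMatching(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (GUARDIAN.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Print every sample line that passes the filter.
        for (String line : keepMatching(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

In plain Java the four guardian.co.uk lines pass and the three junk lines do
not, so the pattern itself looks fine. One thing the sketch makes easy to test
is stray characters: in java.util.regex, "." does not match a carriage return
by default, so a field ending in "\r" would fail the whole-string match.
Whether that applies to the real data is only a guess.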

For a while, I got no results! Then it started working after I ran "dump A",
and it has kept working ever since. However, it still isn't working with the
actual data that I care about, and I can no longer reproduce the failure in
local mode.

I am running pig-0.8.0, latest trunk. The big files in question are .bcp.gz;
locally I just use a .txt file.

Any ideas what it could be? I will try to reproduce it on a smaller set of
data again...