Posted to user@pig.apache.org by Joao Salcedo <jo...@gmail.com> on 2012/09/06 02:40:11 UTC

REGEX

Hi All,

I am using regular expressions to parse a string.

I have the following input:

"GET /javascript/quicksearch.js HTTP/1.0" 200 1947 "http://www.gothiclolitawigs.com/gothic-lolita-wigs/straight-split-ss-blonde-white/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.56 Safari/536.5"

I perform:

 agent = FOREACH base GENERATE (REGEX_EXTRACT(base, '(".+?")', 1));

And I get the first match:

------------------------------------------------------------------------
| agent     | org.apache.pig.builtin.regex_extract_base_1661:chararray |
------------------------------------------------------------------------
|           | "GET /javascript/quicksearch.js HTTP/1.0"                |
------------------------------------------------------------------------

If I run the same command with index 3, in order to get the third string
between quotes:

 agent = FOREACH base GENERATE (REGEX_EXTRACT(base, '(".+?")', 3));

I get the following. I do not understand why. Any ideas?

2012-09-06 10:38:23,785 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.TaskInputOutputContext.getCounter(TaskInputOutputContext.java:84)
at org.apache.pig.tools.pigstats.PigStatusReporter.getCounter(PigStatusReporter.java:55)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.warn(PigHadoopLogger.java:56)
at org.apache.pig.EvalFunc.warn(EvalFunc.java:186)
at org.apache.pig.builtin.REGEX_EXTRACT.exec(REGEX_EXTRACT.java:90)
at org.apache.pig.builtin.REGEX_EXTRACT.exec(REGEX_EXTRACT.java:47)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:305)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:194)
at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:238)
at org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:103)
at org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:98)
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:166)
at org.apache.pig.PigServer.getExamples(PigServer.java:1245)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
2012-09-06 10:38:23,794 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception : null

Re: REGEX

Posted by Віталій Тимчишин <ti...@gmail.com>.
The index is not the match number but the capturing-group number. Your
pattern has only one group, so index 3 does not exist. You need something
like:

 (REGEX_EXTRACT(base, '(".+?")[^"]*(".+?")[^"]*(".+?")', 3))
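
To make the difference between a group number and a match number concrete, here is a minimal sketch in Python. Python's re module is only a stand-in for the Java regex engine that Pig relies on (an assumption of this sketch), but the constructs used here (lazy quantifier, negated character class, numbered groups) behave the same way in both. All variable names are illustrative and do not come from the thread.

```python
import re

# Sample access-log fragment from the question, joined onto one line.
line = (
    '"GET /javascript/quicksearch.js HTTP/1.0" 200 1947 '
    '"http://www.gothiclolitawigs.com/gothic-lolita-wigs/straight-split-ss-blonde-white/" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) '
    'Chrome/19.0.1084.56 Safari/536.5"'
)

# Pattern with ONE capturing group: only group 1 exists, however many
# quoted strings the line contains.
single = re.search(r'(".+?")', line)
first_quoted = single.group(1)

# Asking the one-group pattern for group 3 fails: there is no such group.
try:
    single.group(3)
    missing_group_error = False
except IndexError:
    missing_group_error = True

# Pattern with THREE capturing groups, separated by runs of non-quote
# characters: group 3 is now the third quoted string.
triple = re.search(r'(".+?")[^"]*(".+?")[^"]*(".+?")', line)
third_quoted = triple.group(3)

print(first_quoted)
print(third_quoted)
print(missing_group_error)
```

The one-group pattern can only ever return the first quoted string, while the three-group pattern exposes the user agent as group 3, which is what the index argument of REGEX_EXTRACT selects.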




-- 
Best regards,
 Vitalii Tymchyshyn