Posted to dev@pig.apache.org by "Viraj Bhat (JIRA)" <ji...@apache.org> on 2008/12/31 00:36:44 UTC
[jira] Created: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements
Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements
---------------------------------------------------------------------------------------
Key: PIG-594
URL: https://issues.apache.org/jira/browse/PIG-594
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Viraj Bhat
Fix For: types_branch
I have a UDF, INSETFROMFILE, which matches data against a set of values stored in an HDFS file. INSETFROMFILE extends FilterFunc. Here is a sample Pig script that uses it.
{code}
register util.jar;
define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
B = group A by (url);
C = foreach B generate ((InQuerySet(A.bcookie))?1:0) as inset, A;
dump C;
{code}
This script fails with the following exception in the reducer:
================================================================================================================
at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toProperties(ConfigurationUtil.java:45)
at util.INSETFROMFILE.init(INSETFROMFILE.java:79)
at util.INSETFROMFILE.exec(INSETFROMFILE.java:99)
at util.INSETFROMFILE.exec(INSETFROMFILE.java:61)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:223)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.getNext(POBinCond.java:92)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:259)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:280)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:247)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
================================================================================================================
To avoid this error, we use the INSETFROMFILE UDF in a FILTER statement instead, and it works.
{code}
register util.jar;
define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
B = filter A by InQuerySet(bcookie);
dump B;
{code}
The result is:
(www.yahoo.com,12344)
Problems:
1) Why does the FilterFunc UDF INSETFROMFILE behave inconsistently when used in a FOREACH?
2) Is there a rule that a FilterFunc UDF may only be used in a FILTER statement?
3) When the FilterFunc UDF is called within a FOREACH, PigInputFormat.sJob appears to be null, so the call Properties props = ConfigurationUtil.toProperties(PigInputFormat.sJob) throws a NullPointerException.
Attaching data and script file for testing.
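The failure mode in item 3 can be illustrated without Pig on the classpath. The sketch below is not the attached INSETFROMFILE code and does not use Pig's API. It assumes the job configuration lives in a static field that only map-side code populates, which would explain why the same UDF works in a FILTER (map task) and fails in a FOREACH after a GROUP (reduce task). The names JobContextHolder, InSetFilter and the inset.values property are hypothetical stand-ins, and the set "file" is simulated by a comma-separated property.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-in for a job-wide static such as PigInputFormat.sJob.
// Assumption: only map-side code assigns it, so it stays null in a reducer.
final class JobContextHolder {
    static Properties jobConf; // null until some map-side component sets it

    private JobContextHolder() {}
}

// Sketch of a set-membership filter that initialises lazily on the first
// call, mirroring the exec() -> init() shape visible in the stack trace.
final class InSetFilter {
    private final String setPath; // only used in the error message here
    private Set<String> values;   // loaded on the first exec() call

    InSetFilter(String setPath) {
        this.setPath = setPath;
    }

    // Unguarded variant: dereferences the static directly, so it throws
    // NullPointerException when nothing has populated the static.
    static String unguardedLookup(String key) {
        return JobContextHolder.jobConf.getProperty(key);
    }

    // Guarded variant: fails with a descriptive message instead of a bare NPE.
    private void init() {
        Properties conf = JobContextHolder.jobConf;
        if (conf == null) {
            throw new IllegalStateException(
                "job configuration is not available in this task; cannot load " + setPath);
        }
        values = new HashSet<>();
        // Stand-in for reading the set file: values come from a property.
        for (String v : conf.getProperty("inset.values", "").split(",")) {
            if (!v.isEmpty()) {
                values.add(v);
            }
        }
    }

    boolean exec(String candidate) {
        if (values == null) {
            init();
        }
        return values.contains(candidate);
    }
}
```

With the static left unset, unguardedLookup fails the same way the trace does, while exec reports which resource could not be loaded. Once the static is populated, exec answers membership queries normally.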
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements
Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-594:
---------------------------
Description:
I have a UDF, INSETFROMFILE, which matches data against a set of values stored in an HDFS file. INSETFROMFILE extends FilterFunc. Here is a sample Pig script that uses it.
{code}
register util.jar;
define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
B = group A by (url);
C = foreach B generate ((InQuerySet(A.bcookie))?1:0) as inset, A;
dump C;
{code}
This script fails with the following exception in the reducer:
================================================================================================================
java.lang.NullPointerException
at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toProperties(ConfigurationUtil.java:45)
at util.INSETFROMFILE.init(INSETFROMFILE.java:79)
at util.INSETFROMFILE.exec(INSETFROMFILE.java:99)
at util.INSETFROMFILE.exec(INSETFROMFILE.java:61)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:223)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.getNext(POBinCond.java:92)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:259)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:280)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:247)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
================================================================================================================
To avoid this error, we use the INSETFROMFILE UDF in a FILTER statement instead, and it works.
{code}
register util.jar;
define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
B = filter A by InQuerySet(bcookie);
dump B;
{code}
The result is:
(www.yahoo.com,12344)
Problems:
1) Why does the FilterFunc UDF INSETFROMFILE behave inconsistently when used in a FOREACH?
2) Is there a rule that a FilterFunc UDF may only be used in a FILTER statement?
3) When the FilterFunc UDF is called within a FOREACH, PigInputFormat.sJob appears to be null, so the call Properties props = ConfigurationUtil.toProperties(PigInputFormat.sJob) throws the NullPointerException.
Attaching data and script file for testing.
[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements
Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-594:
---------------------------
Attachment: insetfilterfile
FilterFile in HDFS
[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements
Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-594:
---------------------------
Attachment: myurldata.txt
Input data for Pig Script
[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements
Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-594:
---------------------------
Attachment: INSETFROMFILE.java
INSETFROMFILE UDF which uses FilterFunc