
Posted to user@pig.apache.org by Zeynep PEHLIVAN <ze...@lip6.fr> on 2011/04/29 16:12:38 UTC

newbie question about a basic script

Hi all,

I am a newbie, and I am just testing small scripts for practice.

My question is about the result of the script below in local mode:

grunt> cat nested.txt 
{(8,9),(0,1)},{(8,9),(1,1)}
{(2,3),(4,5)},{(2,3),(4,5)}
{(6,7),(3,7)},{(2,2),(3,7)}
grunt> A = LOAD 'nested.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
grunt> DUMP A;
({(8,9),(0,1)},)
({(2,3),(4,5)},)
({(6,7),(3,7)},)

Why is B2 not displayed?

When I ran the same script with PigPen, B2 was displayed, but this time I got only one result instead of three. The screenshot is attached.


When I use the grunt shell, all of the messages below are printed before the result is displayed, and it takes too long. Should I pass some parameter to pig -x local to avoid this, or did I make a mistake in my installation?

Thanks in advance!

grunt> A = LOAD 'nested.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
grunt> DUMP A;
2011-04-29 15:37:44,954 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-04-29 15:37:44,954 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-04-29 15:37:44,955 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: A: Store(file:/tmp/temp643030084/tmp-1663465556:org.apache.pig.impl.io.InterStorage) - scope-48 Operator Key: scope-48)
2011-04-29 15:37:44,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-04-29 15:37:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-04-29 15:37:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-04-29 15:37:44,961 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,964 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,966 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-04-29 15:37:44,966 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-04-29 15:37:46,270 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-04-29 15:37:46,273 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,275 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-04-29 15:37:46,295 [Thread-57] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,300 [Thread-57] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-29 15:37:46,402 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-29 15:37:46,442 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,446 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,449 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,452 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,486 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
2011-04-29 15:37:46,486 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0005_m_000000_0 is allowed to commit now
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,494 [Thread-66] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0005_m_000000_0' to file:/tmp/temp643030084/tmp-1663465556
2011-04-29 15:37:46,496 [Thread-66] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-04-29 15:37:46,496 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0005_m_000000_0' done.
2011-04-29 15:37:46,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2011-04-29 15:37:46,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-04-29 15:37:51,778 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0005
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion    PigVersion    UserId    StartedAt    FinishedAt    Features
0.20.2    0.8.1    pehlivanz    2011-04-29 15:37:44    2011-04-29 15:37:51    UNKNOWN

Success!

Job Stats (time in seconds):
JobId    Alias    Feature    Outputs
job_local_0005    A    MAP_ONLY    file:/tmp/temp643030084/tmp-1663465556,

Input(s):
Successfully read records from: "file:///home/pehlivanz/PIG/pig-0.8.1/tutorial/scripts/testzp/nested.txt"

Output(s):
Successfully stored records in: "file:/tmp/temp643030084/tmp-1663465556"

Job DAG:
job_local_0005


2011-04-29 15:37:51,778 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:51,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2011-04-29 15:37:51,782 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:51,784 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:51,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

Re: newbie question about a basic script

Posted by Richard Ding <rd...@yahoo-inc.com>.

Before casting fields to the schema you specified, the loader needs to split each record into fields. For PigStorage (the default loader, which your script uses because it has no USING clause), the default field separator is '\t'. Since your data file doesn't use '\t' to mark the boundary between fields, the loader reads the whole record into a single field.

-Richard
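A minimal sketch of the fix this explanation points to, assuming nested.txt is rewritten so that a single tab character (instead of the comma) separates the two bags on each line. The schema is the one from the original script; passing '\t' to PigStorage explicitly is equivalent to the default and only makes the delimiter visible.

```pig
-- Assumption: nested.txt now has a TAB between the two bags on every line, e.g.
--   {(8,9),(0,1)}<TAB>{(8,9),(1,1)}
-- PigStorage splits each line on the delimiter first, then casts each field to the schema.
A = LOAD 'nested.txt' USING PigStorage('\t')
    AS (B1:bag{T1:tuple(t1:int,t2:int)}, B2:bag{T2:tuple(f1:int,f2:int)});
DUMP A;
```

Switching the delimiter to a comma with PigStorage(',') would not help with this file: PigStorage splits on every occurrence of the delimiter, including the commas inside the tuples and bags, so the resulting fields would no longer line up with the schema.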
