Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2007/11/07 18:34:07 UTC

[Pig Wiki] Update of "Grunt" by OlgaN

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/Grunt

New page:
[[Anchor(Grunt_Shell)]]
= Grunt Shell =

[[Anchor(Introduction)]]
== Introduction ==

This document describes the commands supported by `grunt`, which can be used in the interactive shell as well as in batch mode. The supported commands include DFS commands, `pig` commands, and a few others. All of them are discussed in this document.

[[Anchor(Commands)]]
== Commands ==

This section describes the currently available commands. The commands in each section are listed in alphabetical order. All commands are case-insensitive, and whitespace is not significant.

[[Anchor(DFS)]]
=== DFS ===

This is a basic set of commands that allow you to navigate the Hadoop file system.

[[Anchor(cat)]]
==== cat ====

This command is similar to the Unix `cat` command and prints the content of one or more files to the screen.

{{{
cat <PATH1> <PATH2> ...
}}}

If multiple files are specified, they are concatenated together. If a directory is specified, it is recursively traversed and all of its content is concatenated together.

Example:

{{{
grunt> cat students
joe smith
john adams
anne white
grunt>
}}}
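
Several paths can be passed in a single call. The following is a minimal sketch, assuming a second, hypothetical file named `teachers` exists next to `students`. The output is omitted here; it would be the content of both files concatenated, presumably in the order given.

{{{
grunt> cat students teachers
}}}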

[[Anchor(cd)]]
==== cd ====

This command is similar to the Unix `cd` command and can be used to navigate the file system:

{{{
cd <DIR>
or
cd
}}}

If a directory is specified, it becomes the user's current working directory and all other operations are performed relative to it. If no directory is specified, the user's home directory (/user/NAME) becomes the current working directory.

Example:

{{{
grunt> cd /data
grunt>
}}}
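
The no-argument form can be checked with `pwd`. This is only a sketch, assuming a hypothetical user named `alice` whose home directory follows the /user/NAME pattern described above; under that assumption the session would be expected to look like this:

{{{
grunt> cd
grunt> pwd
/user/alice
grunt>
}}}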

[[Anchor(copyFromLocal)]]
==== copyFromLocal ====

This command copies a file or a directory from the local file system to DFS.

{{{
copyFromLocal <SRC PATH> <DST PATH>
}}}

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.

Examples:

{{{
grunt> copyFromLocal /data/students students
grunt> ls students
/data/students <r 3> 8270
grunt> copyFromLocal /data/tests new_test
grunt> ls new_test
/data/new_test/test1.data<r 3>   664
/data/new_test/test2.data<r 3>    344
/data/new_test/more_data        <dir>
}}}

[[Anchor(copyToLocal)]]
==== copyToLocal ====

This command copies a file or a directory from DFS to the local file system.

{{{
copyToLocal <SRC PATH> <DST PATH>
}}}

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory (directory from which the script was executed or grunt shell started) and retain the name of the source file/directory.

Examples:

{{{
grunt> copyToLocal students /data
grunt> copyToLocal data /data/mydata
}}}

[[Anchor(cp)]]
==== cp ====

This command is similar to the Unix `cp` command and copies files or directories within DFS.

{{{
cp <SRC PATH> <DST PATH>
}}}

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.

Example:

{{{
grunt> cp students students_save
}}}
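
The "." destination described above can be sketched as follows. The directory `/data/backup` is hypothetical and is assumed to exist already. After the copy, `ls` would be expected to list a `students` entry under the current working directory, since the source name is retained; the output is omitted because its exact form depends on the file.

{{{
grunt> cd /data/backup
grunt> cp /data/students .
grunt> ls
}}}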

[[Anchor(ls)]]
==== ls ====

This command is similar to the Unix `ls` command and lists the content of a directory.

{{{
ls <DIR>
or
ls
}}}

If `DIR` is specified, the command lists the content of the specified directory. Otherwise, the content of the current working directory is listed.

Example:

{{{
grunt> ls /data
/data/DDLs  <dir>
/data/count <dir>
/data/data  <dir>
/data/schema        <dir>
grunt>
}}}

[[Anchor(mkdir)]]
==== mkdir ====

This command is similar to the Unix `mkdir` command and creates new directories.

{{{
mkdir <DIR>
}}}

If parts of the path do not exist, they are created as well.

Example:

{{{
grunt> mkdir data/20070905
grunt> 
}}}

If neither the `data` nor the `20070905` directory exists, both are created.

[[Anchor(mv)]]
==== mv ====

This command is identical to `cp` except it removes the source file/directory as soon as it is copied.

Example:

{{{
grunt> mv output output2
grunt> ls output
File or directory output does not exist.
grunt> ls output2
/data/output2/map-000000<r 3>     508844
/data/output2/output3     <dir>
/data/output2/part-00000<r 3>     0
}}}
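
No syntax line is given for `mv` above. Since the command is described as identical to `cp`, and the `help` output lists it as `mv <src> <dst>`, the syntax in the notation used for the other commands would presumably read:

{{{
mv <SRC PATH> <DST PATH>
}}}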

[[Anchor(pwd)]]
==== pwd ====

This command is identical to the Unix `pwd` command and prints the name of the current working directory.

Example:

{{{
grunt> pwd
/data
grunt>
}}}

[[Anchor(rm)]]
==== rm ====

This command is similar to the Unix `rm` command and removes one or more files or directories. Warning: it recursively removes a directory even if it is not empty, it does not ask for confirmation, and the removed data is not recoverable.

{{{
rm <PATH1> <PATH2> ...
}}}

Examples:

{{{
grunt> rm /data/students
grunt> rm students students_save
}}}

[[Anchor(Pig)]]
=== Pig ===

All regular pig commands can be executed from the shell. See PigLatin for more details.

[[Anchor(Other_Commands)]]
=== Other Commands ===

[[Anchor(define)]]
==== define ====
This command defines a parameterized user-defined function. It is used in conjunction with `register` and is described in PigFunctions.

[[Anchor(describe)]]
==== describe ====

This command shows the schema of a particular alias. The schema format is described in PigLatinSchemas.

Example:

{{{
grunt> a = load '/data/students' as (name, age, gpa);
grunt> b = filter a by name matches 'zach%';
grunt> c = group b by name;
grunt> d = foreach c generate group, COUNT(b.age);
grunt> describe a
a: (name, age, gpa )
grunt> describe b
b: (name, age, gpa )
grunt> describe c
c: (group, b: (name, age, gpa ) )
grunt> describe d
d: (group, count1 )
}}}

If you do not specify names for the columns, no names appear in the schema, as the example below shows:

{{{
grunt> a = load '/data/students';
grunt> b = filter a by $0 matches 'zach%';
grunt> c = group b by $0;
grunt> d = foreach c generate group, COUNT(b.$1);
grunt> describe a;
a: ( )
grunt> describe b;
b: ( )
grunt> describe c;
c: (group: ( ), b: ( ) )
grunt> describe d;
d: (group: ( ), count1 )
}}}

[[Anchor(dump)]]
==== dump ====
This command dumps the content of a `pig` alias to the screen, which is useful for debugging. It is described in PigLatin.

[[Anchor(help)]]
==== help ====
This command shows available commands.

Example:
{{{
grunt> help
Commands:
<pig latin statement>;
store <alias> into <filename> [using <functionSpec>]
dump <alias>
describe <alias>
kill <job_id>
ls <path>
du <path>
mv <src> <dst>
cp <src> <dst>
rm <src>
copyFromLocal <localsrc> <dst>
cd <dir>
pwd
cat <src>
copyToLocal <src> <localdst>
mkdir <path>
cd <path>
define <functionAlias> <functionSpec>
register <udfJar>
debugOn
debugOff
quit
}}}

[[Anchor(kill)]]
==== kill ====
This command kills a job based on its job id.


{{{
kill <JOBID>
}}}

Example:

{{{
grunt> kill job_0001
}}}

This command is currently not working.

[[Anchor(quit)]]
==== quit ====
This command exits the shell.

{{{
grunt> quit
}}}

[[Anchor(register)]]
==== register ====
This command registers a `jar` containing user-defined functions. It can be used in conjunction with `define` and is described in PigFunctions.

[[Anchor(set)]]
==== set ====
This command passes key-value pairs to pig. The format of the command is:

{{{
grunt> set <key> '<value>'
}}}

Both keys and values are case sensitive.

The following keys are currently supported:

|| Key || Value || Description ||
|| debug || on/off || enables/disables debug level logging ||
|| job.name || single-quoted string that contains the name || sets a user-specified name for the job ||

Examples:

{{{
grunt> set debug on
grunt> set debug off
grunt> set job.name 'my job'
grunt>
}}}


[[Anchor(store)]]
==== store ====
This command stores the content of a `pig` alias in a file. It is described in PigLatin.