Posted to by Apache Wiki <> on 2007/11/07 18:34:07 UTC

[Pig Wiki] Update of "Grunt" by OlgaN

The following page has been changed by OlgaN:

New page:
= Grunt Shell =

== Introduction ==

This document describes the commands supported by `grunt`, which can be used in the interactive shell as well as in batch mode. The supported commands include DFS commands, `pig` commands, and a few others; all of them are discussed below.

== Commands ==

This section describes the currently available commands. Within each section, the commands are listed in alphabetical order. All commands are case insensitive, and whitespace is not significant.

=== DFS ===

This is a basic set of commands that allows you to navigate the Hadoop file system.

==== cat ====

This command is similar to the Unix `cat` command and prints the content of one or more files to the screen.

cat <PATH1> <PATH2> ...

If multiple files are specified, they are concatenated together. If a directory is specified, it is traversed recursively and all of its content is concatenated together.


grunt> cat students
joe smith
john adams
anne white

==== cd ====

This command is similar to the Unix `cd` command and can be used to navigate the file system:

cd <DIR>

If a directory is specified, it becomes the user's current working directory and all other operations are performed relative to it. If no directory is specified, the user's home directory (/user/NAME) becomes the current working directory.


grunt> cd /data

==== copyFromLocal ====

This command copies a file or a directory from the local file system to DFS.

copyFromLocal <SRC PATH> <DST PATH>

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.


grunt> copyFromLocal /data/students students
grunt> ls students
/data/students <r 3> 8270
grunt> copyFromLocal /data/tests new_test
grunt> ls new_test
/data/new_test/<r 3>   664
/data/new_test/<r 3>    344
/data/new_test/more_data        <dir>

==== copyToLocal ====

This command copies a file or a directory from DFS to the local file system.

copyToLocal <SRC PATH> <DST PATH>

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory (directory from which the script was executed or grunt shell started) and retain the name of the source file/directory.


grunt> copyToLocal students /data
grunt> copyToLocal data /data/mydata

==== cp ====

This command is similar to the Unix `cp` command and copies files or directories within DFS.

cp <SRC PATH> <DST PATH>

If a directory is specified, it is recursively copied over. "." can be used to specify that the new file/directory should be created in the current working directory and retain the name of the source file/directory.


grunt> cp students students_save

==== ls ====

This command is similar to the Unix `ls` command and lists the content of a directory.

ls <DIR>

If `DIR` is specified, the command lists the content of the specified directory. Otherwise, the content of the current working directory is listed.


grunt> ls /data
/data/DDLs  <dir>
/data/count <dir>
/data/data  <dir>
/data/schema        <dir>

==== mkdir ====

This command is similar to the Unix `mkdir` command and creates new directories.

mkdir <DIR>

If parts of the path do not exist, they are created.


grunt> mkdir data/20070905

If neither the `data` nor the `20070905` directory exists, both are created.

==== mv ====

This command is identical to `cp` except that it removes the source file or directory as soon as it has been copied.

mv <SRC PATH> <DST PATH>

grunt> mv output output2
grunt> ls output
File or directory output does not exist.
grunt> ls output2
/data/output2/map-000000<r 3>     508844
/data/output2/output3     <dir>
/data/output2/part-00000<r 3>     0

==== pwd ====

This command is identical to the Unix `pwd` command and prints the name of the current working directory.


grunt> pwd

==== rm ====

This command is similar to the Unix `rm` command and removes one or more files or directories. Note that it recursively removes a directory even if it is not empty, it does not ask for confirmation, and the removed data is not recoverable.

rm <PATH1> <PATH2> ...


grunt> rm /data/students
grunt> rm students students_save

=== Pig ===

All regular pig commands can be executed from the shell. See PigLatin for more details.

=== Other Commands ===

==== define ====
This command defines a parameterized user-defined function. It is used in conjunction with `register`. Described in PigFunctions.

==== describe ====

This command displays the schema of a particular alias. The schema format is described in PigLatinSchemas.


grunt> a = load '/data/students' as (name, age, gpa);
grunt> b = filter a by name matches 'zach%';
grunt> c = group b by name;
grunt> d = foreach c generate group, COUNT(b.age);
grunt> describe a
a: (name, age, gpa )
grunt> describe b
b: (name, age, gpa )
grunt> describe c
c: (group, b: (name, age, gpa ) )
grunt> describe d
d: (group, count1 )

If you don't specify names for the columns, no names appear in the schema, as the example below shows:

grunt> a = load '/data/students';
grunt> b = filter a by $0 matches 'zach%';
grunt> c = group b by $0;
grunt> d = foreach c generate group, COUNT(b.$1);
grunt> describe a;
a: ( )
grunt> describe b;
b: ( )
grunt> describe c;
c: (group: ( ), b: ( ) )
grunt> describe d;
d: (group: ( ), count1 )

==== dump ====
This command dumps the content of a `pig` alias to the screen, which is useful for debugging. Described in PigLatin.

==== help ====
This command shows available commands.

grunt> help
<pig latin statement>;
store <alias> into <filename> [using <functionSpec>]
dump <alias>
describe <alias>
kill <job_id>
ls <path>
du <path>
mv <src> <dst>
cp <src> <dst>
rm <src>
copyFromLocal <localsrc> <dst>
cd <dir>
cat <src>
copyToLocal <src> <localdst>
mkdir <path>
cd <path>
define <functionAlias> <functionSpec>
register <udfJar>

==== kill ====
This command kills a job identified by its job ID.

kill <JOBID>


grunt> kill job_0001

This command is currently not working.

==== quit ====
This command should be used to exit the shell.

grunt> quit

==== register ====
This command registers a `jar` that contains user-defined functions. It can be used in conjunction with `define`. Described in PigFunctions.
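
As a rough sketch of how `register` and `define` might be combined, based only on the syntax printed by `help`: the jar path, the `myudfs.Tokenize` class, its `','` argument and the `tokenize` alias below are all hypothetical names chosen for illustration, not part of Pig. PigFunctions remains the authoritative description.

```pig
-- Hypothetical jar that is assumed to contain the user-defined functions.
grunt> register /home/joe/myudfs.jar;
-- Hypothetical function spec: bind the alias tokenize to myudfs.Tokenize with one constructor argument.
grunt> define tokenize myudfs.Tokenize(',');
-- The alias can then be used like any other function.
grunt> a = load '/data/students' as (name, age, gpa);
grunt> b = foreach a generate tokenize(name);
```

Registering the jar first is assumed to be what makes the class visible to the later `define`.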

==== set ====
This command passes key-value pairs to pig. The format of the command is:

grunt> set <key> '<value>'

Both keys and values are case sensitive.

The following keys are currently supported:

|| Key || Value || Description ||
|| debug || on/off || enables/disables debug level logging ||
|| || single-quoted string that contains the name || sets a user-specified name for the job ||


grunt> set debug on
grunt> set debug off
grunt> set 'my job'

==== store ====
This command stores the content of a `pig` alias to a file. Described in PigLatin.
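
A minimal sketch of the two forms listed by `help` (`store <alias> into <filename> [using <functionSpec>]`). It assumes the alias `d` has been defined as in the `describe` example; the output names are hypothetical, and `PigStorage(',')` is assumed here as the function spec for comma-delimited output.

```pig
-- Store alias d under a hypothetical output name, using the default storage function.
grunt> store d into 'zach_counts';
-- Same data, with an explicit function spec (assumed: PigStorage with a comma delimiter).
grunt> store d into 'zach_counts_csv' using PigStorage(',');
```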