Posted to common-issues@hadoop.apache.org by "Kazuho Fujii (JIRA)" <ji...@apache.org> on 2016/03/05 10:02:40 UTC

[jira] [Updated] (HADOOP-12830) Bash environment for quick command operations

     [ https://issues.apache.org/jira/browse/HADOOP-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuho Fujii updated HADOOP-12830:
----------------------------------
    Attachment: HADOOP-12830.002.patch

Hi,

I have attached the second patch. The differences from the previous patch are as follows:

- Follow the style guide and satisfy shellcheck, checkstyle, and findbugs
- Partly merge hadoop-shell.sh into the main hadoop script
- Do not use the flock command
- Support stdin input, as in hadoop fs -put - /
- Support batch mode
- Add unit tests

The first patch invoked hadoop-shell.sh twice, once as an executable and once as a bashrc. On reflection, I think this was strange. The new patch therefore creates the work files, starts the daemon, starts bash, and stops the daemon, all in the main hadoop script. The bashrc part stays in a separate file because Bash requires it and because the script may grow in the future.

To prevent two processes from writing to the fifo at the same time, the hadoop fs command must take a lock. Previously, I used the flock command because I could not find a simple way to take a lock atomically. (I guess this is why the flock command exists.) The new patch does not use flock; it takes the lock with a lock file instead. I think this will work on any UNIX.
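
As a rough illustration of taking a lock with a lock file and no flock, here is a minimal sketch. It is not the code from the patch: it assumes the lock is created with bash's noclobber option, and the names try_lock, unlock, lockdir, and lockfile are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the lock path and function names are hypothetical,
# and the noclobber technique is an assumption, not taken from the patch.
lockdir=$(mktemp -d)
lockfile="$lockdir/fs.lock"

# With noclobber set, '>' refuses to overwrite an existing file, so only
# one caller can create the lock file. The subshell keeps the noclobber
# setting from leaking into the rest of the script.
try_lock() {
  ( set -o noclobber; printf '%s\n' "$$" > "$lockfile" ) 2>/dev/null
}

unlock() {
  rm -f "$lockfile"
}

try_lock; first=$?    # lock is free, so this should succeed
try_lock; second=$?   # lock is held, so this should fail
unlock
try_lock; third=$?    # lock was released, so this should succeed again
unlock
rmdir "$lockdir"

echo "first=$first second=$second third=$third"
```

A real caller would retry try_lock in a loop until it succeeds, write to the fifo, and then call unlock.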

The new patch also works in batch mode when BASH_ENV is specified. I think this is useful.
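
For reference, the following sketch shows how BASH_ENV behaves in general: a non-interactive bash reads the file named by BASH_ENV before running a script. The rc file contents and the stand-in hadoop function are hypothetical and only mimic the idea of the bashrc part; the real file in the patch may look quite different.

```shell
#!/usr/bin/env bash
# Sketch only: this rc file is a hypothetical stand-in for the bashrc part.
rcfile=$(mktemp)
cat > "$rcfile" <<'EOF'
# Replace the hadoop command with a function, as a bashrc could do.
hadoop() { echo "delegated to daemon: $*"; }
EOF

# A batch script that uses the hadoop command as usual.
script=$(mktemp)
echo 'hadoop fs -ls /' > "$script"

# Non-interactive bash sources the file named by BASH_ENV before the script,
# so the function defined above is visible inside the batch script.
output=$(BASH_ENV="$rcfile" bash "$script")
echo "$output"

rm -f "$rcfile" "$script"
```

Note that bash ignores BASH_ENV when it is invoked as sh, so the batch script has to be run with bash itself.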

I added unit tests. I tried to write an integration test, but it was difficult because there are no integration tests for the normal hadoop fs command.

The security issue is not solved yet. I am confident that this feature is convenient, but security-sensitive users cannot use it for now. I do not have a fundamental solution. As a temporary measure, I plan to print a warning or disable the feature for Kerberos users. If you have a good solution, could you let me know?


> Bash environment for quick command operations
> ---------------------------------------------
>
>                 Key: HADOOP-12830
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12830
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: bin
>            Reporter: Kazuho Fujii
>            Assignee: Kazuho Fujii
>         Attachments: HADOOP-12830.001.patch, HADOOP-12830.002.patch
>
>
> Hadoop file system shell commands are slow. This issue is about building a shell environment for quick command operations.
> An interactive shell was previously attempted in HADOOP-6541, but it seems to fall short because users are used to powerful shells like bash. This issue is not about creating a new shell, but just about opening a new bash process. Therefore, users can run commands as before.
> {code}
> fjk@x240:~/hadoop-2.7.2$ ./bin/hadoop shell
> fjk@x240 hadoop> hadoop fs -ls /
> Found 2 items
> -rw-r--r--   3 fjk supergroup          0 2016-02-21 00:26 /file1
> -rw-r--r--   3 fjk supergroup          0 2016-02-21 00:26 /file2
> {code}
> The shell has a mini daemon process that lives until the shell is closed. The hadoop fs command delegates the operation to the daemon. They communicate through named pipes. The daemon performs the operation and returns the result to the command.
> In this shell, hadoop fs commands become quick. In a local environment, the "hadoop fs -ls" command is about 100 times faster than the normal command.
> {code}
> fjk@x240 hadoop> time hadoop fs -ls hdfs://localhost:8020/ > /dev/null
> real	0m0.021s
> user	0m0.003s
> sys	0m0.011s
> {code}
> Using bash's completion feature, commands and file names are completed automatically.
> {code}
> fjk@x240 hadoop> hadoop fs -ch<TAB><TAB>
> -checksum  -chgrp     -chmod     -chown
> fjk@x240 hadoop> hadoop fs -ls /file<TAB><TAB>
> /file1  /file2  /file3
> {code}
> Additionally, we can make equivalents of bash built-in commands, e.g., cd and umask. In this shell, they can work because the daemon remembers the state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)