Posted to commits@hdt.apache.org by Apache Wiki <wi...@apache.org> on 2013/02/26 09:37:26 UTC

[Hdt Wiki] Update of "HDTProductExperience" by BobKerns

The "HDTProductExperience" page has been changed by BobKerns:
http://wiki.apache.org/hdt/HDTProductExperience

New page:
= HDT Product Experience =

This page collects thoughts about the overall product experience of a mature product, from a user's point of view. The hope is that this leads to a shared, idealized vision of packaging, distribution, scope, and feature set that can guide more practical and concrete development plans.

Many of these ideas might require cooperation, design, and implementation efforts in conjunction with other Hadoop projects. We might expect to provide leadership and some core technology to ensure high commonality.

== Installation ==

 * The user should be able to install using a single update site for all Hadoop-related Eclipse tools.
   * Not all components would need to be supplied by us.
   * It might (or might not) be better for the update site to be handled by Hadoop-Core.
   * It should at least be linked from all relevant projects.
 * Installation should be a simple, standard Eclipse install.
 * Updates should be made available promptly.
 * The tools should be compatible with current and recent Eclipse versions.

== Configuration ==

 * The user should be able to obtain a cluster's configuration by entering the hostname of the master namenode.
   * All Hadoop services ought to coordinate to publish the necessary configuration information, downloadable via a single URL.
   * The cluster should be able to provide multiple alternative configurations for the user to select from.
 * The user should be able to override cluster-provided information.
   * The user should be able to provide multiple overrides of cluster-provided information for different purposes, and select where appropriate.
     * The exact definition of "where appropriate" may get tricky.
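
One way to read the override bullets above is as layered property maps: the cluster-provided values form the base, and a named set of user overrides is applied on top when selected. The following is only a sketch under that assumption. The class `LayeredConfig`, the profile name, and the host `namenode.example.com` are hypothetical and not part of HDT or Hadoop; the property keys are used purely as examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: cluster-provided settings plus named user override profiles. */
public class LayeredConfig {
    private final Map<String, String> clusterDefaults;
    private final Map<String, Map<String, String>> profiles = new LinkedHashMap<>();

    public LayeredConfig(Map<String, String> clusterDefaults) {
        // Copy defensively so later changes by the caller do not leak in.
        this.clusterDefaults = new LinkedHashMap<>(clusterDefaults);
    }

    /** Registers (or replaces) a named set of user overrides. */
    public void addProfile(String name, Map<String, String> overrides) {
        profiles.put(name, new LinkedHashMap<>(overrides));
    }

    /** Returns the cluster defaults with the selected profile applied on top. */
    public Map<String, String> resolve(String profileName) {
        Map<String, String> effective = new LinkedHashMap<>(clusterDefaults);
        Map<String, String> overrides = profiles.get(profileName);
        if (overrides != null) {
            effective.putAll(overrides); // user values win over cluster values
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new LinkedHashMap<>();
        cluster.put("fs.default.name", "hdfs://namenode.example.com:8020"); // hypothetical host
        cluster.put("mapred.reduce.tasks", "10");

        Map<String, String> debug = new LinkedHashMap<>();
        debug.put("mapred.reduce.tasks", "1");

        LayeredConfig config = new LayeredConfig(cluster);
        config.addProfile("debug", debug);
        System.out.println(config.resolve("debug"));
    }
}
```

An unknown profile name simply yields the cluster defaults. Deciding which profile applies "where appropriate" is left open here, as it is in the list above.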

== Feature Set ==

This list is intended to cover broad features that users might expect to be available across components.

In some cases, these duplicate functionality that is or might be provided via the web UI. It's fine to integrate the web UI into HDT, but we should be looking for ways to link more deeply into Eclipse. Errors in the log, for example, should take us to the Java source for the classes mentioned, and failing tasks and jobs should lead us to the appropriate step in the appropriate script editor.
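
Linking a log error to Java source starts with recognizing stack frames in the log text. The sketch below assumes the usual JVM stack-trace line format, `at package.Class.method(File.java:123)`. The class `StackFrameParser` and its `Frame` type are hypothetical names, and opening the editor at the resulting location would go through Eclipse APIs that are not shown.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extract class, method, file, and line from a stack-trace line. */
public class StackFrameParser {
    // Matches frames such as "at org.example.Foo.bar(Foo.java:42)".
    private static final Pattern FRAME = Pattern.compile(
        "\\bat\\s+([\\w.$]+)\\.([\\w$<>]+)\\(([\\w$]+\\.java):(\\d+)\\)");

    public static final class Frame {
        public final String className;
        public final String methodName;
        public final String fileName;
        public final int line;

        Frame(String className, String methodName, String fileName, int line) {
            this.className = className;
            this.methodName = methodName;
            this.fileName = fileName;
            this.line = line;
        }
    }

    /** Returns the first frame found in the log line, or empty if there is none. */
    public static Optional<Frame> parse(String logLine) {
        Matcher m = FRAME.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Frame(
            m.group(1), m.group(2), m.group(3), Integer.parseInt(m.group(4))));
    }

    public static void main(String[] args) {
        parse("\tat org.example.WordCount$Map.map(WordCount.java:42)")
            .ifPresent(f -> System.out.println(f.className + " line " + f.line));
    }
}
```

Frames without source information, such as `(Native Method)`, are deliberately not matched, since there is no source location to open.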

  * Data browsing
    * Filesystem browsing, for all supported filesystems
    * Database browsing, for all supported databases
    * Facilities for viewing large data sets -- sampling, searching, random access. Should ''not'' run out of memory trying to look at files in HDFS.
  * Job submission, for all types of jobs
  * Job tracking, for all types of jobs
    * This would include drilling down into more complex composite jobs, for example Pig, or the multiple jobs that make up a Hive select.
    * Ultimately, this drills down to tasks.
    * Counters
  * Log browsing, for logs from all types of jobs and tasks
  * Debugging
    * Remote attachment of Eclipse to tasks
    * Remote debugging of scripts (for example, pause after first intermediate join to examine result before passing it to next step, etc.)
    * Counters
  * Cluster monitoring (perhaps in one compact display)
    * Service and node status
    * Cluster load monitoring
    * Filesystem available space display
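
For the data-browsing item about viewing large data sets without running out of memory, one well-known option is reservoir sampling, which draws a fixed-size uniform sample in a single pass. This technique is a suggestion added here, not something the page prescribes. The class `LineSampler` is a hypothetical name, and the sketch reads from a plain `BufferedReader`; wrapping an HDFS input stream is assumed to happen elsewhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: uniform sample of k lines using reservoir sampling. */
public class LineSampler {
    /** Reads every line once, keeping at most k sampled lines in memory. */
    public static List<String> sample(BufferedReader reader, int k, Random random)
            throws IOException {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(line); // fill the reservoir with the first k lines
            } else {
                // Pick a slot in [0, seen); replace only if it falls inside the reservoir.
                long j = (long) (random.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, line);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            data.append("row-").append(i).append('\n');
        }
        BufferedReader reader = new BufferedReader(new StringReader(data.toString()));
        System.out.println(sample(reader, 5, new Random()));
    }
}
```

Memory use is bounded by the k retained lines plus the line currently being read, so a single extremely long line would still need separate handling.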

== Administration ==

Ideally, we would provide optional administration facilities. These would almost certainly duplicate the web interfaces (which they should reuse). The advantages of including them in Eclipse are:

  * One-stop shopping
  * Tighter integration -- for example, if the service status display shows that a service is down, it would be handy to restart the service in place, or at least to be taken to the appropriate admin page.
    * Taking you to the appropriate admin page would allow better integration with 3rd-party tools such as Cloudera Manager.
  * Tools for managing filesystem space. Who's using what, where.

== Security ==

  * The tools should properly integrate with, and be tested against, a locked-down secure install.
    * "Secure" may be overstating things now, but security will become an increasing concern as clusters are used by increasingly heterogeneous user communities.
  * Support for multiple user credentials on a particular cluster as part of the user-selectable customized configuration.
  * The tools should aid the user in identifying the security status of resources, and tracking down failures due to security restrictions.
  * Tools could aid the user in determining whether jobs comply with defined security policies.