Posted to commits@hive.apache.org by ga...@apache.org on 2016/05/13 18:34:51 UTC
hive git commit: HIVE-13681 Update README with latest Hive functionality (Alan Gates reviewed by Lefty Leverenz)

Repository: hive
Updated Branches:
  refs/heads/master 7d766d0bd -> c4b2549be


HIVE-13681 Update README with latest Hive functionality (Alan Gates reviewed by Lefty Leverenz)


Project: http://git-wip-us.apache.org/repos/asf/hive/repo
Commit: http://git-wip-us.apache.org/repos/asf/hive/commit/c4b2549b
Tree: http://git-wip-us.apache.org/repos/asf/hive/tree/c4b2549b
Diff: http://git-wip-us.apache.org/repos/asf/hive/diff/c4b2549b

Branch: refs/heads/master
Commit: c4b2549be9112a0d032c8b99da119a05daf1563e
Parents: 7d766d0
Author: Alan Gates <ga...@hortonworks.com>
Authored: Fri May 13 11:33:02 2016 -0700
Committer: Alan Gates <ga...@hortonworks.com>
Committed: Fri May 13 11:33:02 2016 -0700

----------------------------------------------------------------------
 README.txt | 40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hive/blob/c4b2549b/README.txt
----------------------------------------------------------------------
diff --git a/README.txt b/README.txt
index 2d164e3..969abde 100644
--- a/README.txt
+++ b/README.txt
@@ -1,11 +1,13 @@
 Apache Hive (TM) @VERSION@
 ======================
 
-The Apache Hive (TM) data warehouse software facilitates querying and
-managing large datasets residing in distributed storage. Built on top
-of Apache Hadoop (TM), it provides:
+The Apache Hive (TM) data warehouse software facilitates reading, 
+writing, and managing large datasets residing in distributed storage
+using SQL. Built on top of Apache Hadoop (TM), it provides:
 
-* Tools to enable easy data extract/transform/load (ETL)
+* Tools to enable easy access to data via SQL, thus enabling data 
+  warehousing tasks such as extract/transform/load (ETL), reporting, 
+  and data analysis
 
 * A mechanism to impose structure on a variety of data formats
 
@@ -15,17 +17,11 @@ of Apache Hadoop (TM), it provides:
 * Query execution using Apache Hadoop MapReduce, Apache Tez
   or Apache Spark frameworks.
 
-Hive implements a dialect of SQL (Hive QL) that focuses on analytics
-and presents a rich set of SQL semantics including OLAP functions,
-subqueries, common table expressions and more. Hive allows SQL
-developers or users with SQL tools to easily query, analyze and
-process data stored in Hadoop.
-Hive also allows programmers familiar with the MapReduce framework
-to plug in their custom mappers and reducers to perform more
-sophisticated analysis that may not be supported by the built-in
-capabilities of the language. QL can also be extended with custom
-scalar functions (UDF's), aggregations (UDAF's), and table
-functions (UDTF's).
+Hive provides standard SQL functionality, including many of the later
+2003 and 2011 features for analytics.  These include OLAP functions, 
+subqueries, common table expressions, and more.  Hive's SQL can also be
+extended with user code via user defined functions (UDFs), user defined
+aggregates (UDAFs), and user defined table functions (UDTFs).
 
 Hive users have a choice of 3 runtimes when executing SQL queries.
 Users can choose between Apache Hadoop MapReduce, Apache Tez or
@@ -46,12 +42,10 @@ at any time. In each case, Hive is best suited for use cases
 where the amount of data processed is large enough to require a
 distributed system.
 
-Hive is not designed for online transaction processing and does
-not support row level insert/updates. It is best used for batch
-jobs over large sets of immutable data (like web logs). What
-Hive values most are scalability (scale out with more machines
-added dynamically to the Hadoop cluster), extensibility (with
-MapReduce framework and UDF/UDAF/UDTF), fault-tolerance, and
+Hive is not designed for online transaction processing. It is best used
+for traditional data warehousing tasks.  Hive is designed to maximize
+scalability (scale out with more machines added dynamically to the Hadoop
+cluster), performance, extensibility, fault-tolerance, and
 loose-coupling with its input formats.
 
 
@@ -79,9 +73,9 @@ Getting Started
 Requirements
 ============
 
-- Java 1.7
+- Java 1.7 or 1.8
 
-- Hadoop 1.x, 2.x
+- Hadoop 1.x, 2.x (2.x required for Hive 2.x)
 
 
 Upgrading from older versions of Hive