Posted to commits@hive.apache.org by th...@apache.org on 2014/05/01 00:40:16 UTC

svn commit: r1591507 - /hive/trunk/README.txt

Author: thejas
Date: Wed Apr 30 22:40:16 2014
New Revision: 1591507

URL: http://svn.apache.org/r1591507
Log:
HIVE-6932 : hive README needs update (Thejas Nair, reviewed by Ashutosh Chauhan)

Modified:
    hive/trunk/README.txt

Modified: hive/trunk/README.txt
URL: http://svn.apache.org/viewvc/hive/trunk/README.txt?rev=1591507&r1=1591506&r2=1591507&view=diff
==============================================================================
--- hive/trunk/README.txt (original)
+++ hive/trunk/README.txt Wed Apr 30 22:40:16 2014
@@ -12,36 +12,41 @@ of Apache Hadoop (TM), it provides:
 * Access to files stored either directly in Apache HDFS (TM) or in other
   data storage systems such as Apache HBase (TM)
 
-* Query execution via MapReduce
+* Query execution using Apache Hadoop MapReduce or Apache Tez
+  frameworks.
 
-Hive defines a simple SQL-like query language, called QL, that enables
-users familiar with SQL to query the data. At the same time, this
-language also allows programmers who are familiar with the MapReduce
-framework to be able to plug in their custom mappers and reducers to
-perform more sophisticated analysis that may not be supported by the
-built-in capabilities of the language. QL can also be extended with
-custom scalar functions (UDF's), aggregations (UDAF's), and table
+Hive implements a dialect of SQL (Hive QL) that focuses on analytics
+and presents a rich set of SQL semantics including OLAP functions,
+subqueries, common table expressions and more. Hive allows SQL
+developers or users with SQL tools to easily query, analyze and
+process data stored in Hadoop.
+Hive also allows programmers familiar with the MapReduce framework
+to plug in their custom mappers and reducers to perform more
+sophisticated analysis that may not be supported by the built-in
+capabilities of the language. QL can also be extended with custom
+scalar functions (UDF's), aggregations (UDAF's), and table
 functions (UDTF's).
 
-Please note that Hadoop is a batch processing system and Hadoop jobs
-tend to have high latency and incur substantial overheads in job
-submission and scheduling. Consequently the average latency for Hive
-queries is generally very high (minutes) even when data sets involved
-are very small (say a few hundred megabytes). As a result it cannot be
-compared with systems such as Oracle where analyses are conducted on a
-significantly smaller amount of data but the analyses proceed much
-more iteratively with the response times between iterations being less
-than a few minutes. Hive aims to provide acceptable (but not optimal)
-latency for interactive data browsing, queries over small data sets or
-test queries.
-
-Hive is not designed for online transaction processing and does not
-support real-time queries or row level insert/updates. It is best used
-for batch jobs over large sets of immutable data (like web logs). What
-Hive values most are scalability (scale out with more machines added
-dynamically to the Hadoop cluster), extensibility (with MapReduce
-framework and UDF/UDAF/UDTF), fault-tolerance, and loose-coupling with
-its input formats.
+Hive users have a choice of two runtimes when executing SQL queries.
+Users can choose to use the Apache Hadoop MapReduce framework,
+which is mature and proven at large scales. MapReduce is a purely
+batch framework, and queries run using the MapReduce framework
+may experience higher latencies (tens of seconds), even
+over small datasets. Alternatively, users can choose to use the
+newer Apache Tez framework to process SQL queries. Tez is
+designed for interactive query and has substantially reduced
+overheads versus MapReduce. Users are free to switch back and
+forth between these frameworks at any time. In either case,
+Hive is best suited for use cases where the amount of data
+processed is large enough to require a distributed system.
+
+Hive is not designed for online transaction processing and does
+not support row level insert/updates. It is best used for batch
+jobs over large sets of immutable data (like web logs). What
+Hive values most are scalability (scale out with more machines
+added dynamically to the Hadoop cluster), extensibility (with
+MapReduce framework and UDF/UDAF/UDTF), fault-tolerance, and
+loose-coupling with its input formats.
 
 
 General Info
@@ -68,9 +73,9 @@ Getting Started
 Requirements
 ============
 
-- Java 1.6
+- Java 1.6, 1.7
 
-- Hadoop 0.20.x (x >= 1)
+- Hadoop 1.x, 2.x
 
 
 Upgrading from older versions of Hive
@@ -82,9 +87,10 @@ Upgrading from older versions of Hive
   schema upgrade scripts located in the scripts/metastore/upgrade
   directory.
 
-- We have provided upgrade scripts for MySQL, PostgreSQL, Oracle and Derby
-  databases. If you are using a different database for your MetaStore
-  you will need to provide your own upgrade script.
+- We have provided upgrade scripts for MySQL, PostgreSQL, Oracle,
+  Microsoft SQL Server, and Derby databases. If you are using a
+  different database for your MetaStore you will need to provide
+  your own upgrade script.
 
 Useful mailing lists
 ====================
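
The runtime choice described in the README update above is made per session through a Hive configuration property. A minimal sketch follows; the property name hive.execution.engine and its values mr/tez are the standard Hive settings as of the 0.13 line this README revision accompanies, while the table name web_logs is purely illustrative.

```sql
-- Run subsequent queries on Apache Tez (Tez must be installed on the cluster).
SET hive.execution.engine=tez;
SELECT page, COUNT(*) FROM web_logs GROUP BY page;

-- Switch back to classic MapReduce at any point in the same session.
SET hive.execution.engine=mr;
```

The same property can also be fixed cluster-wide in hive-site.xml; changing it per session, as sketched here, is what the README means by switching back and forth between the frameworks at any time.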