Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/04/18 03:05:29 UTC
[Hadoop Wiki] Update of "Hive/Roadmap" by NamitJain
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by NamitJain:
http://wiki.apache.org/hadoop/Hive/Roadmap
------------------------------------------------------------------------------
Before adding to the list below, please check [https://issues.apache.org/jira/browse/HADOOP/component/12312455 JIRA] to see if a ticket has already been opened for the feature. If not, please open a ticket on the [http://issues.apache.org/jira/browse/HADOOP Hadoop JIRA], select "contrib/hive" as the component, and update the following list.
- = 10/27/08 Roadmap Update =
-
- 1. Integrating Dynamic SerDe with the DDL. (Zheng/Pete) - This allows the users to create typed tables along with list and map types from the DDL
- 2. Support for Statistics. (Ashish) - These stats are needed to make optimization decisions
- 3. Join Optimizations. (Prasad) - Mapside joins, semi join techniques etc to do the join faster
- 4. Predicate Pushdown Optimizations. (Namit) - pushing predicates just above the table scan for certain situations in joins as well as ensuring that only required columns are sent across map/reduce boundaries
- 5. Group By Optimizations. (Joydeep) - various optimizations to make group by faster
- 6. Optimizations to reduce the number of map files created by filter operations. (Dhrubha) - Filters with a large number of mappers produce a lot of files, which slows down the operations that follow. This tries to address that problem.
- 7. Transformations in LOAD. (Joydeep) - LOAD currently does not transform the input data if it is not in the format expected by the destination table.
- 8. Schemaless map/reduce. (Zheng) - TRANSFORM needs schema while map/reduce is schema less.
- 9. Improvements to TRANSFORM. (Zheng) - Make this more intuitive to map/reduce developers - evaluate some other keywords etc..
- 10. Error Reporting Improvements. (Pete) - Make error reporting for parse errors better
- 11. Help on CLI. (Joydeep) - add help to the CLI
- 12. Explode and Collect Operators. (Zheng) - Explode and collect operators to convert collections to individual items and vice versa.
- 13. Propagating sort properties to destination tables. (Prasad) - If the query produces sorted output, we want to capture that in the destination table's metadata so that downstream optimizations can be enabled.
-
- Other contributions from outside FB ...
- 14. JDBC driver (Michi Mutsuzaki @ stanford.edu, Raghu @ stanford.edu)
- 15. Fixes to CLI driver (Jeremy Huylebroeck)
- 16. Web interface...
= Roadmap/call to add more features =
The following is the list of useful features that are on the Hive Roadmap:
+ * HAVING clause support
+ * Support for various statistical functions like Median, Standard Deviation, Variance etc.
+ * Support for Create Table as Select
+ * Support for views
+ * Support for Insert Appends
+ * Support for Inserts without listing the partitioning columns explicitly - the query should be able to derive them
+ * Support for Indexes
+ * Support for IN
+ * Support for Column Alias
+ * Support for Statistics. - These stats are needed to make optimization decisions
+ * Join Optimizations. - Map-side joins, semi-join techniques, etc. to make joins faster
+ * Optimizations to reduce the number of map files created by filter operations.
+ * Transformations in LOAD. - LOAD currently does not transform the input data if it is not in the format expected by the destination table.
+ * Schemaless map/reduce. - TRANSFORM needs a schema, while map/reduce is schemaless.
+ * Improvements to TRANSFORM. - Make this more intuitive to map/reduce developers - evaluate some other keywords, etc.
+ * Error Reporting Improvements. - Make error reporting for parse errors better
+ * Help on CLI. - add help to the CLI
+ * Explode and Collect Operators. - Explode and collect operators to convert collections to individual items and vice versa.
+ * Propagating sort properties to destination tables. - If the query produces sorted output, we want to capture that in the destination table's metadata so that downstream optimizations can be enabled.
+ * Propagating bucketing properties to destination tables.
* Multiple group-by inserts
* Generate multiple group-by results by scanning the source table only once
* Example:
@@ -39, +39 @@
* Let the user register UDF and UDAF
* Expose register functions in UDFRegistry and UDAFRegistry
* Provide commands in HiveCli to call those register functions
- * JDBC driver
+ * ODBC/JDBC driver
* Alter table
* rename column
* serde properties (delims, thrift classes)