Posted to commits@pig.apache.org by ol...@apache.org on 2009/03/15 19:43:29 UTC

svn commit: r754716 - in /hadoop/pig/trunk: CHANGES.txt bin/pig src/org/apache/pig/data/SingleTupleBag.java

Author: olga
Date: Sun Mar 15 18:43:28 2009
New Revision: 754716

URL: http://svn.apache.org/viewvc?rev=754716&view=rev
Log:
changes in preparation for Pig 1.0.0 release

Modified:
    hadoop/pig/trunk/CHANGES.txt
    hadoop/pig/trunk/bin/pig
    hadoop/pig/trunk/src/org/apache/pig/data/SingleTupleBag.java

Modified: hadoop/pig/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=754716&r1=754715&r2=754716&view=diff
==============================================================================
--- hadoop/pig/trunk/CHANGES.txt (original)
+++ hadoop/pig/trunk/CHANGES.txt Sun Mar 15 18:43:28 2009
@@ -1,159 +1,86 @@
-Pig Change Log
-
-Trunk (unreleased changes)
-
-  INCOMPATIBLE CHANGES
-
-  NEW FEATURES
-
-    PIG-554 Added fragment replicate map side join (shravanmn via pkamath and gates)
-
-	PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates)
-
-	PIG-692: When running a job from a script, use the name of that script as
-	the default name for the job (vzaliva via gates)
-
-  OPTIMIZATIONS
-
-  BUG FIXES
-  	PIG-24 Files that were incorrectly placed under test/reports have been
-	removed.  ant clean now cleans test/reports.  (milindb via gates)
-
-	PIG-25 com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@)
-
-	PIG-23 Made pig work with java 1.5. (milindb via gates)
-
-	PIG-8 added binary comparator (olgan)
-
-	PIG-17 integrated with Hadoop 0.15 (olgan@)
-
-    PIG-11 Add capability to search for jar file to register. (antmagna via
-	olgan)
-
-	PIG-20 Added custom comparator functions for order by (phunt via gates)
-
-	PIG-33 Help was commented out - uncommented (olgan)
-
-	PIG-31: second half of concurrent mode problem addressed (olgan)
-
-	PIG-14: added heartbeat functionality (olgan)
-
-	PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release
-
-	PIG-7: Added use of combiner in some restricted cases. (gates)
-	
-	PIG-29: fixed bag factory to be properly initialized (utkarsh)
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
 
-    PIG-43: fixed problem where using the combiner prevented a pig alias
-    from being evaluated more than once. (gates)
-
-    PIG-45: Fixed pig.pl to not assume hodrc file is named the same as
-    cluster name (gates).
-
-    PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples
-    instead of Tuples, causing Reducer to crash in some cases.
-
-    PIG-47: Added methods to DataMap to provide access to its content
-
-	PIG-12: Added time stamps to log4j messages (phunt via gates).
-
-	PIG-44: Added adaptive decision of the number of records to hold in memory 
-	before spilling (utkarsh)
-    PIG-39: created more efficient version of read (spullara via olgan)
-
-    PIG-41: Added patterns to svn:ignore
-
-    PIG-51: Fixed combiner in the presence of flattening
-
-	PIG-30: Rewrote DataBags to better handle decisions of when to spill to
-	disk and to spill more intelligently. (gates)
-
-	PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the
-	comparator function instead of Class.forName.  (gates)
-
-	PIG-56: Made DataBag implement Iterable. (groves via gates)
-
-	PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@)
-
-	PIG-77: Added eclipse specific files to svn:ignore
-
-	PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates)
+Pig Change Log
 
-	PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates)
+Release 1.0.0 - Unreleased
 
-	PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arun
-	via olgan)
+INCOMPATIBLE CHANGES
 
-	PIG-32: ABstraction layer (olgan)
+    PIG-157: Add types and rework execution pipeline (gates)
+    
+    PIG-458: integration with Hadoop 18 (olgan)
 
-	PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default
-	path.  Also fix it to not die if pigclient.conf is missing. (craigm via
-	gates).
+NEW FEATURES
+    PIG-139: command line editing (daijy via olgan)
 
-	PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill
-	files when they are done spilling (contributions by craigm, breed, and
-	gates, committed by gates).
+    PIG-554 Added fragment replicate map side join (shravanmn via pkamath and gates)
 
-	PIG-95: Remove System.exit() statements from inside pig (joa23 via gates).
+    PIG-535: added rmf command
 
-	PIG-65: convert tabs to spaces (groves via olgan)
+IMPROVEMENTS
 
-	PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when
-	more than one bag is involved (gates).
+    PIG-270: proper line number for parse errors (daijy via olgan)
+    
+    PIG-367: convinience function for UDFs to name schema
 
-	PIG-92: Fix NullPointerException in PIgContext due to uninitialized conf
-	reference. (francisoud via gates)
+    PIG-443:  Illustrate for the Types branch (shubham via olgan)
 
-	PIG-83: Change everything except grunt and Main (PigServer on down) to use
-	common logging abstraction instead of log4j.  By default in grunt, log4j
-	still used as logging layer.  Also converted all System.out/err.println
-	statements to use logging instead. (francisoud via gates)
+	PIG-599: Added buffering to BufferedPositionedInputStream (gates)
+    
+    PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth
+    via olgan)
 
-	PIG-80: In a number of places stack trace information was being lost by an
-	exception being caught, and a different exception then thrown.  All those
-	locations have been changed so that the new exception now wraps the old.
-	(francisoud via gates).
+    PIG-628: misc performance improvements (pradeepkth via olgan)
 
-	PIG-84: Converted printStackTrace calls to calls to the logger.
-	(francisoud via gates).
+    PIG-589: error handling, phase 1-2 (sms via olgan)
 
-	PIG-88: Remove unused HadoopExe import from Main.  (pi_song via gates).
+    PIG-590: error handling, phase 3 (sms)
 
-	PIG-99: Fix to make unit tests not run out of memory. (francisoud via
-	gates).
+    PIG-591: error handling, phase 4 (sms)
 
-    PIG-107: enabled several tests. (francisoud via olgan)
+    PIG-545: PERFORMANCE: Sampler for order bys does not produce a good
+    distribution (pradeepkth)
 
-    PIG-46: abort processing on error for non-interactive mode (olston via
-    olgan)
+    PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan)
 
-    PIG-109: improved exception handling (oae via olgan)
+    PIG-636: Use lightweight bag implementations which do not register with
+    SpillableMemoryManager with Combiner (pradeepkth)
 
-	PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can
-	be run w/o access to a hadoop cluster. (xuzh via gates)
+    PIG-563: support for multiple combiner invocations (pradeepkth via olgan)
 
-    PIG-68: improvements to build.xml (joa23 via olgan)
+    PIG-465: performance improvement - removing keys from the value (pradeepkth
+    via olgan)
 
-	PIG-110: Replaced code accidently merged out in PIG-32 fix that handled
-	flattening the combiner case. (gates and oae)
+    PIG-450: PERFORMANCE: Distinct should make use of combiner to remove
+    duplicate values from keys. (gates)
 
-    PIG-213: Remove non-static references to logger from data bags and tuples, 
-    as it causes significant overhead (vgeschel via gates).
+    PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth
+    via gates)
 
-    PIG-284: target for building source jar (oae via olgan)
+BUG FIXES
 
     PIG-294: string comparator unit tests (sms via pi_song)
 
     PIG-258: cleaning up directories on failure (daijy via olgan)
 
-    PIG-139: command line editing (daijy via olgan)
-
-    PIG-270: proper line number for parse errors (daijy via olgan)
-
     PIG-363: fix for describe to produce schema name
 
-    PIG-367: convinience function for UDFs to name schema
-
     PIG-368: making JobConf available to Load/Store UDFs
 
     PIG-311: cross is broken
@@ -254,15 +181,11 @@
 
     PIG-455: "group" alias is lost after a flatten(group) (pradeepkth vi olgan)
 
-    PIG-458: integration with Hadoop 18 (olgan)
-
     PIG-459: increased sleep time before checking for job progress
 
     PIG-462: LIMIT N should create one output file with N rows (shravanmn via
     olgan)
 
-    PIG-443:  Illustrate for the Types branch (shubham via olgan)
-    
     PIG-376: set job name (olgan)
 
     PIG-463: POCast changes (pradeepkth via olgan)
@@ -283,9 +206,6 @@
 
     PIG-471: ignoring status errors from hadoop (pradeepkth via olgan)
 
-    PIG-465: performance improvement - removing keys from the value (pradeepkth
-    via olgan)
-    
     PIG-489: (*) processing (sms via olgan)
 
     PIG-475: missing heartbeats (shravanmn via olgan)
@@ -351,10 +271,6 @@
 
     PIG-522: make negation work (pradeepkth via olgan)
 
-    PIG-563: support for multiple combiner invocations (pradeepkth via olgan)
-
-    PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan)
-
     PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple
     error (pradeepkth via olgan)
 
@@ -363,21 +279,12 @@
 
     PIG-570:  problems with handling bzip data (breed via olgan)
 
-	PIG-599: Added buffering to BufferedPositionedInputStream (gates)
-
     PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan)
 
-    PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth
-    via olgan)
-
     PIG-623: Fix spelling errors in output messages (tomwhite via sms)
 
     PIG-622: Include pig executable in distribution (tomwhite via sms)
 
-    PIG-628: misc performance improvements (pradeepkth via olgan)
-    
-    PIG-589: error handling, phase 1-2 (sms via olgan)
-
     PIG-615: Wrong number of jobs with limit (shravanmn via sms)
 
     PIG-635: POCast.java has incorrect formatting (sms)
@@ -427,9 +334,6 @@
 
     PIG-590: error handling on the backend (sms)
 
-    PIG-545: PERFORMANCE: Sampler for order bys does not produce a good
-    distribution (pradeepkth)
-
     PIG-658: Data type long : When 'L' or 'l' is included with data 
     (123L or 123l) load produces null value. Also the case with Float (thejas
     via sms)
@@ -475,4 +379,183 @@
 
     PIG-715: doc updates (chandec vi olgan)
 
+	PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates)
+
+	PIG-692: When running a job from a script, use the name of that script as
+	the default name for the job (vzaliva via gates)
+
     PIG-718: To add standard ant targets to build.xml file  (gkesavan via olgan)
+
+Release 0.1.1 - 2008-12-04
+
+INCOMPATIBLE CHANGES
+
+NEW FEATURES
+
+IMPROVEMENTS
+
+PIG-253: integration with hadoop-18
+
+BUG FIXES
+
+PIG-342: Fix DistinctDataBag to recalculate size after it has spilled.
+(bdimcheff via gates)
+
+Release 0.1.0 - 2008-09-11
+
+  INCOMPATIBLE CHANGES
+
+  PIG-123: requires escape of '\' in chars and string
+
+  NEW FEATURES
+
+  PIG-20 Added custom comparator functions for order by (phunt via gates)
+  
+  PIG-94: Streaming implementation (arunc via olgan)
+  
+  PIG-58: parameter substitution
+
+  PIG-55: added custom splitter (groves via olgan)
+  
+  PIG-59: Add a new ILLUSTRATE command (shubhamc via gates).
+
+  PIG-256: Added variable argument support for UDFs (pi_song)
+
+  IMPROVEMENTS:
+
+  PIG-8 added binary comparator (olgan)
+  
+  PIG-11 Add capability to search for jar file to register. (antmagna via olgan)
+  
+  PIG-7: Added use of combiner in some restricted cases. (gates)
+
+  PIG-47: Added methods to DataMap to provide access to its content
+
+  PIG-30: Rewrote DataBags to better handle decisions of when to spill to
+	disk and to spill more intelligently. (gates)
+
+  PIG-12: Added time stamps to log4j messages (phunt via gates).
+
+  PIG-44: Added adaptive decision of the number of records to hold in memory 
+	before spilling (utkarsh)
+
+  PIG-56: Made DataBag implement Iterable. (groves via gates)
+
+  PIG-39: created more efficient version of read (spullara via olgan)
+
+  PIG-32: ABstraction layer (olgan)
+
+  PIG-83: Change everything except grunt and Main (PigServer on down) to use
+	common logging abstraction instead of log4j.  By default in grunt, log4j
+	still used as logging layer.  Also converted all System.out/err.println
+	statements to use logging instead. (francisoud via gates)
+
+  PIG-13: adding version to the system (joa23 via olgan)
+
+  PIG-113:  Make explain output more understandable (pi_song via gates)
+
+  PIG-120:  Support map reduce in local mode.  To do this user needs to
+    specify execution type as mapreduce and cluster name as local (joa23 via gates).
+
+  PIG-106:  Change StringBuffer and String '+' to StringBuilder (francisoud via gates).
+
+  PIG-111: Reworked configuration to be setable via properties. (joa23, pi_song, oae via gates).
+      
+  BUG FIXES
+  	PIG-24 Files that were incorrectly placed under test/reports have been
+	removed.  ant clean now cleans test/reports.  (milindb via gates)
+
+	PIG-25 com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@)
+
+	PIG-23 Made pig work with java 1.5. (milindb via gates)
+
+	PIG-17 integrated with Hadoop 0.15 (olgan@)
+
+	PIG-33 Help was commented out - uncommented (olgan)
+
+	PIG-31: second half of concurrent mode problem addressed (olgan)
+
+	PIG-14: added heartbeat functionality (olgan)
+
+	PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release
+
+	PIG-29: fixed bag factory to be properly initialized (utkarsh)
+
+    PIG-43: fixed problem where using the combiner prevented a pig alias
+    from being evaluated more than once. (gates)
+
+    PIG-45: Fixed pig.pl to not assume hodrc file is named the same as
+    cluster name (gates).
+
+    PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples
+    instead of Tuples, causing Reducer to crash in some cases.
+
+    PIG-41: Added patterns to svn:ignore
+
+    PIG-51: Fixed combiner in the presence of flattening
+
+	PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the
+	comparator function instead of Class.forName.  (gates)
+
+	PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@)
+
+	PIG-77: Added eclipse specific files to svn:ignore
+
+	PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates)
+
+	PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates)
+
+	PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arun
+	via olgan)
+
+	PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default
+	path.  Also fix it to not die if pigclient.conf is missing. (craigm via
+	gates).
+
+	PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill
+	files when they are done spilling (contributions by craigm, breed, and
+	gates, committed by gates).
+
+	PIG-95: Remove System.exit() statements from inside pig (joa23 via gates).
+
+	PIG-65: convert tabs to spaces (groves via olgan)
+
+	PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when
+	more than one bag is involved (gates).
+
+	PIG-92: Fix NullPointerException in PIgContext due to uninitialized conf
+	reference. (francisoud via gates)
+
+	PIG-80: In a number of places stack trace information was being lost by an
+	exception being caught, and a different exception then thrown.  All those
+	locations have been changed so that the new exception now wraps the old.
+	(francisoud via gates).
+
+	PIG-84: Converted printStackTrace calls to calls to the logger.
+	(francisoud via gates).
+
+	PIG-88: Remove unused HadoopExe import from Main.  (pi_song via gates).
+
+	PIG-99: Fix to make unit tests not run out of memory. (francisoud via
+	gates).
+
+    PIG-107: enabled several tests. (francisoud via olgan)
+
+    PIG-46: abort processing on error for non-interactive mode (olston via
+    olgan)
+
+    PIG-109: improved exception handling (oae via olgan)
+
+	PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can
+	be run w/o access to a hadoop cluster. (xuzh via gates)
+
+    PIG-68: improvements to build.xml (joa23 via olgan)
+
+	PIG-110: Replaced code accidently merged out in PIG-32 fix that handled
+	flattening the combiner case. (gates and oae)
+
+    PIG-213: Remove non-static references to logger from data bags and tuples, 
+    as it causes significant overhead (vgeschel via gates).
+
+    PIG-284: target for building source jar (oae via olgan)
+

Modified: hadoop/pig/trunk/bin/pig
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/bin/pig?rev=754716&r1=754715&r2=754716&view=diff
==============================================================================
--- hadoop/pig/trunk/bin/pig (original)
+++ hadoop/pig/trunk/bin/pig Sun Mar 15 18:43:28 2009
@@ -1,4 +1,21 @@
 #!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # 
 # The Pig command script
 #

Modified: hadoop/pig/trunk/src/org/apache/pig/data/SingleTupleBag.java
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/org/apache/pig/data/SingleTupleBag.java?rev=754716&r1=754715&r2=754716&view=diff
==============================================================================
--- hadoop/pig/trunk/src/org/apache/pig/data/SingleTupleBag.java (original)
+++ hadoop/pig/trunk/src/org/apache/pig/data/SingleTupleBag.java Sun Mar 15 18:43:28 2009
@@ -1,3 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
 /**
  * 
  */