Posted to commits@hive.apache.org by gu...@apache.org on 2014/10/21 23:12:49 UTC
svn commit: r1633468 - in /hive/branches/branch-0.14: data/files/
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/
ql/src/java/org/apache/hadoop/hive/ql/plan/
ql/src/java/org/apache/hadoop/hive/ql/stats/
ql/src/test/queries/clientposit...
Author: gunther
Date: Tue Oct 21 21:12:49 2014
New Revision: 1633468
URL: http://svn.apache.org/r1633468
Log:
HIVE-8168: With dynamic partition enabled fact table selectivity is not taken into account when generating the physical plan (Use CBO cardinality using physical plan generation) (Prasanth J via Gunther Hagleitner)
Added:
hive/branches/branch-0.14/data/files/customer_address.txt
hive/branches/branch-0.14/data/files/store.txt
hive/branches/branch-0.14/ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q
hive/branches/branch-0.14/ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out
Modified:
hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java
hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
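In essence, the change in the files above estimates join cardinality for a fact-to-dimension (PK-FK) join from the FK-side row count scaled by the product of the PK-side selectivities. A minimal standalone sketch of that estimate follows; the class and method names are illustrative, not Hive APIs:

```java
// Sketch of the PK-FK cardinality estimate this commit introduces: when a fact
// table (FK side) joins several dimension tables (PK sides), the join row count
// is estimated as the fact row count times the product of the dimension-side
// selectivities. Names here are hypothetical, not Hive's classes.
public class PkFkCardinalitySketch {

    /**
     * @param rowsFk        row count of the fact (foreign key) side
     * @param selectivities per-dimension selectivity, i.e. outputRows / inputRows
     */
    static long estimateJoinRows(long rowsFk, double[] selectivities) {
        double prod = 1.0;
        for (double s : selectivities) {
            prod *= s;  // assumes predicates on dimension tables are conjunctive
        }
        return (long) (rowsFk * prod);
    }

    public static void main(String[] args) {
        // 1000 fact rows; two dimension tables each filtered down to 50% of their rows
        System.out.println(estimateJoinRows(1000L, new double[] {0.5, 0.5})); // prints 250
        // with no filtered dimensions, the fact row count passes through unchanged
        System.out.println(estimateJoinRows(12L, new double[] {}));           // prints 12
    }
}
```

This mirrors the `newNumRows = parentWithFK rows * prodSelectivity` computation in `inferPKFKRelationship()` in the diff.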
Added: hive/branches/branch-0.14/data/files/customer_address.txt
URL: http://svn.apache.org/viewvc/hive/branches/branch-0.14/data/files/customer_address.txt?rev=1633468&view=auto
==============================================================================
--- hive/branches/branch-0.14/data/files/customer_address.txt (added)
+++ hive/branches/branch-0.14/data/files/customer_address.txt Tue Oct 21 21:12:49 2014
@@ -0,0 +1,20 @@
+1|AAAAAAAABAAAAAAA|18|Jackson |Parkway|Suite 280|Fairfield|Maricopa County|AZ|86192|United States|-7|condo|
+2|AAAAAAAACAAAAAAA|362|Washington 6th|RD|Suite 80|Fairview|Taos County|NM|85709|United States|-7|condo|
+3|AAAAAAAADAAAAAAA|585|Dogwood Washington|Circle|Suite Q|Pleasant Valley|York County|PA|12477|United States|-5|single family|
+4|AAAAAAAAEAAAAAAA|111|Smith |Wy|Suite A|Oak Ridge|Kit Carson County|CO|88371|United States|-7|condo|
+5|AAAAAAAAFAAAAAAA|31|College |Blvd|Suite 180|Glendale|Barry County|MO|63951|United States|-6|single family|
+6|AAAAAAAAGAAAAAAA|59|Williams Sixth|Parkway|Suite 100|Lakeview|Chelan County|WA|98579|United States|-8|single family|
+7|AAAAAAAAHAAAAAAA||Hill 7th|Road|Suite U|Farmington|||39145|United States|||
+8|AAAAAAAAIAAAAAAA|875|Lincoln |Ct.|Suite Y|Union|Bledsoe County|TN|38721|United States|-5|apartment|
+9|AAAAAAAAJAAAAAAA|819|1st Laurel|Ave|Suite 70|New Hope|Perry County|AL|39431|United States|-6|condo|
+10|AAAAAAAAKAAAAAAA|851|Woodland Poplar|ST|Suite Y|Martinsville|Haines Borough|AK|90419|United States|-9|condo|
+11|AAAAAAAALAAAAAAA|189|13th 2nd|Street|Suite 470|Maple Grove|Madison County|MT|68252|United States|-7|single family|
+12|AAAAAAAAMAAAAAAA|76|Ash 8th|Ct.|Suite O|Edgewood|Mifflin County|PA|10069|United States|-5|apartment|
+13|AAAAAAAANAAAAAAA|424|Main Second|Ln|Suite 130|Greenville|Noxubee County|MS|51387|United States|-6|single family|
+14|AAAAAAAAOAAAAAAA|923|Pine Oak|Dr.|Suite 100||Lipscomb County|TX|77752||-6||
+15|AAAAAAAAPAAAAAAA|314|Spring |Ct.|Suite B|Oakland|Washington County|OH|49843|United States|-5|apartment|
+16|AAAAAAAAABAAAAAA|576|Adams Center|Street|Suite J|Valley View|Oldham County|TX|75124|United States|-6|condo|
+17|AAAAAAAABBAAAAAA|801|Green |Dr.|Suite 0|Montpelier|Richland County|OH|48930|United States|-5|single family|
+18|AAAAAAAACBAAAAAA|460|Maple Spruce|Court|Suite 480|Somerville|Potter County|SD|57783|United States|-7|condo|
+19|AAAAAAAADBAAAAAA|611|Wilson |Way|Suite O|Oakdale|Tangipahoa Parish|LA|79584|United States|-6|apartment|
+20|AAAAAAAAEBAAAAAA|675|Elm Wilson|Street|Suite I|Hopewell|Williams County|OH|40587|United States|-5|condo|
Added: hive/branches/branch-0.14/data/files/store.txt
URL: http://svn.apache.org/viewvc/hive/branches/branch-0.14/data/files/store.txt?rev=1633468&view=auto
==============================================================================
--- hive/branches/branch-0.14/data/files/store.txt (added)
+++ hive/branches/branch-0.14/data/files/store.txt Tue Oct 21 21:12:49 2014
@@ -0,0 +1,12 @@
+1|AAAAAAAABAAAAAAA|1997-03-13||2451189|ought|245|5250760|8AM-4PM|William Ward|2|Unknown|Enough high areas stop expectations. Elaborate, local is|Charles Bartley|1|Unknown|1|Unknown|767|Spring |Wy|Suite 250|Midway|Williamson County|TN|31904|United States|-5|0.03|
+2|AAAAAAAACAAAAAAA|1997-03-13|2000-03-12||able|236|5285950|8AM-4PM|Scott Smith|8|Unknown|Parliamentary candidates wait then heavy, keen mil|David Lamontagne|1|Unknown|1|Unknown|255|Sycamore |Dr.|Suite 410|Midway|Williamson County|TN|31904|United States|-5|0.03|
+3|AAAAAAAACAAAAAAA|2000-03-13|||able|236|7557959|8AM-4PM|Scott Smith|7|Unknown|Impossible, true arms can treat constant, complete w|David Lamontagne|1|Unknown|1|Unknown|877|Park Laurel|Road|Suite T|Midway|Williamson County|TN|31904|United States|-5|0.03|
+4|AAAAAAAAEAAAAAAA|1997-03-13|1999-03-13|2451044|ese|218|9341467|8AM-4PM|Edwin Adams|4|Unknown|Events would achieve other, eastern hours. Mechanisms must not eat other, new org|Thomas Pollack|1|Unknown|1|Unknown|27|Lake |Ln|Suite 260|Midway|Williamson County|TN|31904|United States|-5|0.03|
+5|AAAAAAAAEAAAAAAA|1999-03-14|2001-03-12|2450910|anti|288|9078805|8AM-4PM|Edwin Adams|8|Unknown|Events would achieve other, eastern hours. Mechanisms must not eat other, new org|Thomas Pollack|1|Unknown|1|Unknown|27|Lee 6th|Court|Suite 80|Fairview|Williamson County|TN|35709|United States|-5|0.03|
+6|AAAAAAAAEAAAAAAA|2001-03-13|||cally|229|9026222|8AM-4PM|Edwin Adams|10|Unknown|Events would achieve other, eastern hours. Mechanisms must not eat other, new org|Thomas Pollack|1|Unknown|1|Unknown|220|6th |Lane|Suite 140|Midway|Williamson County|TN|31904|United States|-5|0.03|
+7|AAAAAAAAHAAAAAAA|1997-03-13|||ation|297|8954883|8AM-4PM|David Thomas|9|Unknown|Architects coul|Thomas Benton|1|Unknown|1|Unknown|811|Lee |Circle|Suite T|Midway|Williamson County|TN|31904|United States|-5|0.01|
+8|AAAAAAAAIAAAAAAA|1997-03-13|2000-03-12||eing|278|6995995|8AM-4PM|Brett Yates|2|Unknown|Various bars make most. Difficult levels introduce at a boots. Buildings welcome only never el|Dean Morrison|1|Unknown|1|Unknown|226|12th |Lane|Suite D|Fairview|Williamson County|TN|35709|United States|-5|0.08|
+9|AAAAAAAAIAAAAAAA|2000-03-13|||eing|271|6995995|8AM-4PM|Brett Yates|2|Unknown|Formal, psychological pounds relate reasonable, young principles. Black, |Dean Morrison|1|Unknown|1|Unknown|226|Hill |Boulevard|Suite 190|Midway|Williamson County|TN|31904|United States|-5|0.08|
+10|AAAAAAAAKAAAAAAA|1997-03-13|1999-03-13||bar|294|9294113|8AM-4PM|Raymond Jacobs|8|Unknown|Little expectations include yet forward meetings.|Michael Wilson|1|Unknown|1|Unknown|175|4th |Court|Suite C|Midway|Williamson County|TN|31904|United States|-5|0.06|
+11|AAAAAAAAKAAAAAAA|1999-03-14|2001-03-12||ought|294|9294113|8AM-4PM|Raymond Jacobs|6|Unknown|Mysterious employe|Michael Wilson|1|Unknown|1|Unknown|175|Park Green|Court|Suite 160|Midway|Williamson County|TN|31904|United States|-5|0.11|
+12|AAAAAAAAKAAAAAAA|2001-03-13|||ought|294|5219562|8AM-12AM|Robert Thompson|6|Unknown|Events develop i|Dustin Kelly|1|Unknown|1|Unknown|337|College |Boulevard|Suite 100|Fairview|Williamson County|TN|31904|United States|-5|0.01|
Modified: hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
URL: http://svn.apache.org/viewvc/hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java?rev=1633468&r1=1633467&r2=1633468&view=diff
==============================================================================
--- hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java (original)
+++ hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java Tue Oct 21 21:12:49 2014
@@ -1021,11 +1021,17 @@ public class StatsRulesProcFactory {
*/
public static class JoinStatsRule extends DefaultStatsRule implements NodeProcessor {
+ private boolean pkfkInferred = false;
+ private long newNumRows = 0;
+ private List<Operator<? extends OperatorDesc>> parents;
+ private CommonJoinOperator<? extends JoinDesc> jop;
+ private int numAttr = 1;
+
@Override
public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
Object... nodeOutputs) throws SemanticException {
- CommonJoinOperator<? extends JoinDesc> jop = (CommonJoinOperator<? extends JoinDesc>) nd;
- List<Operator<? extends OperatorDesc>> parents = jop.getParentOperators();
+ jop = (CommonJoinOperator<? extends JoinDesc>) nd;
+ parents = jop.getParentOperators();
AnnotateStatsProcCtx aspCtx = (AnnotateStatsProcCtx) procCtx;
HiveConf conf = aspCtx.getConf();
boolean allStatsAvail = true;
@@ -1052,22 +1058,25 @@ public class StatsRulesProcFactory {
Statistics stats = new Statistics();
Map<String, Long> rowCountParents = new HashMap<String, Long>();
List<Long> distinctVals = Lists.newArrayList();
-
- // 2 relations, multiple attributes
- boolean multiAttr = false;
- int numAttr = 1;
int numParent = parents.size();
-
Map<String, ColStatistics> joinedColStats = Maps.newHashMap();
Map<Integer, List<String>> joinKeys = Maps.newHashMap();
List<Long> rowCounts = Lists.newArrayList();
+ // detect if there are multiple attributes in join key
+ ReduceSinkOperator rsOp = (ReduceSinkOperator) jop.getParentOperators().get(0);
+ List<ExprNodeDesc> keyExprs = rsOp.getConf().getKeyCols();
+ numAttr = keyExprs.size();
+
+ // infer PK-FK relationship in single attribute join case
+ inferPKFKRelationship();
+
// get the join keys from parent ReduceSink operators
for (int pos = 0; pos < parents.size(); pos++) {
ReduceSinkOperator parent = (ReduceSinkOperator) jop.getParentOperators().get(pos);
Statistics parentStats = parent.getStatistics();
- List<ExprNodeDesc> keyExprs = parent.getConf().getKeyCols();
+ keyExprs = parent.getConf().getKeyCols();
// Parent RS may have column statistics from multiple parents.
// Populate table alias to row count map, this will be used later to
@@ -1082,12 +1091,6 @@ public class StatsRulesProcFactory {
}
rowCounts.add(parentStats.getNumRows());
- // multi-attribute join key
- if (keyExprs.size() > 1) {
- multiAttr = true;
- numAttr = keyExprs.size();
- }
-
// compute fully qualified join key column names. this name will be
// used to quickly look-up for column statistics of join key.
// TODO: expressions in join condition will be ignored. assign
@@ -1110,7 +1113,7 @@ public class StatsRulesProcFactory {
// attribute join, else max(V(R,y1), V(S,y1)) * max(V(R,y2), V(S,y2))
// in case of multi-attribute join
long denom = 1;
- if (multiAttr) {
+ if (numAttr > 1) {
List<Long> perAttrDVs = Lists.newArrayList();
for (int idx = 0; idx < numAttr; idx++) {
for (Integer i : joinKeys.keySet()) {
@@ -1149,9 +1152,7 @@ public class StatsRulesProcFactory {
}
// Update NDV of joined columns to be min(V(R,y), V(S,y))
- if (multiAttr) {
- updateJoinColumnsNDV(joinKeys, joinedColStats, numAttr);
- }
+ updateJoinColumnsNDV(joinKeys, joinedColStats, numAttr);
// column statistics from different sources are put together and rename
// fully qualified column names based on output schema of join operator
@@ -1181,10 +1182,9 @@ public class StatsRulesProcFactory {
// update join statistics
stats.setColumnStats(outColStats);
- long newRowCount = computeNewRowCount(rowCounts, denom);
+ long newRowCount = pkfkInferred ? newNumRows : computeNewRowCount(rowCounts, denom);
- updateStatsForJoinType(stats, newRowCount, jop, rowCountParents,
- outInTabAlias);
+ updateStatsForJoinType(stats, newRowCount, jop, rowCountParents, outInTabAlias);
jop.setStatistics(stats);
if (isDebugEnabled) {
@@ -1229,6 +1229,146 @@ public class StatsRulesProcFactory {
return null;
}
+ private void inferPKFKRelationship() {
+ if (numAttr == 1) {
+ List<Integer> parentsWithPK = getPrimaryKeyCandidates(parents);
+
+ // In a fact-to-many-dimension-tables join, the join keys in the fact table are
+ // typically foreign keys, each with a corresponding primary key in a dimension
+ // table. The selectivity applied to the fact table in that case is the product
+ // of the selectivities of all dimension tables (assuming conjunctive predicates).
+ for (Integer id : parentsWithPK) {
+ ColStatistics csPK = null;
+ Operator<? extends OperatorDesc> parent = parents.get(id);
+ for (ColStatistics cs : parent.getStatistics().getColumnStats()) {
+ if (cs.isPrimaryKey()) {
+ csPK = cs;
+ break;
+ }
+ }
+
+ // infer foreign key candidates positions
+ List<Integer> parentsWithFK = getForeignKeyCandidates(parents, csPK);
+ if (parentsWithFK.size() == 1 &&
+ parentsWithFK.size() + parentsWithPK.size() == parents.size()) {
+ Operator<? extends OperatorDesc> parentWithFK = parents.get(parentsWithFK.get(0));
+ List<Float> parentsSel = getSelectivity(parents, parentsWithPK);
+ Float prodSelectivity = 1.0f;
+ for (Float selectivity : parentsSel) {
+ prodSelectivity *= selectivity;
+ }
+ newNumRows = (long) (parentWithFK.getStatistics().getNumRows() * prodSelectivity);
+ pkfkInferred = true;
+
+ // some debug information
+ if (isDebugEnabled) {
+ List<String> parentIds = Lists.newArrayList();
+
+ // print primary key containing parents
+ for (Integer i : parentsWithPK) {
+ parentIds.add(parents.get(i).toString());
+ }
+ LOG.debug("STATS-" + jop.toString() + ": PK parent id(s) - " + parentIds);
+ parentIds.clear();
+
+ // print foreign key containing parents
+ for (Integer i : parentsWithFK) {
+ parentIds.add(parents.get(i).toString());
+ }
+ LOG.debug("STATS-" + jop.toString() + ": FK parent id(s) - " + parentIds);
+ }
+ }
+ }
+ }
+ }
+
+ /**
+ * Get selectivity of reduce sink operators.
+ * @param ops - reduce sink operators
+ * @param opsWithPK - reduce sink operators with primary keys
+ * @return - list of selectivity for primary key containing operators
+ */
+ private List<Float> getSelectivity(List<Operator<? extends OperatorDesc>> ops,
+ List<Integer> opsWithPK) {
+ List<Float> result = Lists.newArrayList();
+ for (Integer idx : opsWithPK) {
+ Operator<? extends OperatorDesc> op = ops.get(idx);
+ TableScanOperator tsOp = OperatorUtils
+ .findSingleOperatorUpstream(op, TableScanOperator.class);
+ long inputRow = tsOp.getStatistics().getNumRows();
+ long outputRow = op.getStatistics().getNumRows();
+ result.add((float) outputRow / (float) inputRow);
+ }
+ return result;
+ }
+
+ /**
+ * Returns the index of parents whose join key column statistics ranges are within the specified
+ * primary key range (inferred as foreign keys).
+ * @param ops - operators
+ * @param csPK - column statistics of primary key
+ * @return - list of foreign key containing parent ids
+ */
+ private List<Integer> getForeignKeyCandidates(List<Operator<? extends OperatorDesc>> ops,
+ ColStatistics csPK) {
+ List<Integer> result = Lists.newArrayList();
+ if (csPK == null || ops == null) {
+ return result;
+ }
+
+ for (int i = 0; i < ops.size(); i++) {
+ Operator<? extends OperatorDesc> op = ops.get(i);
+ if (op != null && op instanceof ReduceSinkOperator) {
+ ReduceSinkOperator rsOp = (ReduceSinkOperator) op;
+ List<ExprNodeDesc> keys = rsOp.getConf().getKeyCols();
+ List<String> fqCols = StatsUtils.getFullQualifedColNameFromExprs(keys,
+ rsOp.getColumnExprMap());
+ if (fqCols.size() == 1) {
+ String joinCol = fqCols.get(0);
+ if (rsOp.getStatistics() != null) {
+ ColStatistics cs = rsOp.getStatistics().getColumnStatisticsFromFQColName(joinCol);
+ if (cs != null && !cs.isPrimaryKey()) {
+ if (StatsUtils.inferForeignKey(csPK, cs)) {
+ result.add(i);
+ }
+ }
+ }
+ }
+ }
+ }
+ return result;
+ }
+
+ /**
+ * Returns the indexes of parents whose join key columns are inferred as primary keys.
+ * @param ops - operators
+ * @return - list of primary key containing parent ids
+ */
+ private List<Integer> getPrimaryKeyCandidates(List<Operator<? extends OperatorDesc>> ops) {
+ List<Integer> result = Lists.newArrayList();
+ if (ops != null && !ops.isEmpty()) {
+ for (int i = 0; i < ops.size(); i++) {
+ Operator<? extends OperatorDesc> op = ops.get(i);
+ if (op instanceof ReduceSinkOperator) {
+ ReduceSinkOperator rsOp = (ReduceSinkOperator) op;
+ List<ExprNodeDesc> keys = rsOp.getConf().getKeyCols();
+ List<String> fqCols = StatsUtils.getFullQualifedColNameFromExprs(keys,
+ rsOp.getColumnExprMap());
+ if (fqCols.size() == 1) {
+ String joinCol = fqCols.get(0);
+ if (rsOp.getStatistics() != null) {
+ ColStatistics cs = rsOp.getStatistics().getColumnStatisticsFromFQColName(joinCol);
+ if (cs != null && cs.isPrimaryKey()) {
+ result.add(i);
+ }
+ }
+ }
+ }
+ }
+ }
+ return result;
+ }
+
private Long getEasedOutDenominator(List<Long> distinctVals) {
// Exponential back-off for NDVs.
// 1) Descending order sort of NDVs
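The `getSelectivity()` helper added above derives each PK-side selectivity as the ratio of rows reaching the ReduceSink (after filters) to rows produced by the TableScan. A standalone sketch of that ratio, with hypothetical names:

```java
// Sketch of the per-branch selectivity used in the PK-FK inference: the
// fraction of scanned rows that survive the branch's predicates. Names are
// illustrative, not Hive's operator APIs.
public class SelectivitySketch {

    /** Selectivity = rows at the ReduceSink divided by rows at the TableScan. */
    static float selectivity(long tableScanRows, long reduceSinkRows) {
        return (float) reduceSinkRows / (float) tableScanRows;
    }

    public static void main(String[] args) {
        // store has 12 rows; suppose a predicate keeps 3 of them
        System.out.println(selectivity(12L, 3L));   // prints 0.25
        // no predicate: all rows pass, selectivity 1.0
        System.out.println(selectivity(10L, 10L));  // prints 1.0
    }
}
```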
Modified: hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java
URL: http://svn.apache.org/viewvc/hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java?rev=1633468&r1=1633467&r2=1633468&view=diff
==============================================================================
--- hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java (original)
+++ hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java Tue Oct 21 21:12:49 2014
@@ -33,12 +33,14 @@ public class ColStatistics {
private long numTrues;
private long numFalses;
private Range range;
+ private boolean isPrimaryKey;
public ColStatistics(String tabAlias, String colName, String colType) {
this.setTableAlias(tabAlias);
this.setColumnName(colName);
this.setColumnType(colType);
this.setFullyQualifiedColName(StatsUtils.getFullyQualifiedColumnName(tabAlias, colName));
+ this.setPrimaryKey(false);
}
public ColStatistics() {
@@ -150,6 +152,12 @@ public class ColStatistics {
sb.append(numTrues);
sb.append(" numFalses: ");
sb.append(numFalses);
+ if (range != null) {
+ sb.append(" ");
+ sb.append(range);
+ }
+ sb.append(" isPrimaryKey: ");
+ sb.append(isPrimaryKey);
return sb.toString();
}
@@ -162,24 +170,47 @@ public class ColStatistics {
clone.setNumNulls(numNulls);
clone.setNumTrues(numTrues);
clone.setNumFalses(numFalses);
+ clone.setPrimaryKey(isPrimaryKey);
if (range != null ) {
clone.setRange(range.clone());
}
return clone;
}
+ public boolean isPrimaryKey() {
+ return isPrimaryKey;
+ }
+
+ public void setPrimaryKey(boolean isPrimaryKey) {
+ this.isPrimaryKey = isPrimaryKey;
+ }
+
public static class Range {
public final Number minValue;
public final Number maxValue;
+
Range(Number minValue, Number maxValue) {
super();
this.minValue = minValue;
this.maxValue = maxValue;
}
+
@Override
public Range clone() {
return new Range(minValue, maxValue);
}
+
+ @Override
+ public String toString() {
+ StringBuilder sb = new StringBuilder();
+ sb.append("Range: [");
+ sb.append(" min: ");
+ sb.append(minValue);
+ sb.append(" max: ");
+ sb.append(maxValue);
+ sb.append(" ]");
+ return sb.toString();
+ }
}
}
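`ColStatistics` now carries an `isPrimaryKey` flag alongside the column's `Range`, and the flag is inferred from that range: a column is treated as a primary-key candidate when its values densely cover [min, max]. A minimal sketch of that dense-range test (the class and method names are hypothetical, not Hive's):

```java
// Sketch of the primary-key inference heuristic: an integer-valued column is a
// PK candidate when its min..max range contains exactly numRows values, i.e.
// max - min + 1 == numRows. Names are illustrative, not Hive APIs.
public class PkInferenceSketch {

    static boolean looksLikePrimaryKey(long numRows, long min, long max) {
        return numRows == (max - min + 1);
    }

    public static void main(String[] args) {
        // customer_address.txt: 20 rows, ca_address_sk running 1..20 -> PK candidate
        System.out.println(looksLikePrimaryKey(20L, 1L, 20L));  // prints true
        // 20 rows spread over 1..40: gaps in the range, so not inferred as a PK
        System.out.println(looksLikePrimaryKey(20L, 1L, 40L));  // prints false
    }
}
```

Note the heuristic is necessary but not sufficient for uniqueness; it is a statistics-based guess, which is why the diff only uses it to pick candidates.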
Modified: hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
URL: http://svn.apache.org/viewvc/hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java?rev=1633468&r1=1633467&r2=1633468&view=diff
==============================================================================
--- hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java (original)
+++ hive/branches/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java Tue Oct 21 21:12:49 2014
@@ -177,6 +177,9 @@ public class StatsUtils {
colStats = getTableColumnStats(table, schema, neededColumns);
}
+ // infer if any column can be primary key based on column statistics
+ inferAndSetPrimaryKey(stats.getNumRows(), colStats);
+
stats.setColumnStatsState(deriveStatType(colStats, neededColumns));
stats.addToColumnStats(colStats);
} else if (partList != null) {
@@ -263,6 +266,9 @@ public class StatsUtils {
addParitionColumnStats(neededColumns, referencedColumns, schema, table, partList,
columnStats);
+ // infer if any column can be primary key based on column statistics
+ inferAndSetPrimaryKey(stats.getNumRows(), columnStats);
+
stats.addToColumnStats(columnStats);
State colState = deriveStatType(columnStats, referencedColumns);
if (aggrStats.getPartsFound() != partNames.size() && colState != State.NONE) {
@@ -277,6 +283,58 @@ public class StatsUtils {
return stats;
}
+
+ /**
+ * Based on the provided column statistics and number of rows, this method infers whether the
+ * column can be a primary key. It checks whether the column's value range covers exactly the
+ * specified number of rows, i.e. (max - min + 1) == numRows.
+ * @param numRows - number of rows
+ * @param colStats - column statistics
+ */
+ public static void inferAndSetPrimaryKey(long numRows, List<ColStatistics> colStats) {
+ if (colStats != null) {
+ for (ColStatistics cs : colStats) {
+ if (cs != null && cs.getRange() != null && cs.getRange().minValue != null &&
+ cs.getRange().maxValue != null) {
+ if (numRows ==
+ ((cs.getRange().maxValue.longValue() - cs.getRange().minValue.longValue()) + 1)) {
+ cs.setPrimaryKey(true);
+ }
+ }
+ }
+ }
+ }
+
+ /**
+ * Infer foreign key relationship from given column statistics.
+ * @param csPK - column statistics of primary key
+ * @param csFK - column statistics of potential foreign key
+ * @return - true if a foreign key relationship can be inferred from the column statistics
+ */
+ public static boolean inferForeignKey(ColStatistics csPK, ColStatistics csFK) {
+ if (csPK != null && csFK != null) {
+ if (csPK.isPrimaryKey()) {
+ if (csPK.getRange() != null && csFK.getRange() != null) {
+ ColStatistics.Range pkRange = csPK.getRange();
+ ColStatistics.Range fkRange = csFK.getRange();
+ return isWithin(fkRange, pkRange);
+ }
+ }
+ }
+ return false;
+ }
+
+ private static boolean isWithin(ColStatistics.Range range1, ColStatistics.Range range2) {
+ if (range1.minValue != null && range2.minValue != null && range1.maxValue != null &&
+ range2.maxValue != null) {
+ if (range1.minValue.longValue() >= range2.minValue.longValue() &&
+ range1.maxValue.longValue() <= range2.maxValue.longValue()) {
+ return true;
+ }
+ }
+ return false;
+ }
+
private static void addParitionColumnStats(List<String> neededColumns,
List<String> referencedColumns, List<ColumnInfo> schema, Table table,
PrunedPartitionList partList, List<ColStatistics> colStats)
@@ -533,6 +591,7 @@ public class StatsUtils {
// Columns statistics for complex datatypes are not supported yet
return null;
}
+
return cs;
}
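The `inferForeignKey()`/`isWithin()` pair above treats a join column as a foreign-key candidate when its value range is contained in the primary key's value range. A standalone sketch of the containment check, with hypothetical names:

```java
// Sketch of the foreign-key inference: a join column is an FK candidate for a
// given PK when the column's [min, max] range lies inside the PK's [min, max]
// range. Names are illustrative, not Hive APIs.
public class FkInferenceSketch {

    static boolean isWithin(long fkMin, long fkMax, long pkMin, long pkMax) {
        return fkMin >= pkMin && fkMax <= pkMax;
    }

    public static void main(String[] args) {
        // ss_store_sk values 1..12 fall inside s_store_sk's range 1..12 -> FK candidate
        System.out.println(isWithin(1L, 12L, 1L, 12L));  // prints true
        // values 0..15 spill outside the PK range, so no FK is inferred
        System.out.println(isWithin(0L, 15L, 1L, 12L));  // prints false
    }
}
```

As with the PK heuristic, range containment is only evidence of referential integrity, not proof; the diff additionally requires exactly one FK-side parent before applying the estimate.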
Added: hive/branches/branch-0.14/ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q
URL: http://svn.apache.org/viewvc/hive/branches/branch-0.14/ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q?rev=1633468&view=auto
==============================================================================
--- hive/branches/branch-0.14/ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q (added)
+++ hive/branches/branch-0.14/ql/src/test/queries/clientpositive/annotate_stats_join_pkfk.q Tue Oct 21 21:12:49 2014
@@ -0,0 +1,123 @@
+set hive.stats.fetch.column.stats=true;
+
+drop table store_sales;
+drop table store;
+drop table customer_address;
+
+-- s_store_sk is PK, ss_store_sk is FK
+-- ca_address_sk is PK, ss_addr_sk is FK
+
+create table store_sales
+(
+ ss_sold_date_sk int,
+ ss_sold_time_sk int,
+ ss_item_sk int,
+ ss_customer_sk int,
+ ss_cdemo_sk int,
+ ss_hdemo_sk int,
+ ss_addr_sk int,
+ ss_store_sk int,
+ ss_promo_sk int,
+ ss_ticket_number int,
+ ss_quantity int,
+ ss_wholesale_cost float,
+ ss_list_price float,
+ ss_sales_price float,
+ ss_ext_discount_amt float,
+ ss_ext_sales_price float,
+ ss_ext_wholesale_cost float,
+ ss_ext_list_price float,
+ ss_ext_tax float,
+ ss_coupon_amt float,
+ ss_net_paid float,
+ ss_net_paid_inc_tax float,
+ ss_net_profit float
+)
+row format delimited fields terminated by '|';
+
+create table store
+(
+ s_store_sk int,
+ s_store_id string,
+ s_rec_start_date string,
+ s_rec_end_date string,
+ s_closed_date_sk int,
+ s_store_name string,
+ s_number_employees int,
+ s_floor_space int,
+ s_hours string,
+ s_manager string,
+ s_market_id int,
+ s_geography_class string,
+ s_market_desc string,
+ s_market_manager string,
+ s_division_id int,
+ s_division_name string,
+ s_company_id int,
+ s_company_name string,
+ s_street_number string,
+ s_street_name string,
+ s_street_type string,
+ s_suite_number string,
+ s_city string,
+ s_county string,
+ s_state string,
+ s_zip string,
+ s_country string,
+ s_gmt_offset float,
+ s_tax_precentage float
+)
+row format delimited fields terminated by '|';
+
+create table customer_address
+(
+ ca_address_sk int,
+ ca_address_id string,
+ ca_street_number string,
+ ca_street_name string,
+ ca_street_type string,
+ ca_suite_number string,
+ ca_city string,
+ ca_county string,
+ ca_state string,
+ ca_zip string,
+ ca_country string,
+ ca_gmt_offset float,
+ ca_location_type string
+)
+row format delimited fields terminated by '|';
+
+load data local inpath '../../data/files/store.txt' overwrite into table store;
+load data local inpath '../../data/files/store_sales.txt' overwrite into table store_sales;
+load data local inpath '../../data/files/customer_address.txt' overwrite into table customer_address;
+
+analyze table store compute statistics;
+analyze table store compute statistics for columns s_store_sk, s_floor_space;
+analyze table store_sales compute statistics;
+analyze table store_sales compute statistics for columns ss_store_sk, ss_addr_sk, ss_quantity;
+analyze table customer_address compute statistics;
+analyze table customer_address compute statistics for columns ca_address_sk;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk);
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 0;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_company_id > 0 and ss.ss_quantity > 10;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 0;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk);
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 1000;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 1000;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10;
+
+explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join customer_address ca on (ca.ca_address_sk = ss.ss_addr_sk);
+
+drop table store_sales;
+drop table store;
+drop table customer_address;
Added: hive/branches/branch-0.14/ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out
URL: http://svn.apache.org/viewvc/hive/branches/branch-0.14/ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out?rev=1633468&view=auto
==============================================================================
--- hive/branches/branch-0.14/ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out (added)
+++ hive/branches/branch-0.14/ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out Tue Oct 21 21:12:49 2014
@@ -0,0 +1,987 @@
+PREHOOK: query: drop table store_sales
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table store_sales
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table store
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table store
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table customer_address
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table customer_address
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: -- s_store_sk is PK, ss_store_sk is FK
+-- ca_address_sk is PK, ss_addr_sk is FK
+
+create table store_sales
+(
+ ss_sold_date_sk int,
+ ss_sold_time_sk int,
+ ss_item_sk int,
+ ss_customer_sk int,
+ ss_cdemo_sk int,
+ ss_hdemo_sk int,
+ ss_addr_sk int,
+ ss_store_sk int,
+ ss_promo_sk int,
+ ss_ticket_number int,
+ ss_quantity int,
+ ss_wholesale_cost float,
+ ss_list_price float,
+ ss_sales_price float,
+ ss_ext_discount_amt float,
+ ss_ext_sales_price float,
+ ss_ext_wholesale_cost float,
+ ss_ext_list_price float,
+ ss_ext_tax float,
+ ss_coupon_amt float,
+ ss_net_paid float,
+ ss_net_paid_inc_tax float,
+ ss_net_profit float
+)
+row format delimited fields terminated by '|'
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@store_sales
+POSTHOOK: query: -- s_store_sk is PK, ss_store_sk is FK
+-- ca_address_sk is PK, ss_addr_sk is FK
+
+create table store_sales
+(
+ ss_sold_date_sk int,
+ ss_sold_time_sk int,
+ ss_item_sk int,
+ ss_customer_sk int,
+ ss_cdemo_sk int,
+ ss_hdemo_sk int,
+ ss_addr_sk int,
+ ss_store_sk int,
+ ss_promo_sk int,
+ ss_ticket_number int,
+ ss_quantity int,
+ ss_wholesale_cost float,
+ ss_list_price float,
+ ss_sales_price float,
+ ss_ext_discount_amt float,
+ ss_ext_sales_price float,
+ ss_ext_wholesale_cost float,
+ ss_ext_list_price float,
+ ss_ext_tax float,
+ ss_coupon_amt float,
+ ss_net_paid float,
+ ss_net_paid_inc_tax float,
+ ss_net_profit float
+)
+row format delimited fields terminated by '|'
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@store_sales
+PREHOOK: query: create table store
+(
+ s_store_sk int,
+ s_store_id string,
+ s_rec_start_date string,
+ s_rec_end_date string,
+ s_closed_date_sk int,
+ s_store_name string,
+ s_number_employees int,
+ s_floor_space int,
+ s_hours string,
+ s_manager string,
+ s_market_id int,
+ s_geography_class string,
+ s_market_desc string,
+ s_market_manager string,
+ s_division_id int,
+ s_division_name string,
+ s_company_id int,
+ s_company_name string,
+ s_street_number string,
+ s_street_name string,
+ s_street_type string,
+ s_suite_number string,
+ s_city string,
+ s_county string,
+ s_state string,
+ s_zip string,
+ s_country string,
+ s_gmt_offset float,
+ s_tax_precentage float
+)
+row format delimited fields terminated by '|'
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@store
+POSTHOOK: query: create table store
+(
+ s_store_sk int,
+ s_store_id string,
+ s_rec_start_date string,
+ s_rec_end_date string,
+ s_closed_date_sk int,
+ s_store_name string,
+ s_number_employees int,
+ s_floor_space int,
+ s_hours string,
+ s_manager string,
+ s_market_id int,
+ s_geography_class string,
+ s_market_desc string,
+ s_market_manager string,
+ s_division_id int,
+ s_division_name string,
+ s_company_id int,
+ s_company_name string,
+ s_street_number string,
+ s_street_name string,
+ s_street_type string,
+ s_suite_number string,
+ s_city string,
+ s_county string,
+ s_state string,
+ s_zip string,
+ s_country string,
+ s_gmt_offset float,
+ s_tax_precentage float
+)
+row format delimited fields terminated by '|'
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@store
+PREHOOK: query: create table customer_address
+(
+ ca_address_sk int,
+ ca_address_id string,
+ ca_street_number string,
+ ca_street_name string,
+ ca_street_type string,
+ ca_suite_number string,
+ ca_city string,
+ ca_county string,
+ ca_state string,
+ ca_zip string,
+ ca_country string,
+ ca_gmt_offset float,
+ ca_location_type string
+)
+row format delimited fields terminated by '|'
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@customer_address
+POSTHOOK: query: create table customer_address
+(
+ ca_address_sk int,
+ ca_address_id string,
+ ca_street_number string,
+ ca_street_name string,
+ ca_street_type string,
+ ca_suite_number string,
+ ca_city string,
+ ca_county string,
+ ca_state string,
+ ca_zip string,
+ ca_country string,
+ ca_gmt_offset float,
+ ca_location_type string
+)
+row format delimited fields terminated by '|'
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@customer_address
+PREHOOK: query: load data local inpath '../../data/files/store.txt' overwrite into table store
+PREHOOK: type: LOAD
+#### A masked pattern was here ####
+PREHOOK: Output: default@store
+POSTHOOK: query: load data local inpath '../../data/files/store.txt' overwrite into table store
+POSTHOOK: type: LOAD
+#### A masked pattern was here ####
+POSTHOOK: Output: default@store
+PREHOOK: query: load data local inpath '../../data/files/store_sales.txt' overwrite into table store_sales
+PREHOOK: type: LOAD
+#### A masked pattern was here ####
+PREHOOK: Output: default@store_sales
+POSTHOOK: query: load data local inpath '../../data/files/store_sales.txt' overwrite into table store_sales
+POSTHOOK: type: LOAD
+#### A masked pattern was here ####
+POSTHOOK: Output: default@store_sales
+PREHOOK: query: load data local inpath '../../data/files/customer_address.txt' overwrite into table customer_address
+PREHOOK: type: LOAD
+#### A masked pattern was here ####
+PREHOOK: Output: default@customer_address
+POSTHOOK: query: load data local inpath '../../data/files/customer_address.txt' overwrite into table customer_address
+POSTHOOK: type: LOAD
+#### A masked pattern was here ####
+POSTHOOK: Output: default@customer_address
+PREHOOK: query: analyze table store compute statistics
+PREHOOK: type: QUERY
+PREHOOK: Input: default@store
+PREHOOK: Output: default@store
+POSTHOOK: query: analyze table store compute statistics
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@store
+POSTHOOK: Output: default@store
+PREHOOK: query: analyze table store compute statistics for columns s_store_sk, s_floor_space
+PREHOOK: type: QUERY
+PREHOOK: Input: default@store
+#### A masked pattern was here ####
+POSTHOOK: query: analyze table store compute statistics for columns s_store_sk, s_floor_space
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@store
+#### A masked pattern was here ####
+PREHOOK: query: analyze table store_sales compute statistics
+PREHOOK: type: QUERY
+PREHOOK: Input: default@store_sales
+PREHOOK: Output: default@store_sales
+POSTHOOK: query: analyze table store_sales compute statistics
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@store_sales
+POSTHOOK: Output: default@store_sales
+PREHOOK: query: analyze table store_sales compute statistics for columns ss_store_sk, ss_addr_sk, ss_quantity
+PREHOOK: type: QUERY
+PREHOOK: Input: default@store_sales
+#### A masked pattern was here ####
+POSTHOOK: query: analyze table store_sales compute statistics for columns ss_store_sk, ss_addr_sk, ss_quantity
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@store_sales
+#### A masked pattern was here ####
+PREHOOK: query: analyze table customer_address compute statistics
+PREHOOK: type: QUERY
+PREHOOK: Input: default@customer_address
+PREHOOK: Output: default@customer_address
+POSTHOOK: query: analyze table customer_address compute statistics
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Output: default@customer_address
+PREHOOK: query: analyze table customer_address compute statistics for columns ca_address_sk
+PREHOOK: type: QUERY
+PREHOOK: Input: default@customer_address
+#### A masked pattern was here ####
+POSTHOOK: query: analyze table customer_address compute statistics for columns ca_address_sk
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@customer_address
+#### A masked pattern was here ####
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk)
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk)
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: ss_store_sk is not null (type: boolean)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ outputColumnNames: _col0
+ Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 0
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 0
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (s_store_sk is not null and (s_store_sk > 0)) (type: boolean)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (ss_store_sk is not null and (ss_store_sk > 0)) (type: boolean)
+ Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ outputColumnNames: _col0
+ Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_company_id > 0 and ss.ss_quantity > 10
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_company_id > 0 and ss.ss_quantity > 10
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: PARTIAL
+ Filter Operator
+ predicate: (s_store_sk is not null and (s_company_id > 0)) (type: boolean)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: PARTIAL
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: PARTIAL
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (ss_store_sk is not null and (ss_quantity > 10)) (type: boolean)
+ Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ outputColumnNames: _col0
+ Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: PARTIAL
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: PARTIAL
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 107 Data size: 428 Basic stats: COMPLETE Column stats: PARTIAL
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 0
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 0
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (s_store_sk is not null and (s_floor_space > 0)) (type: boolean)
+ Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: ss_store_sk is not null (type: boolean)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (ss_store_sk is not null and (ss_quantity > 10)) (type: boolean)
+ Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk)
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk)
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s1
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: ss_store_sk is not null (type: boolean)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ Inner Join 1 to 2
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ 2
+ outputColumnNames: _col0
+ Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 964 Data size: 3856 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 1000
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_store_sk > 1000
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s1
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (s_store_sk is not null and (s_store_sk > 1000)) (type: boolean)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (s_store_sk is not null and (s_store_sk > 1000)) (type: boolean)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 4 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (ss_store_sk is not null and (ss_store_sk > 1000)) (type: boolean)
+ Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 321 Data size: 1236 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ Inner Join 1 to 2
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ 2
+ outputColumnNames: _col0
+ Statistics: Num rows: 35 Data size: 140 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 35 Data size: 140 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 35 Data size: 140 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 1000
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where s.s_floor_space > 1000
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s1
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (s_store_sk is not null and (s_floor_space > 1000)) (type: boolean)
+ Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: ss_store_sk is not null (type: boolean)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 964 Data size: 3716 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ Inner Join 1 to 2
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ 2
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join store s1 on (s1.s_store_sk = ss.ss_store_sk) where ss.ss_quantity > 10
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-1 is a root stage
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s1
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (ss_store_sk is not null and (ss_quantity > 10)) (type: boolean)
+ Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 321 Data size: 2460 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ Inner Join 1 to 2
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1
+ 2
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 321 Data size: 1284 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join customer_address ca on (ca.ca_address_sk = ss.ss_addr_sk)
+PREHOOK: type: QUERY
+POSTHOOK: query: explain select s.s_store_sk from store s join store_sales ss on (s.s_store_sk = ss.ss_store_sk) join customer_address ca on (ca.ca_address_sk = ss.ss_addr_sk)
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+ Stage-2 is a root stage
+ Stage-1 depends on stages: Stage-2
+ Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+ Stage: Stage-2
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: s
+ Statistics: Num rows: 12 Data size: 3143 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: s_store_sk is not null (type: boolean)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: s_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: s_store_sk (type: int)
+ Statistics: Num rows: 12 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ alias: ss
+ Statistics: Num rows: 1000 Data size: 130523 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: (ss_store_sk is not null and ss_addr_sk is not null) (type: boolean)
+ Statistics: Num rows: 916 Data size: 7012 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ss_store_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ss_store_sk (type: int)
+ Statistics: Num rows: 916 Data size: 7012 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: ss_addr_sk (type: int)
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ condition expressions:
+ 0 {KEY.reducesinkkey0}
+ 1 {VALUE._col6}
+ outputColumnNames: _col0, _col38
+ Statistics: Num rows: 916 Data size: 7328 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ table:
+ input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
+
+ Stage: Stage-1
+ Map Reduce
+ Map Operator Tree:
+ TableScan
+ alias: ca
+ Statistics: Num rows: 20 Data size: 2114 Basic stats: COMPLETE Column stats: COMPLETE
+ Filter Operator
+ predicate: ca_address_sk is not null (type: boolean)
+ Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
+ Reduce Output Operator
+ key expressions: ca_address_sk (type: int)
+ sort order: +
+ Map-reduce partition columns: ca_address_sk (type: int)
+ Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
+ TableScan
+ Reduce Output Operator
+ key expressions: _col38 (type: int)
+ sort order: +
+ Map-reduce partition columns: _col38 (type: int)
+ Statistics: Num rows: 916 Data size: 7328 Basic stats: COMPLETE Column stats: COMPLETE
+ value expressions: _col0 (type: int)
+ Reduce Operator Tree:
+ Join Operator
+ condition map:
+ Inner Join 0 to 1
+ condition expressions:
+ 0 {VALUE._col0}
+ 1
+ outputColumnNames: _col0
+ Statistics: Num rows: 916 Data size: 3664 Basic stats: COMPLETE Column stats: COMPLETE
+ Select Operator
+ expressions: _col0 (type: int)
+ outputColumnNames: _col0
+ Statistics: Num rows: 916 Data size: 3664 Basic stats: COMPLETE Column stats: COMPLETE
+ File Output Operator
+ compressed: false
+ Statistics: Num rows: 916 Data size: 3664 Basic stats: COMPLETE Column stats: COMPLETE
+ table:
+ input format: org.apache.hadoop.mapred.TextInputFormat
+ output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+ serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+ Stage: Stage-0
+ Fetch Operator
+ limit: -1
+ Processor Tree:
+ ListSink
+
+PREHOOK: query: drop table store_sales
+PREHOOK: type: DROPTABLE
+PREHOOK: Input: default@store_sales
+PREHOOK: Output: default@store_sales
+POSTHOOK: query: drop table store_sales
+POSTHOOK: type: DROPTABLE
+POSTHOOK: Input: default@store_sales
+POSTHOOK: Output: default@store_sales
+PREHOOK: query: drop table store
+PREHOOK: type: DROPTABLE
+PREHOOK: Input: default@store
+PREHOOK: Output: default@store
+POSTHOOK: query: drop table store
+POSTHOOK: type: DROPTABLE
+POSTHOOK: Input: default@store
+POSTHOOK: Output: default@store
+PREHOOK: query: drop table customer_address
+PREHOOK: type: DROPTABLE
+PREHOOK: Input: default@customer_address
+PREHOOK: Output: default@customer_address
+POSTHOOK: query: drop table customer_address
+POSTHOOK: type: DROPTABLE
+POSTHOOK: Input: default@customer_address
+POSTHOOK: Output: default@customer_address