You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by ma...@apache.org on 2014/11/15 00:16:53 UTC
spark git commit: [SPARK-4365][SQL] Remove unnecessary filter call on
records returned from parquet library
Repository: spark
Updated Branches:
refs/heads/master f76b96837 -> 63ca3af66
[SPARK-4365][SQL] Remove unnecessary filter call on records returned from parquet library
Since parquet library has been updated , we no longer need to filter the records returned from parquet library for null records , as now the library skips those :
from parquet-hadoop/src/main/java/parquet/hadoop/InternalParquetRecordReader.java
public boolean nextKeyValue() throws IOException, InterruptedException {
boolean recordFound = false;
while (!recordFound) {
// no more records left
if (current >= total)
{ return false; }
try {
checkRead();
currentValue = recordReader.read();
current ++;
if (recordReader.shouldSkipCurrentRecord())
{
// this record is being filtered via the filter2 package
if (DEBUG) LOG.debug("skipping record");
continue;
}
if (currentValue == null)
{
// only happens with FilteredRecordReader at end of block current = totalCountLoadedSoFar;
if (DEBUG) LOG.debug("filtered record reader reached end of block");
continue;
}
recordFound = true;
if (DEBUG) LOG.debug("read value: " + currentValue);
} catch (RuntimeException e)
{ throw new ParquetDecodingException(format("Can not read value at %d in block %d in file %s", current, currentBlock, file), e); }
}
return true;
}
Author: Yash Datta <Ya...@guavus.com>
Closes #3229 from saucam/remove_filter and squashes the following commits:
8909ae9 [Yash Datta] SPARK-4365: Remove unnecessary filter call on records returned from parquet library
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/63ca3af6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/63ca3af6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/63ca3af6
Branch: refs/heads/master
Commit: 63ca3af66f9680fd12adee82fb4d342caae5cea4
Parents: f76b968
Author: Yash Datta <Ya...@guavus.com>
Authored: Fri Nov 14 15:16:36 2014 -0800
Committer: Michael Armbrust <mi...@databricks.com>
Committed: Fri Nov 14 15:16:40 2014 -0800
----------------------------------------------------------------------
.../org/apache/spark/sql/parquet/ParquetTableOperations.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/63ca3af6/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
index 5f93279..f6bed50 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
@@ -159,7 +159,7 @@ case class ParquetTableScan(
}
} else {
baseRDD.map(_._2)
- }.filter(_ != null) // Parquet's record filters may produce null values
+ }
}
/**
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org