Posted to reviews@spark.apache.org by witgo <gi...@git.apache.org> on 2014/04/05 16:23:58 UTC
[GitHub] spark pull request: remove scalalogging-slf4j dependency
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/332
remove scalalogging-slf4j dependency
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark remove_scalalogging
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/332.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #332
----
commit eb93ee2e88cfe030da1eb2af01c5fb04ddb6d647
Author: witgo <wi...@qq.com>
Date: 2014-04-05T14:21:18Z
remove scalalogging-slf4j dependency
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-50962790
QA results for PR 332:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17767/consoleFull
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r11330987
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -30,65 +30,66 @@ trait Logging {
// Make the log field transient so that objects with Logging can
// be serialized and used on another machine
@transient private var log_ : Logger = null
+ @transient protected var logName_ : String = null
--- End diff --
Yes, I agree with your idea. A method is better here.
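For illustration, the "method instead of a var" idea agreed on above could be sketched as follows. This is a toy, not the actual Spark patch; the names `NamedLogging` and `logName` are hypothetical:

```scala
// Hedged sketch of the idea discussed above: expose the logger name as an
// overridable method rather than a mutable protected var. Not the actual
// Spark code; NamedLogging and logName are illustrative names.
trait NamedLogging {
  // Default: the class name, with the trailing "$" of Scala objects stripped
  protected def logName: String = this.getClass.getName.stripSuffix("$")
}

object LogNameDemo extends NamedLogging {
  def main(args: Array[String]): Unit = {
    // For a Scala object, the synthetic "$" suffix is dropped
    assert(logName == "LogNameDemo")
    println(logName)
  }
}
```

A subclass can then simply `override def logName` to customize the logger name, with no mutable state to initialize in the right order.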
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39694787
Jenkins, test this please.
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39695940
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13835/
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39656418
My thought is - if it's only a matter of saving us from having to write 10 `if` statements (which I don't think have been modified in several years), it might not be worth having an extra dependency to track.
I'm guessing these if statements get optimized at runtime to the point where they are very cheap.
---
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40269717
Jenkins, test this please.
---
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40221509
How do I get Jenkins to run the tests?
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39675896
It's not a matter of the if statements inside of Spark's logging code. Scala logging rewrites your log statements to put the if checks _inline_ with _every_ logging statement.
You write: `logger.debug(s"Log message $someExpensiveComputation")`
You get:
```
if (logger.isDebugEnabled) {
  val logMsg = "Log message " + someExpensiveComputation()
  logger.debug(logMsg)
}
```
Spark's code does something like this:
```
val logMessage = new Function0[String] { def apply() = "Log message " + someExpensiveComputation() }
log.debug(logMessage)
```
The macros allow you to avoid both the function call and the possible GC overhead of allocating the closure in cases where the log does not fire.
While not huge, this does make a performance difference in practice:
```
std logging:   19885.48 ms
spark logging:   914.408 ms
scala logging:   729.779 ms
```
(And this was only a micro benchmark with no GC pressure).
Given that this library is a thin layer, endorsed by Typesafe, that integrates nicely with our current logging framework (and is easily pluggable if we ever decide to change) and improves performance, I really think this PR is a step backwards. If we want consistency, we should probably get rid of Spark's home-grown logging code instead.
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r11331115
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -30,65 +30,66 @@ trait Logging {
// Make the log field transient so that objects with Logging can
// be serialized and used on another machine
@transient private var log_ : Logger = null
+ @transient protected var logName_ : String = null
// Method to get or create the logger for this object
protected def log: Logger = {
if (log_ == null) {
initializeIfNecessary()
- var className = this.getClass.getName
+ var className = if(logName_ == null) this.getClass.getName else logName_
// Ignore trailing $'s in the class names for Scala objects
if (className.endsWith("$")) {
className = className.substring(0, className.length - 1)
}
- log_ = LoggerFactory.getLogger(className)
+ log_ = Logger(LoggerFactory.getLogger(className))
}
log_
}
// Log methods that take only a String
protected def logInfo(msg: => String) {
- if (log.isInfoEnabled) log.info(msg)
+ log.info(msg)
}
protected def logDebug(msg: => String) {
- if (log.isDebugEnabled) log.debug(msg)
+ log.debug(msg)
--- End diff --
Sorry, this is my bad, I failed to see that the scala logging wrapper we're using does this for us. Apologies!
---
[GitHub] spark pull request: [SPARK-1470] remove scalalogging-slf4j depende...
Posted by witgo <gi...@git.apache.org>.
GitHub user witgo reopened a pull request:
https://github.com/apache/spark/pull/332
[SPARK-1470] remove scalalogging-slf4j dependency
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark remove_scalalogging
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/332.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #332
----
commit dd95abada78b4d0aec97dacda50fdfd74464b073
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-15T08:46:57Z
[SPARK-2399] Add support for LZ4 compression.
Based on Greg Bowyer's patch from JIRA https://issues.apache.org/jira/browse/SPARK-2399
Author: Reynold Xin <rx...@apache.org>
Closes #1416 from rxin/lz4 and squashes the following commits:
6c8fefe [Reynold Xin] Fixed typo.
8a14d38 [Reynold Xin] [SPARK-2399] Add support for LZ4 compression.
commit 52beb20f7904e0333198b9b14619366ddf53ab85
Author: DB Tsai <db...@alpinenow.com>
Date: 2014-07-15T09:14:58Z
[SPARK-2477][MLlib] Using appendBias for adding intercept in GeneralizedLinearAlgorithm
Instead of using prependOne currently in GeneralizedLinearAlgorithm, we would like to use appendBias for 1) keeping the indices of original training set unchanged by adding the intercept into the last element of vector and 2) using the same public API for consistently adding intercept.
Author: DB Tsai <db...@alpinenow.com>
Closes #1410 from dbtsai/SPARK-2477_intercept_with_appendBias and squashes the following commits:
011432c [DB Tsai] From Alpine Data Labs
commit 8f1d4226c285e33d2fb839d3163bb374eb6db0e7
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-15T09:15:29Z
Update README.md to include a slightly more informative project description.
(cherry picked from commit 401083be9f010f95110a819a49837ecae7d9c4ec)
Signed-off-by: Reynold Xin <rx...@apache.org>
commit 6555618c8f39b4e7da9402c3fd9da7a75bf7794e
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-15T09:20:01Z
README update: added "for Big Data".
commit 04b01bb101eeaf76c2e7c94c291669f0b2372c9a
Author: Alexander Ulanov <na...@yandex.ru>
Date: 2014-07-15T15:40:22Z
[MLLIB] [SPARK-2222] Add multiclass evaluation metrics
Adding two classes:
1) MulticlassMetrics implements various multiclass evaluation metrics
2) MulticlassMetricsSuite implements unit tests for MulticlassMetrics
Author: Alexander Ulanov <na...@yandex.ru>
Author: unknown <ul...@ULANOV1.emea.hpqcorp.net>
Author: Xiangrui Meng <me...@databricks.com>
Closes #1155 from avulanov/master and squashes the following commits:
2eae80f [Alexander Ulanov] Merge pull request #1 from mengxr/avulanov-master
5ebeb08 [Xiangrui Meng] minor updates
79c3555 [Alexander Ulanov] Addressing reviewers comments mengxr
0fa9511 [Alexander Ulanov] Addressing reviewers comments mengxr
f0dadc9 [Alexander Ulanov] Addressing reviewers comments mengxr
4811378 [Alexander Ulanov] Removing println
87fb11f [Alexander Ulanov] Addressing reviewers comments mengxr. Added confusion matrix
e3db569 [Alexander Ulanov] Addressing reviewers comments mengxr. Added true positive rate and false positive rate. Test suite code style.
a7e8bf0 [Alexander Ulanov] Addressing reviewers comments mengxr
c3a77ad [Alexander Ulanov] Addressing reviewers comments mengxr
e2c91c3 [Alexander Ulanov] Fixes to mutliclass metics
d5ce981 [unknown] Comments about Double
a5c8ba4 [unknown] Unit tests. Class rename
fcee82d [unknown] Unit tests. Class rename
d535d62 [unknown] Multiclass evaluation
commit cb09e93c1d7ef9c8f0a1abe4e659783c74993a4e
Author: William Benton <wi...@redhat.com>
Date: 2014-07-15T16:13:39Z
Reformat multi-line closure argument.
Author: William Benton <wi...@redhat.com>
Closes #1419 from willb/reformat-2486 and squashes the following commits:
2676231 [William Benton] Reformat multi-line closure argument.
commit 9dd635eb5df52835b3b7f4f2b9c789da9e813c71
Author: witgo <wi...@qq.com>
Date: 2014-07-15T17:46:17Z
SPARK-2480: Resolve sbt warnings "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
Author: witgo <wi...@qq.com>
Closes #1404 from witgo/run-tests and squashes the following commits:
f703aee [witgo] fix Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
2944f51 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
ef59c70 [witgo] fix Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
6cefee5 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
commit 72ea56da8e383c61c6f18eeefef03b9af00f5158
Author: witgo <wi...@qq.com>
Date: 2014-07-15T18:52:56Z
SPARK-1291: Link the spark UI to RM ui in yarn-client mode
Author: witgo <wi...@qq.com>
Closes #1112 from witgo/SPARK-1291 and squashes the following commits:
6022bcd [witgo] review commit
1fbb925 [witgo] add addAmIpFilter to yarn alpha
210299c [witgo] review commit
1b92a07 [witgo] review commit
6896586 [witgo] Add comments to addWebUIFilter
3e9630b [witgo] review commit
142ee29 [witgo] review commit
1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
commit e7ec815d9a2b0f89a56dc7dd3106c31a09492028
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-15T20:13:33Z
Added LZ4 to compression codec in configuration page.
Author: Reynold Xin <rx...@apache.org>
Closes #1417 from rxin/lz4 and squashes the following commits:
472f6a1 [Reynold Xin] Set the proper default.
9cf0b2f [Reynold Xin] Added LZ4 to compression codec in configuration page.
commit a21f9a7543309320bb2791468243c8f10bc6e81b
Author: Xiangrui Meng <me...@databricks.com>
Date: 2014-07-15T21:00:54Z
[SPARK-2471] remove runtime scope for jets3t
The assembly jar (built by sbt) doesn't include jets3t if we set it to runtime only, but I don't know whether it was set this way for a particular reason.
CC: srowen ScrapCodes
Author: Xiangrui Meng <me...@databricks.com>
Closes #1402 from mengxr/jets3t and squashes the following commits:
bfa2d17 [Xiangrui Meng] remove runtime scope for jets3t
commit 0f98ef1a2c9ecf328f6c5918808fa5ca486e8afd
Author: Michael Armbrust <mi...@databricks.com>
Date: 2014-07-15T21:01:48Z
[SPARK-2483][SQL] Fix parsing of repeated, nested data access.
Author: Michael Armbrust <mi...@databricks.com>
Closes #1411 from marmbrus/nestedRepeated and squashes the following commits:
044fa09 [Michael Armbrust] Fix parsing of repeated, nested data access.
commit bcd0c30c7eea4c50301cb732c733fdf4d4142060
Author: Michael Armbrust <mi...@databricks.com>
Date: 2014-07-15T21:04:01Z
[SQL] Whitelist more Hive tests.
Author: Michael Armbrust <mi...@databricks.com>
Closes #1396 from marmbrus/moreTests and squashes the following commits:
6660b60 [Michael Armbrust] Blacklist a test that requires DFS command.
8b6001c [Michael Armbrust] Add golden files.
ccd8f97 [Michael Armbrust] Whitelist more tests.
commit 8af46d58464b96471825ce376c3e11c8b1108c0e
Author: Yin Huai <hu...@cse.ohio-state.edu>
Date: 2014-07-15T21:06:45Z
[SPARK-2474][SQL] For a registered table in OverrideCatalog, the Analyzer failed to resolve references in the format of "tableName.fieldName"
Please refer to JIRA (https://issues.apache.org/jira/browse/SPARK-2474) for how to reproduce the problem and my understanding of the root cause.
Author: Yin Huai <hu...@cse.ohio-state.edu>
Closes #1406 from yhuai/SPARK-2474 and squashes the following commits:
96b1627 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2474
af36d65 [Yin Huai] Fix comment.
be86ba9 [Yin Huai] Correct SQL console settings.
c43ad00 [Yin Huai] Wrap the relation in a Subquery named by the table name in OverrideCatalog.lookupRelation.
a5c2145 [Yin Huai] Support sql/console.
commit 61de65bc69f9a5fc396b76713193c6415436d452
Author: William Benton <wi...@redhat.com>
Date: 2014-07-15T21:11:57Z
SPARK-2407: Added internal implementation of SQL SUBSTR()
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst
and adds tests to verify correct operation.
Author: William Benton <wi...@redhat.com>
Closes #1359 from willb/internalSqlSubstring and squashes the following commits:
ccedc47 [William Benton] Fixed too-long line.
a30a037 [William Benton] replace view bounds with implicit parameters
ec35c80 [William Benton] Adds fixes from review:
4f3bfdb [William Benton] Added internal implementation of SQL SUBSTR()
commit 502f90782ad474e2630ed5be4d3c4be7dab09c34
Author: Michael Armbrust <mi...@databricks.com>
Date: 2014-07-16T00:56:17Z
[SQL] Attribute equality comparisons should be done by exprId.
Author: Michael Armbrust <mi...@databricks.com>
Closes #1414 from marmbrus/exprIdResolution and squashes the following commits:
97b47bc [Michael Armbrust] Attribute equality comparisons should be done by exprId.
commit c2048a5165b270f5baf2003fdfef7bc6c5875715
Author: Zongheng Yang <zo...@gmail.com>
Date: 2014-07-16T00:58:28Z
[SPARK-2498] [SQL] Synchronize on a lock when using scala reflection inside data type objects.
JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2498
Author: Zongheng Yang <zo...@gmail.com>
Closes #1423 from concretevitamin/scala-ref-catalyst and squashes the following commits:
325a149 [Zongheng Yang] Synchronize on a lock when initializing data type objects in Catalyst.
commit 4576d80a5155c9fbfebe9c36cca06c208bca5bd3
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-16T01:47:39Z
[SPARK-2469] Use Snappy (instead of LZF) for default shuffle compression codec
This reduces shuffle compression memory usage by 3x.
Author: Reynold Xin <rx...@apache.org>
Closes #1415 from rxin/snappy and squashes the following commits:
06c1a01 [Reynold Xin] SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec.
commit 9c12de5092312319aa22f24df47a6de0e41a0102
Author: Henry Saputra <he...@gmail.com>
Date: 2014-07-16T04:21:52Z
[SPARK-2500] Move the logInfo for registering BlockManager to BlockManagerMasterActor.register method
PR for SPARK-2500
Move the logInfo call for BlockManager to BlockManagerMasterActor.register instead of BlockManagerInfo constructor.
Previously, the logInfo call for registering a BlockManager happened in the BlockManagerInfo constructor. This is confusing because the code could call "new BlockManagerInfo" without actually registering a BlockManager, which is misleading when reading the log files.
Author: Henry Saputra <he...@gmail.com>
Closes #1424 from hsaputra/move_registerblockmanager_log_to_registration_method and squashes the following commits:
3370b4a [Henry Saputra] Move the loginfo for BlockManager to BlockManagerMasterActor.register instead of BlockManagerInfo constructor.
commit 563acf5edfbfb2fa756a1f0accde0940592663e9
Author: Ken Takagiwa <ke...@kens-macbook-pro.local>
Date: 2014-07-16T04:34:05Z
follow pep8 None should be compared using is or is not
http://legacy.python.org/dev/peps/pep-0008/
## Programming Recommendations
- Comparisons to singletons like None should always be done with is or is not, never the equality operators.
Author: Ken Takagiwa <ke...@Kens-MacBook-Pro.local>
Closes #1422 from giwa/apache_master and squashes the following commits:
7b361f3 [Ken Takagiwa] follow pep8 None should be checked using is or is not
commit 90ca532a0fd95dc85cff8c5722d371e8368b2687
Author: Aaron Staple <aa...@gmail.com>
Date: 2014-07-16T04:35:36Z
[SPARK-2314][SQL] Override collect and take in JavaSchemaRDD, forwarding to SchemaRDD implementations.
Author: Aaron Staple <aa...@gmail.com>
Closes #1421 from staple/SPARK-2314 and squashes the following commits:
73e04dc [Aaron Staple] [SPARK-2314] Override collect and take in JavaSchemaRDD, forwarding to SchemaRDD implementations.
commit 9b38b7c71352bb5e6d359515111ad9ca33299127
Author: Takuya UESHIN <ue...@happy-camper.st>
Date: 2014-07-16T05:35:34Z
[SPARK-2509][SQL] Add optimization for Substring.
`Substring` including `null` literal cases could be added to `NullPropagation`.
Author: Takuya UESHIN <ue...@happy-camper.st>
Closes #1428 from ueshin/issues/SPARK-2509 and squashes the following commits:
d9eb85f [Takuya UESHIN] Add Substring cases to NullPropagation.
commit 632fb3d9a9ebb3d2218385403145d5b89c41c025
Author: Takuya UESHIN <ue...@happy-camper.st>
Date: 2014-07-16T05:43:48Z
[SPARK-2504][SQL] Fix nullability of Substring expression.
This is a follow-up of #1359 with nullability narrowing.
Author: Takuya UESHIN <ue...@happy-camper.st>
Closes #1426 from ueshin/issues/SPARK-2504 and squashes the following commits:
5157832 [Takuya UESHIN] Remove unnecessary white spaces.
80958ac [Takuya UESHIN] Fix nullability of Substring expression.
commit efc452a16322e8b20b3c4fe1d6847315f928cd2d
Author: Cheng Lian <li...@gmail.com>
Date: 2014-07-16T16:44:51Z
[SPARK-2119][SQL] Improved Parquet performance when reading off S3
JIRA issue: [SPARK-2119](https://issues.apache.org/jira/browse/SPARK-2119)
Essentially this PR fixed three issues to gain much better performance when reading large Parquet file off S3.
1. When reading the schema, fetching Parquet metadata from a part-file rather than the `_metadata` file
The `_metadata` file contains metadata of all row groups, and can be very large if there are many row groups. Since schema information and row group metadata are coupled within a single Thrift object, we have to read the whole `_metadata` to fetch the schema. On the other hand, schema is replicated among footers of all part-files, which are fairly small.
1. Only add the root directory of the Parquet file rather than all the part-files to input paths
The HDFS API can automatically filter out hidden files and underscore files (`_SUCCESS` & `_metadata`), so there's no need to filter out the part-files and add them individually to input paths. What makes it much worse is that `FileInputFormat.listStatus()` calls `FileSystem.globStatus()` on each individual input path sequentially, each resulting in a blocking remote S3 HTTP request.
1. Worked around [PARQUET-16](https://issues.apache.org/jira/browse/PARQUET-16)
Essentially PARQUET-16 is similar to the above issue, and results in lots of sequential `FileSystem.getFileStatus()` calls, which are further translated into a bunch of remote S3 HTTP requests.
`FilteringParquetRowInputFormat` should be cleaned up once PARQUET-16 is fixed.
Below is the micro benchmark result. The dataset used is an S3 Parquet file consisting of 3,793 partitions, about 110 MB per partition on average. The benchmark is done with a 9-node AWS cluster.
- Creating a Parquet `SchemaRDD` (Parquet schema is fetched)
```scala
val tweets = parquetFile(uri)
```
- Before: 17.80s
- After: 8.61s
- Fetching partition information
```scala
tweets.getPartitions
```
- Before: 700.87s
- After: 21.47s
- Counting the whole file (both steps above are executed altogether)
```scala
parquetFile(uri).count()
```
- Before: ??? (haven't tested yet)
- After: 53.26s
Author: Cheng Lian <li...@gmail.com>
Closes #1370 from liancheng/faster-parquet and squashes the following commits:
94a2821 [Cheng Lian] Added comments about schema consistency
d2c4417 [Cheng Lian] Worked around PARQUET-16 to improve Parquet performance
1c0d1b9 [Cheng Lian] Accelerated Parquet schema retrieving
5bd3d29 [Cheng Lian] Fixed Parquet log level
commit 33e64ecacbc44567f9cba2644a30a118653ea5fa
Author: Rui Li <ru...@intel.com>
Date: 2014-07-16T17:23:37Z
SPARK-2277: make TaskScheduler track hosts on rack
Hi mateiz, I've created [SPARK-2277](https://issues.apache.org/jira/browse/SPARK-2277) to make TaskScheduler track hosts on each rack. Please help to review, thanks.
Author: Rui Li <ru...@intel.com>
Closes #1212 from lirui-intel/trackHostOnRack and squashes the following commits:
2b4bd0f [Rui Li] SPARK-2277: refine UT
fbde838 [Rui Li] SPARK-2277: add UT
7bbe658 [Rui Li] SPARK-2277: rename the method
5e4ef62 [Rui Li] SPARK-2277: remove unnecessary import
79ac750 [Rui Li] SPARK-2277: make TaskScheduler track hosts on rack
commit efe2a8b1262a371471f52ca7d47dc34789e80558
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-16T17:44:54Z
Tightening visibility for various Broadcast related classes.
In preparation for SPARK-2521.
Author: Reynold Xin <rx...@apache.org>
Closes #1438 from rxin/broadcast and squashes the following commits:
432f1cc [Reynold Xin] Tightening visibility for various Broadcast related classes.
commit df95d82da7c76c074fd4064f7c870d55d99e0d8e
Author: Yin Huai <hu...@cse.ohio-state.edu>
Date: 2014-07-16T17:53:59Z
[SPARK-2525][SQL] Remove as many compilation warning messages as possible in Spark SQL
JIRA: https://issues.apache.org/jira/browse/SPARK-2525.
Author: Yin Huai <hu...@cse.ohio-state.edu>
Closes #1444 from yhuai/SPARK-2517 and squashes the following commits:
edbac3f [Yin Huai] Removed some compiler type erasure warnings.
commit 1c5739f68510c2336bf6cb3e18aea03d85988bfb
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-16T17:55:47Z
[SQL] Cleaned up ConstantFolding slightly.
Moved a couple of rules out of NullPropagation and added more comments.
Author: Reynold Xin <rx...@apache.org>
Closes #1430 from rxin/sql-folding-rule and squashes the following commits:
7f9a197 [Reynold Xin] Updated documentation for ConstantFolding.
7f8cf61 [Reynold Xin] [SQL] Cleaned up ConstantFolding slightly.
commit fc7edc9e76f97b25e456ae7b72ef8636656f4f1a
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-07-16T18:07:16Z
SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical...
... aggregation code
Author: Sandy Ryza <sa...@cloudera.com>
Closes #1435 from sryza/sandy-spark-2519 and squashes the following commits:
640706a [Sandy Ryza] SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical aggregation code
commit cc965eea510397642830acb21f61127b68c098d6
Author: Takuya UESHIN <ue...@happy-camper.st>
Date: 2014-07-16T18:13:38Z
[SPARK-2518][SQL] Fix foldability of Substring expression.
This is a follow-up of #1428.
Author: Takuya UESHIN <ue...@happy-camper.st>
Closes #1432 from ueshin/issues/SPARK-2518 and squashes the following commits:
37d1ace [Takuya UESHIN] Fix foldability of Substring expression.
commit ef48222c10be3d29a83dfc2329f455eba203cd38
Author: Reynold Xin <rx...@apache.org>
Date: 2014-07-16T18:15:07Z
[SPARK-2517] Remove some compiler warnings.
Author: Reynold Xin <rx...@apache.org>
Closes #1433 from rxin/compile-warning and squashes the following commits:
8d0b890 [Reynold Xin] Remove some compiler warnings.
----
---
[GitHub] spark pull request: [SPARK-1470] Use the scala-logging wrapper ins...
Posted by witgo <gi...@git.apache.org>.
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/332
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r11331064
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -30,65 +30,66 @@ trait Logging {
// Make the log field transient so that objects with Logging can
// be serialized and used on another machine
@transient private var log_ : Logger = null
+ @transient protected var logName_ : String = null
// Method to get or create the logger for this object
protected def log: Logger = {
if (log_ == null) {
initializeIfNecessary()
- var className = this.getClass.getName
+ var className = if(logName_ == null) this.getClass.getName else logName_
// Ignore trailing $'s in the class names for Scala objects
if (className.endsWith("$")) {
className = className.substring(0, className.length - 1)
}
- log_ = LoggerFactory.getLogger(className)
+ log_ = Logger(LoggerFactory.getLogger(className))
}
log_
}
// Log methods that take only a String
protected def logInfo(msg: => String) {
- if (log.isInfoEnabled) log.info(msg)
+ log.info(msg)
}
protected def logDebug(msg: => String) {
- if (log.isDebugEnabled) log.debug(msg)
+ log.debug(msg)
--- End diff --
This could cause a significant performance hit. Previously, we would not evaluate the "msg" function unless the specified level was enabled, which allows us to avoid string building and calling various toString methods when the log messages won't be seen. Can we re-add these checks using `log.underlying`?
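The lazy-evaluation point being made here can be shown with a minimal toy sketch (not Spark's actual Logging trait, and the names are illustrative): with a by-name parameter plus an enabled-check, the message expression is never evaluated when the level is off.

```scala
// Toy illustration of why the removed guards mattered: `msg: => String` is a
// by-name parameter, so the message expression is only forced inside the if,
// and expensive string building is skipped when the level is disabled.
// Not Spark's actual code; names here are illustrative.
object LazyLogDemo {
  var debugEnabled = false
  var evaluations = 0

  def logDebug(msg: => String): Unit =
    if (debugEnabled) println(msg) // msg is only evaluated here

  def expensive(): String = { evaluations += 1; "expensive message" }

  def main(args: Array[String]): Unit = {
    logDebug(expensive())   // debug off: expensive() is never called
    assert(evaluations == 0)
    debugEnabled = true
    logDebug(expensive())   // debug on: expensive() runs exactly once
    assert(evaluations == 1)
  }
}
```

Going through `log.underlying` (the wrapped slf4j logger) would presumably expose the `isDebugEnabled`-style checks needed to restore this guard on top of the scala-logging wrapper.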
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39700055
Is performance really a problem here? I don't think we log stuff on the critical path (we probably shouldn't). If we do, maybe we can just get rid of that logging.
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39655093
Scala logging is not a new logger. It is just a set of macro functions that call slf4j, wrapping the logging calls in checks to see if the corresponding levels are enabled. This lets you log without the performance overhead of doing string concatenation when the level is disabled, without allocating a closure (as Spark's current logging does), and without having to write out the if checks manually yourself.
---
[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-50961153
Jenkins, test this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40863767
@marmbrus
You are right, macros are faster than the inline method, though the gap is small.
The [test code](https://github.com/witgo/spark/blob/logger/core/src/test/scala/org/apache/spark/LoggingSuite.scala)
gives these results:
```
1
logString: 38987.745 ms
logInline: 14.198 ms
logFunction0: 16.22 ms
logMacros: 12.312 ms
2
logString: 39037.412 ms
logInline: 14.349 ms
logFunction0: 16.2 ms
logMacros: 12.249 ms
3
logString: 39169.036 ms
logInline: 14.324 ms
logFunction0: 16.871 ms
logMacros: 11.324 ms
```
---
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40626222
I don't think macros are faster than the inline method:
```scala
import scala.compat.Platform

class Log(val flag: Boolean) {
  def logFunction0(msg: => String) {
    if (flag)
      println(msg)
  }
  def logString(msg: String) {
    if (flag)
      println(msg)
  }
  @inline
  def logInline(msg: => String) {
    if (flag)
      println(msg)
  }
}

val log = new Log(false)
var startTime = Platform.currentTime
var i = 0
while (i < 1000000) {
  log.logInline("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
var stopTime = Platform.currentTime
Platform.collectGarbage
println("logInline: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logFunction0("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logFunction0: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logString("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logString: " + (stopTime - startTime) + " ms")
```
=>
1.
logInline: 9 ms
logFunction0: 14 ms
logString: 29114 ms
2.
logInline: 10 ms
logFunction0: 13 ms
logString: 28876 ms
3.
logInline: 9 ms
logFunction0: 13 ms
logString: 28521 ms
---
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40646342
It doesn't appear that your code is actually using macros, so I'm not sure what the comparison is here. I'd also caution against reading too much into simple micro benchmarks. It's entirely possible that the JVM is statically eliminating the if statement or making other simplifications when it JITs this code, so you might just be measuring the time to add an integer. Also, for benchmarks you should [use nanoTime not currentTime](http://stackoverflow.com/questions/351565/system-currenttimemillis-vs-system-nanotime).
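The nanoTime suggestion can be sketched as a small timing helper. The names below are ours, not from the PR, and this is a rough illustration only (a real benchmark would use a harness such as JMH):

```scala
// Sketch: time a block with System.nanoTime, with a warm-up pass so the
// JIT has a chance to compile the body before measurement begins.
// This is illustrative, not a rigorous benchmark harness.
object NanoBench {
  def timeMillis[A](iterations: Int)(body: => A): Double = {
    // Warm-up pass (result discarded).
    var i = 0
    while (i < iterations) { body; i += 1 }
    // Measured pass.
    val start = System.nanoTime()
    i = 0
    while (i < iterations) { body; i += 1 }
    val elapsedNanos = System.nanoTime() - start
    elapsedNanos / 1e6
  }
}
```

Unlike `currentTimeMillis`, `nanoTime` is a monotonic clock intended for measuring elapsed intervals, which is why it is preferred for benchmarks.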
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39652335
/cc @marmbrus - do you have strong attachments to scala logging? For the short term I do agree consolidating on Spark's existing logging would be nice. In particular, downstream consumers of Spark's logging subsystem do crazy stuff like routing to different slf4j backends or inserting other log bridges and interceptors (e.g. http://apache-spark-developers-list.1001551.n3.nabble.com/0-9-0-forces-log4j-usage-td532.html)
Adding another logger into the mix seems like it could be risky. Or at least, it might be nice to merge this now and then separately decide if we want Spark's overall logging system to migrate to scalalogging.
---
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40295844
We should wait until questions like these are answered before we move to scala-logging.
https://github.com/typesafehub/scala-logging/issues/4
https://github.com/typesafehub/scalalogging/issues/42
At the very least, when we decide to move, we should make sure we can switch off it easily in case we found (blocker level) problems with it.
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39677846
@witgo could you take a crack at doing that? I think it would be a fairly small change to `Logging.scala` where we just use the scala-logging wrapper instead of calling the slf4j API directly
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39701040
I did not find a specific call that affects performance.
It is possible that it is here: Spark Catalyst's `logger.debug` is called many times,
for example at
[RuleExecutor.scala#L64](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala#L64)
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r11330919
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -30,65 +30,66 @@ trait Logging {
// Make the log field transient so that objects with Logging can
// be serialized and used on another machine
@transient private var log_ : Logger = null
+ @transient protected var logName_ : String = null
--- End diff --
It might be cleaner to do this:
```
protected def logName = {
var className = this.getClass.getName
// Ignore trailing $'s in the class names for Scala objects
if (className.endsWith("$")) {
className = className.substring(0, className.length - 1)
}
className
}
```
Then let subclasses override that function. Then below you could just have:
```
log_ = Logger(LoggerFactory.getLogger(logName))
```
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r11331098
--- Diff: project/SparkBuild.scala ---
@@ -315,36 +315,37 @@ object SparkBuild extends Build {
def coreSettings = sharedSettings ++ Seq(
name := "spark-core",
libraryDependencies ++= Seq(
- "com.google.guava" % "guava" % "14.0.1",
- "com.google.code.findbugs" % "jsr305" % "1.3.9",
- "log4j" % "log4j" % "1.2.17",
- "org.slf4j" % "slf4j-api" % slf4jVersion,
- "org.slf4j" % "slf4j-log4j12" % slf4jVersion,
- "org.slf4j" % "jul-to-slf4j" % slf4jVersion,
- "org.slf4j" % "jcl-over-slf4j" % slf4jVersion,
- "commons-daemon" % "commons-daemon" % "1.0.10", // workaround for bug HADOOP-9407
- "com.ning" % "compress-lzf" % "1.0.0",
- "org.xerial.snappy" % "snappy-java" % "1.0.5",
- "org.spark-project.akka" %% "akka-remote" % akkaVersion excludeAll(excludeNetty),
- "org.spark-project.akka" %% "akka-slf4j" % akkaVersion excludeAll(excludeNetty),
- "org.spark-project.akka" %% "akka-testkit" % akkaVersion % "test",
- "org.json4s" %% "json4s-jackson" % "3.2.6" excludeAll(excludeScalap),
- "it.unimi.dsi" % "fastutil" % "6.4.4",
- "colt" % "colt" % "1.2.0",
- "org.apache.mesos" % "mesos" % "0.13.0",
- "commons-net" % "commons-net" % "2.2",
- "net.java.dev.jets3t" % "jets3t" % "0.7.1" excludeAll(excludeCommonsLogging),
- "org.apache.derby" % "derby" % "10.4.2.0" % "test",
- "org.apache.hadoop" % hadoopClient % hadoopVersion excludeAll(excludeNetty, excludeAsm, excludeCommonsLogging, excludeSLF4J, excludeOldAsm),
- "org.apache.curator" % "curator-recipes" % "2.4.0" excludeAll(excludeNetty),
- "com.codahale.metrics" % "metrics-core" % codahaleMetricsVersion,
- "com.codahale.metrics" % "metrics-jvm" % codahaleMetricsVersion,
- "com.codahale.metrics" % "metrics-json" % codahaleMetricsVersion,
- "com.codahale.metrics" % "metrics-graphite" % codahaleMetricsVersion,
- "com.twitter" %% "chill" % chillVersion excludeAll(excludeAsm),
- "com.twitter" % "chill-java" % chillVersion excludeAll(excludeAsm),
- "org.tachyonproject" % "tachyon" % "0.4.1-thrift" excludeAll(excludeHadoop, excludeCurator, excludeEclipseJetty, excludePowermock),
- "com.clearspring.analytics" % "stream" % "2.5.1"
+ "com.google.guava" % "guava" % "14.0.1",
+ "com.google.code.findbugs" % "jsr305" % "1.3.9",
+ "log4j" % "log4j" % "1.2.17",
+ "org.slf4j" % "slf4j-api" % slf4jVersion,
+ "org.slf4j" % "slf4j-log4j12" % slf4jVersion,
+ "org.slf4j" % "jul-to-slf4j" % slf4jVersion,
+ "org.slf4j" % "jcl-over-slf4j" % slf4jVersion,
+ "com.typesafe" %% "scalalogging-slf4j" % "1.0.1",
--- End diff --
I don't think it's really worth the diff changes to re-align the %s for this longer one, but if you're intent on keeping it we should follow the style -- remove one space before the %%
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39694872
Merged build started.
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39645375
@witgo why do you want to remove this? I agree it would be good to consolidate the logging infrastructure (e.g. not use different logging for catalyst)... but is there a specific issue with scala logging?
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39695999
@marmbrus As it stands, this patch seems to give us both worlds (unnecessarily): we wrap the logging statements in a closure and then use the scala logging macro to determine if we should unwrap. However, if we were to simply make the logInfo() arguments into plain Strings, wouldn't we also lose the benefits of the Scala macro, since the string has already been materialized as the argument of logInfo? Would adding an `@inline` annotation avoid this or would we have to make our log* functions into macros themselves to avoid this overhead while keeping the name?
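The concern above can be checked directly: a plain `String` parameter forces the message to be built at the call site, while a by-name `=> String` parameter defers it until (and unless) it is used. A small sketch with illustrative names, not Spark's actual API:

```scala
// Sketch: count how many times an "expensive" message is actually built for
// an eager String parameter versus a by-name parameter when the level check
// fails. Names here are illustrative, not Spark's API.
object EvalDemo {
  var builds = 0
  def expensiveMessage(): String = { builds += 1; "msg" }

  // Plain String parameter: the argument is built before the call happens.
  def eagerLog(enabled: Boolean, msg: String): Unit =
    if (enabled) println(msg)

  // By-name parameter: the argument is only built if actually referenced.
  def byNameLog(enabled: Boolean, msg: => String): Unit =
    if (enabled) println(msg)

  // Returns (builds with eager parameter, builds with by-name parameter).
  def run(): (Int, Int) = {
    builds = 0
    eagerLog(false, expensiveMessage())   // message built even though disabled
    val eagerBuilds = builds
    builds = 0
    byNameLog(false, expensiveMessage())  // message never built
    (eagerBuilds, builds)
  }
}
```

This is why changing `logInfo`'s argument to a plain `String` would reintroduce the materialization cost: the string exists before the macro (or any guard) gets a chance to run.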
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39695939
Merged build finished.
---
[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...
Posted by witgo <gi...@git.apache.org>.
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/332
---
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40625682
I don't think macros are faster than the inline method:
```scala
import scala.compat.Platform
import language.experimental.macros
import scala.reflect.macros.Context

class Log(val flag: Boolean) {
  def logFunction0(msg: => String) {
    if (flag)
      println(msg)
  }
  def logString(msg: String) {
    if (flag)
      println(msg)
  }
  @inline
  def logInline(msg: => String) {
    if (flag)
      println(msg)
  }
}

val log = new Log(false)
var startTime = Platform.currentTime
var i = 0
while (i < 1000000) {
  log.logInline("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
var stopTime = Platform.currentTime
Platform.collectGarbage
println("logInline: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logFunction0("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logFunction0: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logString("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logString: " + (stopTime - startTime) + " ms")
```
=>
1.
logInline: 9 ms
logFunction0: 14 ms
logString: 29114 ms
2.
logInline: 10 ms
logFunction0: 13 ms
logString: 28876 ms
3.
logInline: 9 ms
logFunction0: 13 ms
logString: 28521 ms
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r11330926
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -30,65 +30,66 @@ trait Logging {
// Make the log field transient so that objects with Logging can
// be serialized and used on another machine
@transient private var log_ : Logger = null
+ @transient protected var logName_ : String = null
// Method to get or create the logger for this object
protected def log: Logger = {
if (log_ == null) {
initializeIfNecessary()
- var className = this.getClass.getName
+ var className = if(logName_ == null) this.getClass.getName else logName_
// Ignore trailing $'s in the class names for Scala objects
if (className.endsWith("$")) {
--- End diff --
This is a bit confusing now because you have this thing called `className` but it may-or-may-not refer to a class. See the note above.
---
[GitHub] spark pull request: [SPARK-1470] Spark logger moving to use scala-...
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r12509814
--- Diff: project/SparkBuild.scala ---
@@ -317,6 +317,7 @@ object SparkBuild extends Build {
val excludeFastutil = ExclusionRule(organization = "it.unimi.dsi")
val excludeJruby = ExclusionRule(organization = "org.jruby")
val excludeThrift = ExclusionRule(organization = "org.apache.thrift")
+ val excludeScalalogging= ExclusionRule(organization = "com.typesafe", artifact = "scalalogging-slf4j")
--- End diff --
nit: space before =. Also, does this exclusion rule work despite having only an approximate artifact id?
---
[GitHub] spark pull request: [SPARK-1470] Spark logger moving to use scala-...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-42771828
@pwendell @marmbrus @rxin
typesafehub/scala-logging#4 has been solved by typesafehub/scala-logging#15
The PR should now be able to be merged into master
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r11331104
--- Diff: project/SparkBuild.scala ---
@@ -315,36 +315,37 @@ object SparkBuild extends Build {
def coreSettings = sharedSettings ++ Seq(
name := "spark-core",
libraryDependencies ++= Seq(
- "com.google.guava" % "guava" % "14.0.1",
- "com.google.code.findbugs" % "jsr305" % "1.3.9",
- "log4j" % "log4j" % "1.2.17",
- "org.slf4j" % "slf4j-api" % slf4jVersion,
- "org.slf4j" % "slf4j-log4j12" % slf4jVersion,
- "org.slf4j" % "jul-to-slf4j" % slf4jVersion,
- "org.slf4j" % "jcl-over-slf4j" % slf4jVersion,
- "commons-daemon" % "commons-daemon" % "1.0.10", // workaround for bug HADOOP-9407
- "com.ning" % "compress-lzf" % "1.0.0",
- "org.xerial.snappy" % "snappy-java" % "1.0.5",
- "org.spark-project.akka" %% "akka-remote" % akkaVersion excludeAll(excludeNetty),
- "org.spark-project.akka" %% "akka-slf4j" % akkaVersion excludeAll(excludeNetty),
- "org.spark-project.akka" %% "akka-testkit" % akkaVersion % "test",
- "org.json4s" %% "json4s-jackson" % "3.2.6" excludeAll(excludeScalap),
- "it.unimi.dsi" % "fastutil" % "6.4.4",
- "colt" % "colt" % "1.2.0",
- "org.apache.mesos" % "mesos" % "0.13.0",
- "commons-net" % "commons-net" % "2.2",
- "net.java.dev.jets3t" % "jets3t" % "0.7.1" excludeAll(excludeCommonsLogging),
- "org.apache.derby" % "derby" % "10.4.2.0" % "test",
- "org.apache.hadoop" % hadoopClient % hadoopVersion excludeAll(excludeNetty, excludeAsm, excludeCommonsLogging, excludeSLF4J, excludeOldAsm),
- "org.apache.curator" % "curator-recipes" % "2.4.0" excludeAll(excludeNetty),
- "com.codahale.metrics" % "metrics-core" % codahaleMetricsVersion,
- "com.codahale.metrics" % "metrics-jvm" % codahaleMetricsVersion,
- "com.codahale.metrics" % "metrics-json" % codahaleMetricsVersion,
- "com.codahale.metrics" % "metrics-graphite" % codahaleMetricsVersion,
- "com.twitter" %% "chill" % chillVersion excludeAll(excludeAsm),
- "com.twitter" % "chill-java" % chillVersion excludeAll(excludeAsm),
- "org.tachyonproject" % "tachyon" % "0.4.1-thrift" excludeAll(excludeHadoop, excludeCurator, excludeEclipseJetty, excludePowermock),
- "com.clearspring.analytics" % "stream" % "2.5.1"
+ "com.google.guava" % "guava" % "14.0.1",
+ "com.google.code.findbugs" % "jsr305" % "1.3.9",
+ "log4j" % "log4j" % "1.2.17",
+ "org.slf4j" % "slf4j-api" % slf4jVersion,
+ "org.slf4j" % "slf4j-log4j12" % slf4jVersion,
+ "org.slf4j" % "jul-to-slf4j" % slf4jVersion,
+ "org.slf4j" % "jcl-over-slf4j" % slf4jVersion,
+ "com.typesafe" %% "scalalogging-slf4j" % "1.0.1",
+ "commons-daemon" % "commons-daemon" % "1.0.10", // workaround for bug HADOOP-9407
+ "com.ning" % "compress-lzf" % "1.0.0",
+ "org.xerial.snappy" % "snappy-java" % "1.0.5",
+ "org.spark-project.akka" %% "akka-remote" % akkaVersion excludeAll(excludeNetty),
+ "org.spark-project.akka" %% "akka-slf4j" % akkaVersion excludeAll(excludeNetty),
+ "org.spark-project.akka" %% "akka-testkit" % akkaVersion % "test",
+ "org.json4s" %% "json4s-jackson" % "3.2.6" excludeAll(excludeScalap),
+ "it.unimi.dsi" % "fastutil" % "6.4.4",
+ "colt" % "colt" % "1.2.0",
+ "org.apache.mesos" % "mesos" % "0.13.0",
+ "commons-net" % "commons-net" % "2.2",
+ "net.java.dev.jets3t" % "jets3t" % "0.7.1" excludeAll(excludeCommonsLogging),
+ "org.apache.derby" % "derby" % "10.4.2.0" % "test",
+ "org.apache.hadoop" % hadoopClient % hadoopVersion excludeAll(excludeNetty, excludeAsm, excludeCommonsLogging, excludeSLF4J, excludeOldAsm),
+ "org.apache.curator" % "curator-recipes" % "2.4.0" excludeAll(excludeNetty),
+ "com.codahale.metrics" % "metrics-core" % codahaleMetricsVersion,
+ "com.codahale.metrics" % "metrics-jvm" % codahaleMetricsVersion,
+ "com.codahale.metrics" % "metrics-json" % codahaleMetricsVersion,
+ "com.codahale.metrics" % "metrics-graphite" % codahaleMetricsVersion,
+ "com.twitter" %% "chill" % chillVersion excludeAll(excludeAsm),
+ "com.twitter" % "chill-java" % chillVersion excludeAll(excludeAsm),
+ "org.tachyonproject" % "tachyon" % "0.4.1-thrift" excludeAll(excludeHadoop, excludeCurator, excludeEclipseJetty, excludePowermock),
+ "com.clearspring.analytics" % "stream" % "2.5.1"
--- End diff --
nit: re-add space before the last % to align it again
---
[GitHub] spark pull request: [SPARK-1470] Use the scala-logging wrapper ins...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-48708675
It can't be tested automatically. I have submitted a new PR, #1369.
---
[GitHub] spark pull request: [SPARK-1470] Use the scala-logging wrapper ins...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-46771412
Jenkins, test this please.
---
[GitHub] spark pull request: [SPARK-1470] Spark logger moving to use scala-...
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r12509797
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -116,7 +121,8 @@ trait Logging {
val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
if (!log4jInitialized && usingLog4j) {
val defaultLogProps = "org/apache/spark/log4j-defaults.properties"
- Option(Utils.getSparkClassLoader.getResource(defaultLogProps)) match {
+ val classLoader = this.getClass.getClassLoader
--- End diff --
Why was this change made? It's almost the same thing as Utils.getSparkClassLoader, just not using that abstraction?
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39651255
(For what my $0.02 may be worth it seems worth it just for the consolidation.)
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39696780
One solution is to have these older messages for compatibility (with deprecation) but then also expose the underlying log directly to users and convert spark's code to use the new version. The `Logging` trait in Spark is public so it's likely downstream applications are using it.
It's too bad though - the whole reason `Logging` exists is to provide an indirection layer so we don't have to depend on specific implementations but that's exactly what will cause performance overhead in this case.
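That compatibility approach can be sketched roughly as follows. `MiniLogger` below is a hypothetical stand-in for the real slf4j/scala-logging type, and the trait and method names are illustrative, not the actual `Logging` trait:

```scala
// Sketch of the compatibility idea: keep the closure-taking methods for
// existing downstream users of the public Logging trait (marked deprecated),
// while exposing the underlying logger for new code to call directly.
// `MiniLogger` is a hypothetical stand-in, not the real slf4j type.
class MiniLogger {
  val messages = scala.collection.mutable.ArrayBuffer.empty[String]
  def info(msg: String): Unit = messages += msg
}

trait CompatLogging {
  // New code would use this directly (e.g. via scala-logging's macros).
  lazy val log: MiniLogger = new MiniLogger

  // Old API retained so existing callers keep compiling.
  @deprecated("use log directly", "sketch")
  def logInfo(msg: => String): Unit = log.info(msg)
}
```

The trade-off pwendell describes remains visible here: the indirection layer keeps callers decoupled from the backend, but each extra layer is exactly where closure-allocation overhead creeps in.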
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39655050
We should use a unified logging interface, and migrating the overall logging system to scalalogging is a good idea.
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39656388
Ah I see. If that's the case, how does this differ from the checks we already do inside of Spark's normal logging?
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L54
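For context, the difference the thread is circling around can be shown side by side. Spark's trait takes the message as a by-name parameter, so a closure is allocated at every call site even when the level is disabled; scala-logging's macros instead rewrite the call into an inline guard at compile time, so nothing is allocated. A self-contained sketch with illustrative names (no slf4j dependency):

```scala
object GuardStyles {
  var infoEnabled = false   // simulates log.isInfoEnabled returning false
  var messagesBuilt = 0

  // style 1: by-name parameter, like Spark's Logging trait
  def logInfo(msg: => String): Unit =
    if (infoEnabled) println(msg)

  // a deliberately expensive message, so skipping it matters
  def expensive(): String = { messagesBuilt += 1; (1 to 100).mkString(",") }

  def main(args: Array[String]): Unit = {
    // by-name call: a closure is allocated, but the body never runs
    logInfo(expensive())
    // what a scala-logging macro call expands to, written out by hand:
    // the guard is inlined, so neither a closure nor the string is built
    if (infoEnabled) println(expensive())
    assert(messagesBuilt == 0)
  }
}
```

In both styles the expensive string is never built when the level is off; the macro version additionally avoids the per-call closure allocation.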
---
[GitHub] spark pull request: [SPARK-1470] Spark logger moving to use scala-...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r12512463
--- Diff: project/SparkBuild.scala ---
@@ -317,6 +317,7 @@ object SparkBuild extends Build {
val excludeFastutil = ExclusionRule(organization = "it.unimi.dsi")
val excludeJruby = ExclusionRule(organization = "org.jruby")
val excludeThrift = ExclusionRule(organization = "org.apache.thrift")
+ val excludeScalalogging= ExclusionRule(organization = "com.typesafe", artifact = "scalalogging-slf4j")
--- End diff --
I have not tested this case, and I think this code is not very readable.
---
[GitHub] spark pull request: SPARK-1470: Spark logger moving to use scala-l...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-40674103
``` scala
// inline: the bare loop plus a guarded log call
var i = 0
while (i < 1000000) {
  if (log.isInfoEnabled)  // the only extra work compared to the micro loop
    println(Thread.currentThread.getStackTrace()
      .toSeq.take(10).mkString("\n"))
  i += 1
}

// micro: the bare loop
i = 0
while (i < 1000000) {
  i += 1
}
```
A simple if check like this will not trigger garbage collection, so I do not think you will be able to measure the gap.
I will write more complex test code this weekend.
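A slightly more complete harness along those lines could look like this sketch: warm the JIT up first, then time both loops with `System.nanoTime`. The `infoEnabled` flag is a stand-in for `log.isInfoEnabled`, so no logging dependency is needed; all names here are illustrative.

```scala
object GuardBench {
  val infoEnabled = false   // stand-in for log.isInfoEnabled

  // time a block of code once, in nanoseconds
  def timed(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  // the guarded loop from the snippet above
  def guarded(): Unit = {
    var i = 0
    while (i < 1000000) {
      if (infoEnabled)
        println(Thread.currentThread.getStackTrace()
          .toSeq.take(10).mkString("\n"))
      i += 1
    }
  }

  // the bare loop
  def micro(): Unit = {
    var i = 0
    while (i < 1000000) i += 1
  }

  def main(args: Array[String]): Unit = {
    // warm-up so the JIT compiles both loops before measuring
    (1 to 20).foreach { _ => guarded(); micro() }
    println(s"guarded: ${timed(guarded())} ns, micro: ${timed(micro())} ns")
  }
}
```

Even with warm-up, a single-shot measurement like this is noisy, and the JIT may optimize the bare loop away entirely; a real comparison would need a proper micro-benchmark harness.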
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39677214
The thing is - adding a dependency is not free. It means that everyone who packages Spark downstream needs to deal with a bunch of annoying stuff. For instance, this introduces a new way for us to get an unintentional version bump of slf4j in the Spark build. It also means that if users are upgrading slf4j on their own, they might have an issue where scala-logging doesn't work against newer versions of slf4j. It means that OS packagers will have to create a new RPM/Debian package for scala-logging if one doesn't exist. Basically, my point is we should have a high bar for any dependency.
If we go with the default plan of "add it if it provides nonzero benefit" - that's way too low of a bar and would result in an unwieldy project.
Anyways on this one, maybe the best answer is to just re-write the spark logging to use scala-logging. Seems like it would be very straightforward. From my end, the main concern is just consolidating it with the normal spark logging.
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39696209
Hm, good call @aarondav. I'm not sure if that would work. If it doesn't, would there be any other way to do this than to just use the scala logger directly?
---
[GitHub] spark pull request: [SPARK-1470][SPARK-1842] Use the scala-logging...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-50961189
QA tests have started for PR 332. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17767/consoleFull
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39639841
Can one of the admins verify this patch?
---
[GitHub] spark pull request: Spark logger moving to use scala-logging
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39694868
Merged build triggered.
---
[GitHub] spark pull request: remove scalalogging-slf4j dependency
Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-39676694
Very similar to what I was thinking @marmbrus -- but you wrote it up more completely than I would have. This is also why I've been moving some performance-critical proprietary code over to using Scala logging. It isn't huge, but it does make a difference, and it's generally no more difficult to use Scala logging than older methods, so I'd also like to see Spark moving to use Typesafe's library.
---
[GitHub] spark pull request: [SPARK-1470] Spark logger moving to use scala-...
Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/332#discussion_r12512415
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -116,7 +121,8 @@ trait Logging {
val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
if (!log4jInitialized && usingLog4j) {
val defaultLogProps = "org/apache/spark/log4j-defaults.properties"
- Option(Utils.getSparkClassLoader.getResource(defaultLogProps)) match {
+ val classLoader = this.getClass.getClassLoader
--- End diff --
Uh, I'm sorry, I think this is a problem introduced when merging the code.
---