Posted to reviews@spark.apache.org by yinxusen <gi...@git.apache.org> on 2015/12/02 18:42:26 UTC
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/10105
[SPARK-12098] Cross validator with multi-arm bandit search
https://issues.apache.org/jira/browse/SPARK-12098
Classic cross-validation requires every inner classifier to run for a fixed number of iterations or until it converges, which is costly, especially on massive data. The paper [Non-stochastic Best Arm Identification and Hyperparameter Optimization](http://arxiv.org/pdf/1502.07943v1.pdf) suggests a promising way to reduce the total number of iterations spent in cross-validation: multi-armed bandit search.
Multi-armed bandit search for cross-validation (bandit search for short) requires warm-start support in ML algorithms and fine-grained control over the inner behavior of the cross validator.
Since there are many bandit search algorithms for finding the best parameter set, we intend to provide only a few of them at first, to reduce the testing and perf-testing work and keep the feature stable.
This version provides only StaticSearch and ExponentialWeightsSearch (see chapter 3 of [Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems](http://arxiv.org/abs/1204.5721)). For more search strategies and the perf tests, see https://github.com/yinxusen/spark/tree/bandit.
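To make the idea concrete, here is a minimal, language-independent sketch in Python of an Exp3-style exponential-weights search over warm-startable arms. It is not the code in this PR: the `Arm` class, the toy quadratic loss, the initial one-pull-per-arm round, and the `gamma` and `budget` values are all assumptions chosen for illustration.

```python
import math
import random


class Arm:
    """One hyperparameter setting plus the model state trained so far (hypothetical)."""

    def __init__(self, step_size):
        self.step_size = step_size
        self.w = 0.0      # model parameter, kept between pulls (warm start)
        self.pulls = 0    # training iterations spent on this arm

    def loss(self):
        # Toy validation loss: squared distance from the optimum w = 3.
        return (self.w - 3.0) ** 2

    def pull(self):
        # One more gradient-descent step, resuming from the current state.
        self.w -= self.step_size * 2.0 * (self.w - 3.0)
        self.pulls += 1
        return self.loss()


def exponential_weights_search(arms, budget, gamma=0.2, seed=0):
    """Spend `budget` training iterations across the arms and return the best arm."""
    rng = random.Random(seed)
    k = len(arms)
    weights = [1.0] * k
    # Assumption: pull every arm once so each one has a loss estimate.
    for arm in arms:
        arm.pull()
    for _ in range(budget - k):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / k for w in weights]
        i = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 / (1.0 + arms[i].pull())   # map the loss into (0, 1]
        # Importance-weighted update: only the pulled arm's weight changes.
        weights[i] *= math.exp(gamma * reward / (probs[i] * k))
    return min(arms, key=lambda arm: arm.loss())


arms = [Arm(0.001), Arm(0.01), Arm(0.3)]
best = exponential_weights_search(arms, budget=40)
print(best.step_size, [arm.pulls for arm in arms])
```

A static search, by contrast, would give every arm the same number of pulls; the exponential-weights rule shifts the remaining budget toward arms whose loss is already low.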
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yinxusen/spark SPARK-12098
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10105.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10105
----
commit fa728e8068fd4ec732bf5d10bf91d9ef4c5444b2
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-21T20:37:24Z
add bandit search
commit 85e0de61dddde37afc2253c52284c2f7b43d1ac4
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-21T20:51:06Z
refine arm
commit 73835cc887b15f5474e4e346c515940d08df7a55
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-21T21:14:47Z
refine search
commit 32362334ada727afc7f6b68c55fd2f2e660cd4b5
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-21T21:16:16Z
refine search 2
commit 5f812bf8b20b7c9f9aece198d915a183442148dc
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-21T21:37:18Z
refine search
commit 30a948ec6c327db3035133c377ff25cb46138a92
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-22T18:53:24Z
refine imports
commit 68543d83baaa38c4e111ae14f00a901310824c89
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-22T19:02:30Z
add bandit test
commit 77c50a943931e8ed53e12c492f0cfd4866f238ca
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-22T20:09:32Z
fix style
commit 92aee07e6e0876e1ad7c97d58c83eba0634a9aab
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-22T20:26:39Z
add bandit example
commit 1db3b251ac0dab91172a35854b52ce5875333397
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-20T06:13:50Z
push with errors, talk with someone
commit 8d760c4d88fa65a629f227a74f34bdc5a9ec138b
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T09:01:19Z
fix type errors
commit 0bc4ebfcefd78f01162acc184adc56608082fa76
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T09:02:58Z
Merge branch 'master' into bandit
commit f7d7f53b24f388966735bf5cedda6122e40420e6
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T10:11:33Z
add package object
commit 9285367f03c88cb71190fa483402c1f3cd7ab6bb
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T15:00:57Z
a runnable strong type check version
commit e240b3ecf92f45f9b9912d3e64bf7a824670ebda
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T16:07:10Z
change arm into common class
commit 4b8c441ffaa633b8a46a564e17e7b7feb65f28e7
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T16:14:44Z
fix all search strategies
commit 36302898d86d691b653440476387809f842c4a84
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T16:18:21Z
fix code style
commit 49d13a6863f11382a6e16cae39f4e65741e90d1c
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T16:23:36Z
fix style
commit 988c27e0eca0c920019a2ef929365e54b75f31d6
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-24T16:39:02Z
add comments
commit e97236623e0566f654f28e3ac220429154076988
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T07:23:20Z
fix errors
commit ec21779ec7783e4d05163daae5c3d6a828df9b7e
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T07:41:58Z
fix small bugs
commit 98ea474dc5b5d812ed1ddda006a7a6e3d2395408
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T14:48:39Z
add perf test
commit 47647ede5b5e9341c61086f1b4699f45107c6050
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T15:23:04Z
fix search and num of iterations
commit 4e6f6dc8761e62feeb3290ef19987b94a375fc53
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T16:11:08Z
get records of each iteration
commit 370193a3907d123d3b676da7b3dd3c020b46fcaa
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T16:41:20Z
add summary
commit 0112d800d8b57bd137e836372e20340f2fef411d
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T17:13:03Z
refine it
commit b63b0c32a4487a014838f7333714288f00608906
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-11-29T17:25:37Z
fix errors
commit e54926a880d3c9aab4fc7afb0becfee8ec137336
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-12-02T10:29:03Z
add for painting
commit be270a7d5b87ae198234f9ebb2aa37ee0030de67
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-12-02T14:58:43Z
add more comments
commit 2b7064a59c261972e6bea73decf10c713faebef4
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-12-02T15:09:51Z
fix typo
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-169250966
@jkbradley Got it, I'll close it.
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-161392711
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47075/
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-161392614
**[Test build #47075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47075/consoleFull)** for PR 10105 at commit [`e261872`](https://github.com/apache/spark/commit/e26187206e73f2e98ad38e31cf89b119fa2b5390).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait BanditValidatorParams extends ValidatorParams with HasMaxIter`
  * `class BanditValidator(override val uid: String)`
  * `class Arm(`
  * `trait Controllable extends Params with HasMaxIter`
  * `abstract class Search`
  * `class StaticSearch extends Search`
  * `class ExponentialWeightsSearch extends Search`
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-169251108
Closing it for now to gather more feedback.
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-161497171
The error is caused by my modification to `LogisticRegression`: I added `Controllable` to it without implementing save/load for the new params.
However, this PR is not the right place to add that functionality. I plan to add warm-start to estimators in a separate JIRA issue, then come back and fix the error here.
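As a rough illustration of how such an omission can surface, the Python sketch below shows a persistence round trip that silently drops a newly added param. `ToyEstimator`, `FixedEstimator`, the `persisted` tuple, and the `warm_start` param are hypothetical names for this example only; they are not Spark's `LogisticRegression` or its read/write API.

```python
import json


class ToyEstimator:
    """Hypothetical estimator; `warm_start` plays the role of the newly added param."""

    # Params written by save(). A newly added param must be listed here too.
    persisted = ("reg_param", "max_iter")

    def __init__(self, reg_param=0.0, max_iter=100, warm_start=False):
        self.reg_param = reg_param
        self.max_iter = max_iter
        self.warm_start = warm_start

    def params(self):
        return {
            "reg_param": self.reg_param,
            "max_iter": self.max_iter,
            "warm_start": self.warm_start,
        }

    def save(self):
        # Only the params named in `persisted` are serialized.
        return json.dumps({name: getattr(self, name) for name in self.persisted})

    @classmethod
    def load(cls, text):
        # Params missing from the saved text fall back to their defaults.
        return cls(**json.loads(text))


class FixedEstimator(ToyEstimator):
    # The fix: persist the new param as well.
    persisted = ToyEstimator.persisted + ("warm_start",)


def round_trips(estimator):
    """True if save() followed by load() preserves every param."""
    restored = type(estimator).load(estimator.save())
    return restored.params() == estimator.params()


print(round_trips(ToyEstimator(warm_start=True)))    # False: warm_start is lost
print(round_trips(FixedEstimator(warm_start=True)))  # True
```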
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-161392709
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen closed the pull request at:
https://github.com/apache/spark/pull/10105
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-169104918
@yinxusen I just commented on the JIRA about this, but could we please close this issue for now? I'd like to postpone this feature due to limited review bandwidth. But definitely post it as a Spark package, and see if you can get some feedback from users. Thanks very much.
[GitHub] spark pull request: [SPARK-12098] Cross validator with multi-arm b...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10105#issuecomment-161381057
**[Test build #47075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47075/consoleFull)** for PR 10105 at commit [`e261872`](https://github.com/apache/spark/commit/e26187206e73f2e98ad38e31cf89b119fa2b5390).