Posted to dev@systemml.apache.org by Alexandre V Evfimievski <ev...@us.ibm.com> on 2017/08/23 20:04:41 UTC

Re: Bayesian optimizer support for SystemML.

Hi Janardhan,

The number of parameters could be rather large, that's certainly an issue 
for Bayesian Optimization.  A perfect implementation would, perhaps, pick 
a sample of parameters and a sample of the dataset for every iteration. It 
seems that Sobol sequences require generating primitive polynomials of 
large degree.  What is better: a higher-dimensional B.O., or a 
lower-dimensional one combined with parameter sampling?  Probably the 
latter.  By the way, in cases where parameters feed into heuristics, there 
may be considerable independence across the set of parameters, especially 
when conditioned by a specific dataset record.  Each heuristic targets 
certain situations that arise in some records.  Not sure how to take 
advantage of this.
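
To make the parameter-sampling idea a bit more concrete, here is a minimal
sketch (Python; bo_subroutine is a hypothetical stand-in for whatever
low-dimensional B.O. routine we end up with, not an existing API) of
optimizing a random subset of coordinates per outer iteration while the
remaining parameters stay at their current best values:

import numpy as np

def bo_with_parameter_sampling(loss, x0, bounds, subset_size, n_outer, bo_subroutine):
    """Optimize a random subset of coordinates in each outer iteration.

    loss          : full loss function over all D parameters
    x0            : initial parameter vector of length D
    bounds        : numpy array of shape (D, 2) with [low, high] per parameter
    bo_subroutine : any low-dimensional Bayesian optimizer,
                    bo_subroutine(f, sub_bounds) -> best sub-vector
    """
    x_best = np.array(x0, dtype=float)
    f_best = loss(x_best)
    rng = np.random.default_rng(0)
    for _ in range(n_outer):
        idx = rng.choice(len(x_best), size=subset_size, replace=False)

        def f_sub(z):                       # loss restricted to the chosen coordinates
            x = x_best.copy()
            x[idx] = z
            return loss(x)

        z_best = bo_subroutine(f_sub, bounds[idx])
        x_new = x_best.copy()
        x_new[idx] = z_best
        f_new = loss(x_new)
        if f_new < f_best:                  # keep the update only if it helps
            x_best, f_best = x_new, f_new
    return x_best, f_best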

Thanks,
Sasha



From:   Janardhan Pulivarthi <ja...@gmail.com>
To:     Alexandre V Evfimievski <ev...@us.ibm.com>, npansar@us.ibm.com, 
dev@systemml.apache.org
Date:   08/10/2017 09:39 AM
Subject:        Re: Bayesian optimizer support for SystemML.



Hi Sasha,

And one more thing I would like to ask: what are you thinking about the
`sobol` function? What is the dimension requirement, and what sampling
pattern should it produce? Please also help me understand which tasks
exactly we are going to optimize in SystemML.
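
Just to make my question concrete, here is a small sketch (Python, using
scipy.stats.qmc; the bounds are made-up placeholders) of how candidate
points could be drawn from a Sobol sequence. The dimension would simply be
the number of hyperparameters we tune, and SciPy's Joe & Kuo direction
numbers go up to 21201 dimensions, as far as I know:

import numpy as np
from scipy.stats import qmc                    # available in SciPy >= 1.7

d = 5                                          # one dimension per hyperparameter
lower = np.array([1e-4, 1e-4, 0.0,   8, 0.1])  # illustrative bounds only
upper = np.array([1e-1, 1e-1, 0.9, 512, 0.9])

sampler = qmc.Sobol(d=d, scramble=True, seed=42)
unit_points = sampler.random_base2(m=7)        # 2**7 = 128 points in [0, 1)^d
candidates = qmc.scale(unit_points, lower, upper)
print(candidates.shape)                        # (128, 5)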

Surrogate slice sampling - what are your thoughts about it?

Thank you very much,
Janardhan 

On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski <
evfimi@us.ibm.com> wrote:
Hi, Janardhan,

We are still studying Bayesian Optimization (B.O.), you are ahead of us!  
Just one comment:  The "black box" loss function that is being optimized 
is not always totally black.  Sometimes it is a sum of many small 
black-box functions.  Suppose we want to train a complex system with many 
parameters over a large dataset.  The system involves many heuristics, and 
the parameters feed into these heuristics.  We want to minimize a loss 
function, which is a sum of individual losses per each data record.  We 
want to use B.O. to find an optimal vector of parameters.  The parameters 
affect the system's behavior in complex ways and do not allow for the 
computation of a gradient.  However, because the loss is a sum of many 
losses, when running B.O., we have a choice: either to run each test over 
the entire dataset, or to run over a small sample of the dataset (but try 
more parameter vectors per hour, say).  The smaller the sample, the higher 
the variance of the loss.  Not sure which implementation of B.O. is the 
best to handle such a case.
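
A minimal sketch of the sampled-evaluation option (Python; per_record_loss
and data are placeholders, not anything in SystemML): estimate the loss on
a random subset of records and let the GP treat the result as a noisy
observation, whose variance shrinks roughly like 1/sample_size:

import numpy as np

def sampled_loss(per_record_loss, data, params, sample_size, rng):
    """Unbiased estimate of the average loss from a random sample of records."""
    idx = rng.choice(len(data), size=sample_size, replace=False)
    return float(np.mean([per_record_loss(data[i], params) for i in idx]))

# In the B.O. loop, each observation y = sampled_loss(...) would be handed to
# the GP together with a non-zero observation-noise term, trading sample_size
# against how many parameter vectors we can afford to try per hour.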

Thanks,
Alexandre (Sasha)



From:        Janardhan Pulivarthi <ja...@gmail.com>
To:        dev@systemml.apache.org
Date:        07/25/2017 10:33 AM
Subject:        Re: Bayesian optimizer support for SystemML.



Hi Niketan and Mike,

As we are trying to implement this Bayesian Optimization, should we take
input from more committers as well? This optimizer seems to have a couple
of possible implementation approaches, and we may need to find out which
one suits us best.

Thanks,
Janardhan

On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi <
janardhan.pulivarthi@gmail.com> wrote:

> Dear committers,
>
> We are planning to add Bayesian optimizer support for both the ML and
> deep learning tasks in SystemML. Relevant JIRA link:
> https://issues.apache.org/jira/browse/SYSTEMML-979
>
> The following is a simple outline of how we are going to implement it.
> Please feel free to make any kind of changes. In this Google Docs link:
> http://bit.do/systemml-bayesian
>
> Description:
>
> Bayesian optimization is a sequential design strategy for global
> optimization of black-box functions that doesn’t require derivatives.
>
> Process:
>
>    1. First we select the point that is the best so far, given the
>    number of iterations completed.
>
>    2. Candidate point selection, by sampling the space with a Sobol
>    quasirandom sequence generator.
>
>    3. Gaussian process hyperparameter sampling with the surrogate slice
>    sampling method.
>
>
> Components:
>
>    1. Selecting the next point to evaluate.
>
> [image: nextpoint.PNG]
>
> We specify a uniform prior for the mean, m, and width-2 top-hat priors for
> each of the D length-scale parameters. As we expect the observation noise
> generally to be close to or exactly zero, nu is given a horseshoe prior.
> The covariance amplitude theta0 is given a zero-mean, unit-variance
> lognormal prior, theta0 ~ ln N(0, 1).
>
>
>
>    2. Generation of a quasirandom Sobol sequence.
>
> Which kind of Sobol patterns are needed?
>
> [image: sobol patterns.PNG]
>
> How many dimensions do we need?
>
> This paper constructs Sobol sequences for up to 21201 dimensions. [pdf link:
> https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf ]
>
>
>
>    3. Surrogate slice sampling.
>
> [image: surrogate data sampling.PNG]
>
>
> References:
>
> 1. For the next point to evaluate:
>
> https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
>
>  http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf
>
>
> 2. QuasiRandom Sobol Sequence Generator:
>
> https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf
>
>
> 3. Surrogate Slice Sampling:
>
> http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf
>
>
>
> Thank you so much,
>
> Janardhan
>
>
>
>








Re: Bayesian optimizer support for SystemML.

Posted by Niketan Pansare <np...@us.ibm.com>.
Hi Janardhan,

My 2 cents: I would recommend first creating a script that runs satisfactorily with respect to accuracy when compared to other R packages. For example: let's try to run http://blog.revolutionanalytics.com/2016/06/bayesian-optimization-of-machine-learning-models.html
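
For that accuracy comparison, something like the following could serve as a
reference baseline (a Python sketch using scikit-optimize on a toy function,
purely for checking that the DML script reaches a comparable optimum, not a
proposal to depend on that package):

from skopt import gp_minimize   # scikit-optimize, used only as a reference

def objective(x):
    # toy black-box with its minimum at (0.5, -0.3); swap in a real CV loss later
    return (x[0] - 0.5) ** 2 + (x[1] + 0.3) ** 2

res = gp_minimize(objective,
                  dimensions=[(-2.0, 2.0), (-2.0, 2.0)],  # search bounds
                  n_calls=30,
                  random_state=0)
print(res.x, res.fun)           # the DML script should land near the same optimum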

Then we can consider making it efficient in terms of memory and performance, as well as trying it on neural networks. We can definitely consider it for 1.0, but won't classify it as a blocking feature :)

Thanks,

Niketan 

> On Sep 4, 2017, at 10:54 AM, Janardhan Pulivarthi <ja...@gmail.com> wrote:
> 
> Hi Sasha, Niketan, and Mike, (sorry, if I missed out on someone)
> 
> So far we have encountered some problems and situations where we need some
> more thinking. But, until then let us start a preliminary script, for
> checking different scenarios with our existing top level algorithms and
> deep learning algorithms.
> 
> Along with the previously proposed ones, we can try
> 1. The constraints (both constrained & unconstrained)
> 
> 2. Convergence rate check, may be for settling at a prior (and our
> convergence criteria, based upon  Convergence Rates for Efficient Global
> Optimization Algorithms: https://arxiv.org/pdf/1101.3501v3.pdf )
> 
> May be we could implement some priors, instead of one particularly.
> 
> 
> I am planning to keep my schedule free for a month to only focus on this
> implementation. Owing to its importance for the neural networks where we
> need less memory consumption especially to fit into the GPUs, It would be
> great if we could ship this with `1.0` release.
> 
> *Design document:* http://bit.do/systemml-bayesian
> 
> Thanks you very much,
> Janardhan
> 
> 
> 
> On Wed, Aug 23, 2017 at 4:04 PM, Alexandre V Evfimievski <ev...@us.ibm.com>
> wrote:
> 
>> Hi Janardhan,
>> 
>> The number of parameters could be rather large, that's certainly an issue
>> for Bayesian Optimization.  A perfect implementation would, perhaps, pick a
>> sample of parameters and a sample of the dataset for every iteration.  It
>> seems that Sobol sequences require generating primitive polynomials of
>> large degree.  What is better: a higher-dimensional B.O., or a
>> lower-dimensional one combined with parameter sampling?  Probably the
>> latter.  By the way, in cases where parameters feed into heuristics, there
>> may be considerable independence across the set of parameters, especially
>> when conditioned by a specific dataset record.  Each heuristic targets
>> certain situations that arise in some records.  Not sure how to take
>> advantage of this.
>> 
>> Thanks,
>> Sasha
>> 
>> 
>> 
>> From:        Janardhan Pulivarthi <ja...@gmail.com>
>> To:        Alexandre V Evfimievski <ev...@us.ibm.com>, npansar@us.ibm.com,
>> dev@systemml.apache.org
>> Date:        08/10/2017 09:39 AM
>> 
>> Subject:        Re: Bayesian optimizer support for SystemML.
>> ------------------------------
>> 
>> 
>> 
>> Hi Sasha,
>> 
>> And one more thing, I would like to ask, what are you thinking about
>> `sobol` function. What is the dimension requirement and pattern of
>> sampling?. Please help me understand, what are the tasks exactly that we
>> are going to optimize, in SystemML.
>> 
>> Surrogate slice sampling - What are your thoughts about it.
>> 
>> Thank you very much,
>> Janardhan
>> 
>> On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski <
>> *evfimi@us.ibm.com* <ev...@us.ibm.com>> wrote:
>> Hi, Janardhan,
>> 
>> We are still studying Bayesian Optimization (B.O.), you are ahead of us!
>> Just one comment:  The "black box" loss function that is being optimized is
>> not always totally black.  Sometimes it is a sum of many small black-box
>> functions.  Suppose we want to train a complex system with many parameters
>> over a large dataset.  The system involves many heuristics, and the
>> parameters feed into these heuristics.  We want to minimize a loss
>> function, which is a sum of individual losses per each data record.  We
>> want to use B.O. to find an optimal vector of parameters.  The parameters
>> affect the system's behavior in complex ways and do not allow for the
>> computation of a gradient.  However, because the loss is a sum of many
>> losses, when running B.O., we have a choice: either to run each test over
>> the entire dataset, or to run over a small sample of the dataset (but try
>> more parameter vectors per hour, say).  The smaller the sample, the higher
>> the variance of the loss.  Not sure which implementation of B.O. is the
>> best to handle such a case.
>> 
>> Thanks,
>> Alexandre (Sasha)
>> 
>> 
>> 
>> From:        Janardhan Pulivarthi <*janardhan.pulivarthi@gmail.com*
>> <ja...@gmail.com>>
>> To:        *dev@systemml.apache.org* <de...@systemml.apache.org>
>> Date:        07/25/2017 10:33 AM
>> Subject:        Re: Bayesian optimizer support for SystemML.
>> ------------------------------
>> 
>> 
>> 
>> Hi Niketan and Mike,
>> 
>> As we are trying to implement this Bayesian Optimization, should we take
>> input from more committers as well as this optimizer approach seems to have
>> a couple of ways to implement. We may need to find out which suits us the
>> best.
>> 
>> Thanks,
>> Janardhan
>> 
>> On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi <
>> *janardhan.pulivarthi@gmail.com* <ja...@gmail.com>> wrote:
>> 
>>> Dear committers,
>>> 
>>> We will be planning to add bayesian optimizer support for both the ML and
>>> Deep learning tasks for the SystemML. Relevant jira link:
>>> *https://issues.apache.org/jira/browse/SYSTEMML-979*
>>> 
>>> The following is a simple outline of how we are going implement it.
>> Please
>>> feel free to make any kind of changes. In this google docs link:
>>> *http://bit.do/systemml-bayesian*
>>> 
>>> Description:
>>> 
>>> Bayesian optimization is a sequential design strategy for global
>>> optimization of black-box functions that doesn’t require derivatives.
>>> 
>>> Process:
>>> 
>>>   1.
>>> 
>>>   First we select a point that will be the best as far as the no. of
>>>   iterations that has happened.
>>>   2.
>>> 
>>>   Candidate point selection with sampling from Sobol quasirandom
>>>   sequence generator the space.
>>>   3.
>>> 
>>>   Gaussian process hyperparameter sampling with surrogate slice sampling
>>>   method.
>>> 
>>> 
>>> Components:
>>> 
>>>   1.
>>> 
>>>   Selecting the next point to Evaluate.
>>> 
>>> [image: nextpoint.PNG]
>>> 
>>> We specify a uniform prior for the mean, m, and width 2 top-hat priors
>> for
>>> each of the D length scale parameters. As we expect the observation noise
>>> generally to be close to or exactly zero, v(nu) is given a horseshoe
>>> prior. The covariance amplitude theta0 is given a zero mean, unit
>> variance
>>> lognormal prior, theta0 ~ ln N (0, 1).
>>> 
>>> 
>>> 
>>>   1.
>>> 
>>>   Generation of QuasiRandom Sobol Sequence.
>>> 
>>> Which kind of sobol patterns are needed?
>>> 
>>> [image: sobol patterns.PNG]
>>> 
>>> How many dimensions do we need?
>>> 
>>> This paper argues that its generation target dimension is 21201. [pdf
>> link
>>> <
>> *https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf*
>>> 
>>> ]
>>> 
>>> 
>>> 
>>>   1.
>>> 
>>>   Surrogate Slice Sampling.
>>> 
>>> [image: surrogate data sampling.PNG]
>>> 
>>> 
>>> References:
>>> 
>>> 1. For the next point to evaluate:
>>> 
>>> *https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf*
>>> 
>>> 
>> *http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf*
>>> 
>>> 
>>> 2. QuasiRandom Sobol Sequence Generator:
>>> 
>>> *https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf*
>>> 
>>> 
>>> 3. Surrogate Slice Sampling:
>>> 
>> *http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf*
>>> 
>>> 
>>> 
>>> Thank you so much,
>>> 
>>> Janardhan
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 


Re: Bayesian optimizer support for SystemML.

Posted by Janardhan Pulivarthi <ja...@gmail.com>.
Hi Sasha, Niketan, and Mike (sorry if I missed out on someone),

So far we have encountered some problems and situations that need more
thinking. But until then, let us start with a preliminary script for
checking different scenarios against our existing top-level algorithms and
deep learning algorithms.

Along with the previously proposed items, we can try:
1. Constraints (both constrained & unconstrained optimization)

2. A convergence rate check, maybe for settling on a prior (with our
convergence criterion based on "Convergence Rates for Efficient Global
Optimization Algorithms": https://arxiv.org/pdf/1101.3501v3.pdf ); a rough
sketch of such a check is below.

Maybe we could implement several priors, instead of just one in particular.
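
For point 2, a rough sketch of the kind of convergence check I have in mind
(stop once the maximum expected improvement over the candidate set drops
below a tolerance; plain Python/NumPy, and the names are placeholders rather
than an existing SystemML API):

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """EI for minimization, given the GP posterior mean/stddev at candidate points."""
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def converged(mu, sigma, f_best, tol=1e-6):
    # stop when even the most promising candidate offers less than tol expected gain
    return bool(np.max(expected_improvement(mu, sigma, f_best)) < tol)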


I am planning to keep my schedule free for a month to focus only on this
implementation. Owing to its importance for neural networks, where we need
lower memory consumption especially to fit into the GPUs, it would be
great if we could ship this with the `1.0` release.

*Design document:* http://bit.do/systemml-bayesian

Thank you very much,
Janardhan



On Wed, Aug 23, 2017 at 4:04 PM, Alexandre V Evfimievski <ev...@us.ibm.com>
wrote:

> Hi Janardhan,
>
> The number of parameters could be rather large, that's certainly an issue
> for Bayesian Optimization.  A perfect implementation would, perhaps, pick a
> sample of parameters and a sample of the dataset for every iteration.  It
> seems that Sobol sequences require generating primitive polynomials of
> large degree.  What is better: a higher-dimensional B.O., or a
> lower-dimensional one combined with parameter sampling?  Probably the
> latter.  By the way, in cases where parameters feed into heuristics, there
> may be considerable independence across the set of parameters, especially
> when conditioned by a specific dataset record.  Each heuristic targets
> certain situations that arise in some records.  Not sure how to take
> advantage of this.
>
> Thanks,
> Sasha
>
>
>
> From:        Janardhan Pulivarthi <ja...@gmail.com>
> To:        Alexandre V Evfimievski <ev...@us.ibm.com>, npansar@us.ibm.com,
> dev@systemml.apache.org
> Date:        08/10/2017 09:39 AM
>
> Subject:        Re: Bayesian optimizer support for SystemML.
> ------------------------------
>
>
>
> Hi Sasha,
>
> And one more thing, I would like to ask, what are you thinking about
> `sobol` function. What is the dimension requirement and pattern of
> sampling?. Please help me understand, what are the tasks exactly that we
> are going to optimize, in SystemML.
>
> Surrogate slice sampling - What are your thoughts about it.
>
> Thank you very much,
> Janardhan
>
> On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski <
> *evfimi@us.ibm.com* <ev...@us.ibm.com>> wrote:
> Hi, Janardhan,
>
> We are still studying Bayesian Optimization (B.O.), you are ahead of us!
> Just one comment:  The "black box" loss function that is being optimized is
> not always totally black.  Sometimes it is a sum of many small black-box
> functions.  Suppose we want to train a complex system with many parameters
> over a large dataset.  The system involves many heuristics, and the
> parameters feed into these heuristics.  We want to minimize a loss
> function, which is a sum of individual losses per each data record.  We
> want to use B.O. to find an optimal vector of parameters.  The parameters
> affect the system's behavior in complex ways and do not allow for the
> computation of a gradient.  However, because the loss is a sum of many
> losses, when running B.O., we have a choice: either to run each test over
> the entire dataset, or to run over a small sample of the dataset (but try
> more parameter vectors per hour, say).  The smaller the sample, the higher
> the variance of the loss.  Not sure which implementation of B.O. is the
> best to handle such a case.
>
> Thanks,
> Alexandre (Sasha)
>
>
>
> From:        Janardhan Pulivarthi <*janardhan.pulivarthi@gmail.com*
> <ja...@gmail.com>>
> To:        *dev@systemml.apache.org* <de...@systemml.apache.org>
> Date:        07/25/2017 10:33 AM
> Subject:        Re: Bayesian optimizer support for SystemML.
> ------------------------------
>
>
>
> Hi Niketan and Mike,
>
> As we are trying to implement this Bayesian Optimization, should we take
> input from more committers as well as this optimizer approach seems to have
> a couple of ways to implement. We may need to find out which suits us the
> best.
>
> Thanks,
> Janardhan
>
> On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi <
> *janardhan.pulivarthi@gmail.com* <ja...@gmail.com>> wrote:
>
> > Dear committers,
> >
> > We will be planning to add bayesian optimizer support for both the ML and
> > Deep learning tasks for the SystemML. Relevant jira link:
> > *https://issues.apache.org/jira/browse/SYSTEMML-979*
> <https://issues.apache.org/jira/browse/SYSTEMML-979>
> >
> > The following is a simple outline of how we are going implement it.
> Please
> > feel free to make any kind of changes. In this google docs link:
> > *http://bit.do/systemml-bayesian* <http://bit.do/systemml-bayesian>
> >
> > Description:
> >
> > Bayesian optimization is a sequential design strategy for global
> > optimization of black-box functions that doesn’t require derivatives.
> >
> > Process:
> >
> >    1.
> >
> >    First we select a point that will be the best as far as the no. of
> >    iterations that has happened.
> >    2.
> >
> >    Candidate point selection with sampling from Sobol quasirandom
> >    sequence generator the space.
> >    3.
> >
> >    Gaussian process hyperparameter sampling with surrogate slice sampling
> >    method.
> >
> >
> > Components:
> >
> >    1.
> >
> >    Selecting the next point to Evaluate.
> >
> > [image: nextpoint.PNG]
> >
> > We specify a uniform prior for the mean, m, and width 2 top-hat priors
> for
> > each of the D length scale parameters. As we expect the observation noise
> > generally to be close to or exactly zero, v(nu) is given a horseshoe
> > prior. The covariance amplitude theta0 is given a zero mean, unit
> variance
> > lognormal prior, theta0 ~ ln N (0, 1).
> >
> >
> >
> >    1.
> >
> >    Generation of QuasiRandom Sobol Sequence.
> >
> > Which kind of sobol patterns are needed?
> >
> > [image: sobol patterns.PNG]
> >
> > How many dimensions do we need?
> >
> > This paper argues that its generation target dimension is 21201. [pdf
> link
> > <
> *https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf*
> <https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf>
> >
> > ]
> >
> >
> >
> >    1.
> >
> >    Surrogate Slice Sampling.
> >
> > [image: surrogate data sampling.PNG]
> >
> >
> > References:
> >
> > 1. For the next point to evaluate:
> >
> > *https://papers.nips.cc/paper/4522-practical-bayesian-*
> <https://papers.nips.cc/paper/4522-practical-bayesian->
>
> > optimization-of-machine-learning-algorithms.pdf
> >
> >
> *http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf*
> <http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf>
> >
> >
> > 2. QuasiRandom Sobol Sequence Generator:
> >
> > *https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%*
> > 20constructing.pdf
> >
> >
> > 3. Surrogate Slice Sampling:
> >
> > *http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf*
> <http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf>
> >
> >
> >
> > Thank you so much,
> >
> > Janardhan
> >
> >
> >
> >
>
>
>
>
>
>
>

Re: Bayesian optimizer support for SystemML.

Posted by Janardhan Pulivarthi <ja...@gmail.com>.
Hi Sasha,

I believe that when slice sampling, if the slice is not narrow enough, as
shown in the left-side graphs, there is a possibility that we escape this
region of the objective function. Please see fig. 1 of the attached paper
(if you haven't seen it). So, after a good number of runs the slice
sampling doesn't seem to improve, but at `t = 60` the custom algorithm
discussed in the paper seems to give a good result [right-side graphs].

Not sure whether we have this kind of objective function in our algorithms!
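
For reference, here is a minimal univariate slice sampler with the step-out
procedure (Neal, 2003) as a plain Python sketch; the initial width w is
exactly the knob that decides whether the bracket stays narrow or escapes
the region:

import numpy as np

def slice_sample_1d(log_f, x0, w=1.0, max_steps=50, rng=None):
    """One univariate slice-sampling update (step-out and shrinkage, Neal 2003)."""
    rng = rng or np.random.default_rng()
    log_y = log_f(x0) + np.log(rng.random())   # vertical level defining the slice
    # step out: widen [left, right] until both ends fall outside the slice
    left = x0 - w * rng.random()
    right = left + w
    j = 0
    while log_f(left) > log_y and j < max_steps:
        left -= w
        j += 1
    j = 0
    while log_f(right) > log_y and j < max_steps:
        right += w
        j += 1
    # shrinkage: sample uniformly in the bracket, shrinking it on rejections
    while True:
        x1 = rng.uniform(left, right)
        if log_f(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

Here log_f would be, for example, the GP log marginal likelihood as a
function of one hyperparameter with the others held fixed.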



Thank you very much,
Janardhan


On Tue, Sep 12, 2017 at 2:02 PM, Janardhan Pulivarthi <
janardhan.pulivarthi@gmail.com> wrote:

> Hi Sasha,
>
> 1. According to clause 8.2.2 of the attached paper, the author recommends
> lower-dimensional B.O.
> 2. It seems that in most cases a small Sobol dimension is sufficient.
> 3. About the independence of parameters that feed into heuristics, I
> dropped a mail to *Prof. Ryan P. Adams* and am hoping for a response soon.
>
> I am implementing a preliminary script, as Niketan pointed out, and will
> let you know once I complete the skeleton.
>
> Thanks,
> Janardhan
>
> On Wed, Aug 23, 2017 at 4:04 PM, Alexandre V Evfimievski <
> evfimi@us.ibm.com> wrote:
>
>> Hi Janardhan,
>>
>> The number of parameters could be rather large, that's certainly an issue
>> for Bayesian Optimization.  A perfect implementation would, perhaps, pick a
>> sample of parameters and a sample of the dataset for every iteration.  It
>> seems that Sobol sequences require generating primitive polynomials of
>> large degree.  What is better: a higher-dimensional B.O., or a
>> lower-dimensional one combined with parameter sampling?  Probably the
>> latter.  By the way, in cases where parameters feed into heuristics, there
>> may be considerable independence across the set of parameters, especially
>> when conditioned by a specific dataset record.  Each heuristic targets
>> certain situations that arise in some records.  Not sure how to take
>> advantage of this.
>>
>> Thanks,
>> Sasha
>>
>>
>>
>> From:        Janardhan Pulivarthi <ja...@gmail.com>
>> To:        Alexandre V Evfimievski <ev...@us.ibm.com>,
>> npansar@us.ibm.com, dev@systemml.apache.org
>> Date:        08/10/2017 09:39 AM
>>
>> Subject:        Re: Bayesian optimizer support for SystemML.
>> ------------------------------
>>
>>
>>
>> Hi Sasha,
>>
>> And one more thing, I would like to ask, what are you thinking about
>> `sobol` function. What is the dimension requirement and pattern of
>> sampling?. Please help me understand, what are the tasks exactly that we
>> are going to optimize, in SystemML.
>>
>> Surrogate slice sampling - What are your thoughts about it.
>>
>> Thank you very much,
>> Janardhan
>>
>> On Wed, Jul 26, 2017 at 12:25 AM, Alexandre V Evfimievski <
>> *evfimi@us.ibm.com* <ev...@us.ibm.com>> wrote:
>> Hi, Janardhan,
>>
>> We are still studying Bayesian Optimization (B.O.), you are ahead of us!
>> Just one comment:  The "black box" loss function that is being optimized is
>> not always totally black.  Sometimes it is a sum of many small black-box
>> functions.  Suppose we want to train a complex system with many parameters
>> over a large dataset.  The system involves many heuristics, and the
>> parameters feed into these heuristics.  We want to minimize a loss
>> function, which is a sum of individual losses per each data record.  We
>> want to use B.O. to find an optimal vector of parameters.  The parameters
>> affect the system's behavior in complex ways and do not allow for the
>> computation of a gradient.  However, because the loss is a sum of many
>> losses, when running B.O., we have a choice: either to run each test over
>> the entire dataset, or to run over a small sample of the dataset (but try
>> more parameter vectors per hour, say).  The smaller the sample, the higher
>> the variance of the loss.  Not sure which implementation of B.O. is the
>> best to handle such a case.
>>
>> Thanks,
>> Alexandre (Sasha)
>>
>>
>>
>> From:        Janardhan Pulivarthi <*janardhan.pulivarthi@gmail.com*
>> <ja...@gmail.com>>
>> To:        *dev@systemml.apache.org* <de...@systemml.apache.org>
>> Date:        07/25/2017 10:33 AM
>> Subject:        Re: Bayesian optimizer support for SystemML.
>> ------------------------------
>>
>>
>>
>> Hi Niketan and Mike,
>>
>> As we are trying to implement this Bayesian Optimization, should we take
>> input from more committers as well as this optimizer approach seems to
>> have
>> a couple of ways to implement. We may need to find out which suits us the
>> best.
>>
>> Thanks,
>> Janardhan
>>
>> On Sat, Jul 22, 2017 at 3:41 PM, Janardhan Pulivarthi <
>> *janardhan.pulivarthi@gmail.com* <ja...@gmail.com>> wrote:
>>
>> > Dear committers,
>> >
>> > We will be planning to add bayesian optimizer support for both the ML
>> and
>> > Deep learning tasks for the SystemML. Relevant jira link:
>> > *https://issues.apache.org/jira/browse/SYSTEMML-979*
>> <https://issues.apache.org/jira/browse/SYSTEMML-979>
>> >
>> > The following is a simple outline of how we are going implement it.
>> Please
>> > feel free to make any kind of changes. In this google docs link:
>> > *http://bit.do/systemml-bayesian* <http://bit.do/systemml-bayesian>
>> >
>> > Description:
>> >
>> > Bayesian optimization is a sequential design strategy for global
>> > optimization of black-box functions that doesn’t require derivatives.
>> >
>> > Process:
>> >
>> >    1.
>> >
>> >    First we select a point that will be the best as far as the no. of
>> >    iterations that has happened.
>> >    2.
>> >
>> >    Candidate point selection with sampling from Sobol quasirandom
>> >    sequence generator the space.
>> >    3.
>> >
>> >    Gaussian process hyperparameter sampling with surrogate slice
>> sampling
>> >    method.
>> >
>> >
>> > Components:
>> >
>> >    1.
>> >
>> >    Selecting the next point to Evaluate.
>> >
>> > [image: nextpoint.PNG]
>> >
>> > We specify a uniform prior for the mean, m, and width 2 top-hat priors
>> for
>> > each of the D length scale parameters. As we expect the observation
>> noise
>> > generally to be close to or exactly zero, v(nu) is given a horseshoe
>> > prior. The covariance amplitude theta0 is given a zero mean, unit
>> variance
>> > lognormal prior, theta0 ~ ln N (0, 1).
>> >
>> >
>> >
>> >    1.
>> >
>> >    Generation of QuasiRandom Sobol Sequence.
>> >
>> > Which kind of sobol patterns are needed?
>> >
>> > [image: sobol patterns.PNG]
>> >
>> > How many dimensions do we need?
>> >
>> > This paper argues that its generation target dimension is 21201. [pdf
>> link
>> > <
>> *https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf*
>> <https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%20constructing.pdf>
>> >
>> > ]
>> >
>> >
>> >
>> >    1.
>> >
>> >    Surrogate Slice Sampling.
>> >
>> > [image: surrogate data sampling.PNG]
>> >
>> >
>> > References:
>> >
>> > 1. For the next point to evaluate:
>> >
>> > *https://papers.nips.cc/paper/4522-practical-bayesian-*
>> <https://papers.nips.cc/paper/4522-practical-bayesian->
>>
>> > optimization-of-machine-learning-algorithms.pdf
>> >
>> >
>> *http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf*
>> <http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips_appendix.pdf>
>> >
>> >
>> > 2. QuasiRandom Sobol Sequence Generator:
>> >
>> > *https://researchcommons.waikato.ac.nz/bitstream/handle/10289/967/Joe%*
>> > 20constructing.pdf
>> >
>> >
>> > 3. Surrogate Slice Sampling:
>> >
>> > *http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf*
>> <http://homepages.inf.ed.ac.uk/imurray2/pub/10hypers/hypers.pdf>
>> >
>> >
>> >
>> > Thank you so much,
>> >
>> > Janardhan
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>>
>>
>